Head of Infra / Senior Engineering Manager @ drive.ai
Staff Software Engineer @ drive.ai
I write solid, extensible, and responsive programs for large-scale data processing. I have several years of first-hand experience architecting, coding, optimizing, and troubleshooting real-time distributed systems for data pipeline processing and data mining, using a range of big data and messaging technologies. I also develop web tools and utility frameworks for debugging, operations, and monitoring.
Big Data: HBase, Hadoop, Hive
Distributed Processing/Messaging: MapReduce, Spark, Storm, Kafka, RabbitMQ, JMS
SQL Database: Oracle, MySQL, Redshift
Web: jQuery, JSP, Ajax, Tomcat, Jetty, Spring MVC, Jersey, Struts
Misc: Spring, Spring Batch, Hibernate, ZooKeeper
Staff Software Engineer @ I have dramatically improved both the scalability and the speed of Quora's data pipeline over the past year. I am also glad to have contributed my prior knowledge to successfully launching several important services into production (e.g. HBase, Storm, Spark), which are the foundation for work done by other teams. From July 2015 to Present (4 months)
Quora Engineer @ I work on the data infrastructure team, learning Python and exploring the non-Java world ... From June 2014 to Present (1 year 5 months)
Principal Software Engineer @ Team lead and tech lead for migrating the entire company's catalog and backend data processing from Oracle to HBase, and from high-latency batch processing to low-latency real-time processing.
(1) Schema redesign, optimization and feature enhancement in HBase:
* Introduced a column name encoding scheme to store inner entity (relational) data in the same row.
* Built a real-time, consistent secondary index in HBase using an in-house, high-performance transaction library that reliably tracks and stores unfinished transactions in an HDFS file, designed to provide a recovery mechanism when transactions fail.
* Coded an annotation-driven Object-HBase mapping library to speed up development with HBase. The library is feature-rich, supporting composite keys/columns, custom type conversion, secondary indexes, inner entities, inner entity secondary indexes, etc.
* Set up a read-only replicated HBase cluster for non-critical tasks and HBase MapReduce jobs.
* Tuned cluster read performance by reducing compaction frequency, which had been driven up by global memory contention caused by an improper flush threshold and too many regions.
* Optimized read speed by minimizing unnecessary writes, so that scans touch fewer KVs.
(2) Moved batch-oriented processes to real-time processing by publishing changes to downstream consumers through Kafka and RabbitMQ, and used Storm for distributed message processing, reducing end-to-end latency (from backend to index and cache) from hours to minutes.
(3) Developed a utility library to track client-side performance, using Graphite to store stats aggregated by StatsD at the cluster, machine, and process levels.
(4) Synced data from Oracle to HBase using Oracle Advanced Queuing during incremental data migration.
(5) Set up a Hive cluster to dump HBase table data for data analytics.
(6) Mentored team members on the new technologies and provided feedback during code reviews. From September 2013 to June 2014 (10 months)
Senior Software Engineer @ I work on various projects, including internationalization (i18n), an experiment (A/B testing) framework, deals & stream, etc. From April 2011 to September 2013 (2 years 6 months)
Software Engineer @ I mainly work on creating and improving clustering algorithms for product shopping data, and on evaluating algorithm performance and revenue impact based on A/B test results from production. From January 2008 to April 2011 (3 years 4 months)
Research Assistant @ My work is focused on building and evaluating different encryption algorithms in a database-as-a-service model. From September 2006 to December 2007 (1 year 4 months)
MS, Computer Science @ University of Wisconsin-Milwaukee From 2006 to 2008
BS, Computer Science @ Shanghai University From 2002 to 2006