Sunnyvale, California, United States
Embark on bringing AI to media creation
• Served as a tech lead and co-owned the long-term roadmap for the VoD quality-of-experience domain.
• Led and designed the dynamic manifest packager, which serves as a foundation for advanced video features that reduce overall round-trip time for video playback and enable geo-regional optimization, ultimately reducing time-to-first-frame and stall ratio.
• Spearheaded the investigation into video quality metrics, drove alignment on the video quality observability roadmap, and implemented the initial foundation for accurate video quality measurement.
• Led and designed an in-house end-to-end test framework to reduce the total number of high-impact incidents (Sev 2 or above).
• Served as the point of contact (PoC) from media infra to drive alignment on multiple VoD projects across different orgs.
My main responsibility is to design and develop internal streaming data pipelines using Oracle GoldenGate, Apache Kafka, and Apache Storm. This project serves hundreds of use cases that require low-latency data, such as online machine learning and reporting.
• Design and develop highly available streaming data pipelines with 1-minute latency from Site databases to Kafka topics
• Lead the design and development of metrics and alerting for the streaming data pipelines
• Lead the design and development of inline validation that proves 0% data loss in the pipelines
• Develop a parallel data-hydration transformation in Storm that enriches data streams and scales to tens of thousands of messages per second
• Design and develop schema evolution using Apache Avro to handle inline schema changes
• Design and develop the checkpoint mechanism to ensure data consistency in all cases
I worked on multiple data pipeline projects inside PayPal, which support data integration across different analytic systems and are used by hundreds of data engineers and analysts.
• Designed and developed a mini-batch replication system, which replicates 1 TB of data daily
• Used Apache Spark to format over 5 TB of data in HDFS based on downstream requests
• Developed and supported a batch replication system, which replicates 15 TB of data daily between several heterogeneous databases
I am a teaching assistant for ITP-104 (Web Publishing), which teaches HTML, CSS, and JavaScript, along with basic Dreamweaver and Photoshop skills.
• Run a weekly open lab section to offer technical help to students
• Help 100+ students solve their coding problems during lectures
I worked as a course producer for CSCI-201 (Principle of Software Engineering), which mainly covers multithreaded program design and GUI implementation in Java.
• Held weekly office hours to offer technical help to students
• Coordinated with 60+ students and 2 teaching assistants in lab sections
LODStories finds interesting connections and paths between artworks or artists based on calculated features and machine learning techniques. The faculty supervisors of this project are Prof. Craig Knoblock and Dr. Pedro Szekely.
• Implement the server side with Java Servlets under a RESTful architecture
• Configure Apache Tomcat to deploy the application on a remote CentOS server
• Use Apache Maven to automate the software development life cycle
• Design and implement a linked-data caching strategy using MongoDB
• Design and optimize SPARQL queries to retrieve linked data from DBpedia