San Francisco Bay Area
Specialties: LLM inference performance, fleet resource efficiency, computer architecture.
Optimize Gemini inference performance on TPU.
Core inference performance engineer for Gemini 1.5 Flash, 2.0 Flash and 2.5 Pro.
Leading the YouTube Ads Data and Serving Infrastructure Resource Efficiency team.
Running the Cloud Platform Performance Team at Google, which drives server/ML workload efficiency and future hardware roadmap design for GCP.
Co-founded Falcon Computing Solutions (FCS). FCS initially targeted accelerator (GPU/FPGA) virtualization support in the cloud. Built a cluster-level heterogeneous resource management framework based on Apache YARN and Spark.
Worked in the VAST (VLSI, Architecture, Synthesis and Technology) research lab, Computer Science Department, UCLA. Research interests include pattern mining for design optimization and compiler optimization for energy-efficient memory systems and heterogeneous computing platforms.
Summer intern at Intel Research Lab. Worked on improving the flexibility of custom instruction set extensions using pattern mining techniques.