Hui Huang

Performance Engineering

San Francisco Bay Area

About

Specialties: LLM inference performance, fleet resource efficiency, computer architecture.

Experience

Senior Staff Engineer at Google DeepMind
Dec 2025 - Present · 8 mos
Optimize Gemini inference performance on TPU.
Google (Full-time · 11 yrs)
- LLM Inference Performance Tech Lead
  Apr 2024 - Dec 2025 · 1 yr 9 mos
  Core inference performance engineer for Gemini 1.5 Flash, 2.0 Flash and 2.5 Pro.
- Uber Tech Lead
  Nov 2021 - Mar 2024 · 2 yrs 5 mos
  Led the YouTube Ads Data and Serving Infrastructure Resource Efficiency team.
- Staff Software Engineer, Tech Lead Manager
  Jan 2015 - Nov 2021 · 6 yrs 11 mos
  Ran the Cloud Platform Performance team at Google, which drives server and ML workload efficiency and future hardware roadmap design for GCP.
Co-Founder at Falcon Computing Solutions
Jan 2014 - Dec 2014 · 1 yr
Co-founded Falcon Computing Solutions (FCS). FCS initially targeted accelerator (GPU/FPGA) virtualization support in the cloud. Built a cluster-level heterogeneous resource management framework based on Apache YARN and Spark.
Graduate Research Assistant at UCLA
Sep 2008 - Sep 2014 · 6 yrs 1 mo
Worked in the VAST (VLSI, Architecture, Synthesis and Technology) research lab, Computer Science Department, UCLA. Research interests include pattern mining for design optimization, compiler optimization for energy-efficient memory systems, and heterogeneous computing platforms.
Summer Intern at Intel Corporation
Jun 2012 - Sep 2012 · 4 mos
Summer intern at Intel Research Lab. Worked on improving the flexibility of custom instruction set extensions using pattern mining techniques.