Hui Huang

Performance Engineering

San Francisco Bay Area

About

Specialties: LLM inference performance, fleet resource efficiency, computer architecture.

Experience

  • Senior Staff Engineer at Google DeepMind
    Dec 2025 - Present · 8 mos

    Optimize Gemini inference performance on TPU.

  • Google (Full-time · 11 yrs)
    • LLM Inference Performance Tech Lead
      Apr 2024 - Dec 2025 · 1 yr 9 mos

      Core inference performance engineer for Gemini 1.5 Flash, 2.0 Flash and 2.5 Pro.

    • Uber Tech Lead
      Nov 2021 - Mar 2024 · 2 yrs 5 mos

      Led the YouTube Ads Data and Serving Infrastructure Resource Efficiency team.

    • Staff Software Engineer, Tech Lead Manager
      Jan 2015 - Nov 2021 · 6 yrs 11 mos

      Ran the Cloud Platform Performance team at Google, which drives server and ML workload efficiency and future hardware roadmap design for GCP.

  • Co-Founder at Falcon Computing Solutions
    Jan 2014 - Dec 2014 · 1 yr

    Co-founded Falcon Computing Solutions (FCS). FCS initially targeted accelerator (GPU/FPGA) virtualization support in the cloud. Built a cluster-level heterogeneous resource management framework based on Apache YARN and Spark.

  • Graduate Research Assistant at UCLA
    Sep 2008 - Sep 2014 · 6 yrs 1 mo

    Worked in the VAST (VLSI, Architecture, Synthesis and Technology) research lab, Computer Science Department, UCLA. Research interests included pattern mining for design optimization and compiler optimization for energy-efficient memory systems and heterogeneous computing platforms.

  • Summer Intern at Intel Corporation
    Jun 2012 - Sep 2012 · 4 mos

    Summer intern at Intel Research Lab. Worked on improving the flexibility of custom instruction set extensions using pattern mining techniques.