Mountain View, California, United States
Vertex AI - Gen AI Serving: - Optimize inference and serving performance of open-source models by adopting industry-leading techniques such as speculative decoding and overlap scheduling. - Lead cross-company collaboration with a key external partner on TensorRT-LLM and other state-of-the-art GPU serving technologies. - SGLang contributor: design and work with the SGLang community to implement and enhance priority-based traffic management (priority scheduling, batching, and concurrency control); https://github.com/sgl-project/sglang/issues/13526
Google Cloud Data Analytics: - Led the design and implementation of dependency resource management and garbage collection for cloud resources in BigQuery Engine for Apache Flink.
Google Cloud Data Analytics: - As part of the launch team for BigQuery Engine for Apache Flink, implemented control plane support for Kubernetes-based Flink deployments. - Added API support for at-least-once exactness mode in Cloud Dataflow and implemented tooling to analyze cost and latency tradeoffs in streaming use cases. - Integrated NVIDIA Multi-Process Service (MPS) into Cloud Dataflow for batch inference, yielding a 1.33x throughput improvement.
Prototyped a geosocial mobile app (Android and iOS) using Flutter and a GCP backend to connect tutors with prospective students.
Developed an incident response framework, built with infrastructure-as-code (IaC), to protect AWS infrastructure against security threats.
Co-founded a company aiming to introduce up-and-coming designers from Asia to the US market.