Portola Valley, California, United States
Research scientist who has led teams and shipped at global scale: the vision-language systems behind Amazon Lens and visual navigation features in Amazon product search reach hundreds of millions of customers. I believe world-model-grounded, full-duplex interfaces are where computing is heading, and am working on the hardest problems to get there: multimodal reasoning, agentic retrieval, generative models, and real-time interfaces.
Led Lens Live as principal technical lead — real-time visual search integrated with Rufus for product insights, deployed to tens of millions of customers. Architected the mobile ML pipeline (on-device detection, tracking, and state management) enabling continuous low-latency product identification as customers scan; coordinated development across multiple VP orgs in four locations.
Led a team of up to 20 applied scientists and engineers; completed the transformation of Amazon Lens into a deep learning, embedding-based system matching against over 1 billion products, driving significant gains in CTR and usage. Launched Visual Search Suggestions, multimodal search (image + text reformulation), and More Like This, growing the latter from zero to over 90% of Amazon search result pages. Built and mentored a high-performing research team across all levels; organized Amazon's Computer Vision Conference (ACVC, 200+ attendees) and co-organized the Multimodal Representation and Retrieval workshop (MRR) at SIGIR and ICCV, serving as lead organizer and advisor.
Led a team of up to 12 applied scientists and engineers; introduced deep learning to Amazon’s visual search stack from scratch — from first launch at ~100 categories, expanding to 100,000+ fine-grained search strings via query-log-trained models, then to instance-level matching against ~10M products — culminating in StyleSnap (2019) and StyleSnap for Home (2020).
Early applied scientist on A9's visual search team; built A9 Flow (AR-based multi-object product recognition and tracking, initially a standalone demo app), whose underlying recognition, tracking, and scene text/OCR libraries were later integrated into both the Amazon mobile app and the Amazon Fire Phone. Also developed video content alignment tools that automated annotation workflows for the IMDb X-Ray feature. Classical CV in high-performance C++.
Research scientist at an early-stage computer vision startup; built a video content search engine and developed multimedia summarization products — automated news article summarization and story clustering, and a video summarization API — using SVD of hand-crafted features.
Computer vision research with Prof. Hai Tao, mostly in the area of pedestrian recognition.
Teaching Assistant for:
CMPE 107 - Stochastics (Winter '06, Spring '09)
CMPE 121/L - Micro System Design (Spring '06)
CMPE 80A - Universal Access (Winter '08)
AMS 27L - Engineering Math (Spring '08)
CMPE 185 - Technical Writing (Winter '09)
CMPE 16 - Discrete Math (Fall '09)
Graduate student supported by a Regents Fellowship.
Cellphone camera based wayfinding for the blind