San Jose, California, United States
Senior research engineer, technical leader and architect. I design and build enterprise machine learning infrastructure at scale, with a focus on performance, concurrency, resiliency, automation and consumability. Expertise spans LLM inferencing and distributed systems: cloud, Linux, containers, Kubernetes, serverless, microservices, K/V stores, networking, PyTorch. Skilled in Python, Rust, Java and Go. Prolific open-source contributor. More than 15 years’ experience leading development and performance teams. Consistent track record of major technical contributions.
• Built the back-end LLM inferencing platform that was released as the flagship watsonx.ai Gen AI cloud service in July 2023, lauded by the CEO and the head of Research as unprecedented in terms of Research time-to-market.
• Deployed and supported a highly successful IBM-internal LLM inferencing service, used by thousands of IBMers as a vehicle for other research and services.
• Early, significant contributor to the first and most prominent open-source LLM inferencing engines: Hugging Face TGI and vLLM.
• vLLM committer. Represent IBM in the vLLM community, supporting and coordinating contributions by others.
• Drive collaboration across IBM Research in support of common goals, bridging efforts and workstreams and connecting the dots between different projects and teams.
• Act as a conduit between Research and product teams at IBM and Red Hat; assisted with early Gen AI customer engagements.
• Lead ongoing Gen AI inference engineering and optimization efforts: setting technical strategy and educating and enabling other team members.
Conceived, designed and implemented much of the infrastructure underpinning IBM’s most successful AI cloud offerings, including the IBM Watson Assistant and Natural Language Classifier services. Lead and mentor development teams. Build and support distributed, scalable, resilient production systems and the components and libraries used within them. Consistent top performer and regular award recipient.
• Designed and implemented the back-end, Kubernetes-based engine that orchestrates self-service ML model lifecycles for the Watson Assistant, Natural Language Classifier and Discovery services, coordinating model training, retraining and deployment in a fault-tolerant manner.
• Designed and implemented a general-purpose, mesh-based model-serving platform, now central to most Watson AI cloud services, that manages hundreds of thousands of production models.
• Generalized components behind abstractions to enable seamless migration as underlying platform services (container scheduler, data stores, etc.) changed.
• Create and maintain various low-level libraries used across multiple Watson cloud products, including a service discovery and RPC framework, a Java etcd client and utilities.
Taking IBM Watson from game show to product. Applied performance analysis and engineering to parallelize and shrink a room-sized question-answering system down to a single server, taking everyone by surprise and driving a fundamental change in architectural direction. Created a high-performance concurrent runtime library for Apache UIMA NLP pipelines, on which all commercial Watson QA pipelines were based. Re-implemented the core Apache UIMA 2.x library internals with a new approach that became the basis for UIMA 3.0.
One of eight students chosen nationally, working in a team of four on a research-based software project: an autonomic knowledge-management system based on Semantic Web concepts. Filed three patents.
Software Engineer in a small team integrating Voice over IP into an existing telephony product. Personally took over coding a significant proportion of the software (in C). Chosen for the IBM Student Bursary scheme. Returned for a subsequent summer internship (Jul 2002 to Sep 2002), with responsibility for the design and development of new SIP (Session Initiation Protocol) functionality. Liaised with vendors while evaluating their protocol stacks, and delivered a full solution based on one of the offerings.