United States
Engineering leader with more than 15 years of experience building large-scale machine learning infrastructure and production AI systems. I have worked across Microsoft, Google, and OpenAI, focusing on distributed systems, ML platforms, and large-scale model training and inference infrastructure. My work has primarily centered on translating research progress into reliable, production-ready systems used by millions of users. I’m particularly interested in the intersection of platform engineering, large-scale compute infrastructure, and the operational challenges of deploying AI systems at scale. I enjoy building strong engineering teams and creating platforms that enable researchers and product teams to move faster.
Led engineering for large-scale ML infrastructure supporting model training and inference workloads. Worked closely with research and applied teams to improve the reliability and scalability of internal ML platforms, enabling faster experimentation and more efficient deployment of production models. Managed multiple engineering teams responsible for distributed training systems, model serving infrastructure, and platform reliability in GPU-accelerated environments.
Managed a team of machine learning engineers building production-grade ML systems used across large-scale services. Held responsibility for system architecture, platform reliability, and operational scalability of ML infrastructure. Partnered with product and research teams to integrate machine learning capabilities into core products while maintaining strong engineering standards.
Designed and implemented machine learning components used in large-scale production systems. Contributed to system architecture, model deployment pipelines, and performance optimization across distributed services. Mentored junior engineers and helped improve engineering practices across the team.
Worked on data processing pipelines and early machine learning systems supporting large-scale services. Participated in model development, evaluation, and deployment in production environments. Collaborated with engineering teams to improve system stability and operational performance.