Greater Boston
I build infrastructure and platform products that make complex systems usable, governable, and reliable at scale. My background is in platform engineering, DevOps, observability, API platforms, and operational control systems. Across those domains, I’ve focused on reducing toil, improving reliability, and creating better developer experiences through automation, self-service platforms, and clear system feedback loops. I intentionally moved into AI infrastructure work rather than staying adjacent to it. Today, I work on platform-level control planes and agentic operational systems that bring reliability, compliance, cost, and governance signals directly into software delivery workflows. My focus is on the infrastructure that makes intelligent systems safe and effective in production: execution layers, control surfaces, observability, trust, and progressive automation. I work best on technically deep platform problems with high organizational leverage, especially when engineering, security, operations, and product strategy need to come together around a single coherent system.
Own product strategy for the compute and execution layer powering model training, deployment, and production inference. Define the roadmap for training and inference services, including elastic GPU/CPU compute, job orchestration, and high-availability serving infrastructure. Build self-service workflows that simplify model deployment and abstract away Kubernetes, cloud networking, and platform complexity for ML engineers and scientific teams. Lead the platform approach to lineage, reproducibility, and execution metadata, ensuring clear traceability between data versions, training code, models, and production artifacts. Drive observability and governance for AI systems, including latency, memory, reliability, drift, and performance signals needed for regulated environments. Partner with platform engineering, security, and enterprise stakeholders to integrate execution services with IAM, compliance, and broader platform controls.
Built Phloem, a framework for multi-step LLM orchestration with pipeline execution, state management, and evaluation loops. Focused on reliability, cost, and repeatability under real constraints.
Platform Engineering - Foundations Group. Responsible for foundational platform capabilities serving engineers across code storage, server imaging, patching, artifact management, and observability. Lead product strategy across five teams supporting a core internal platform used by 500+ engineers. Product lead for Sentinels, a platform-level control plane governing ~400 internal applications. Define the roadmap for an agentic SDLC control plane that embeds feedback loops directly into infrastructure and delivery workflows. Drive progressive automation using a trust-based model: observe -> recommend -> automate. Bring reliability, compliance, and cost signals earlier into the lifecycle to reduce late-stage surprises and manual intervention. Partner with platform engineering, security, and operations to reduce MTTR and cut operational toil by roughly 50%. Helped shape internal platform capabilities built on infrastructure-as-code, automation, and self-service delivery patterns.
Led DevOps and observability platform strategy supporting hundreds of services and more than 500k deployed devices. Built centralized operational tooling that surfaced real-time signals across logs, metrics, traces, and deployment workflows. Simplified deployment and upgrade paths through automation, reducing operator burden and improving reliability. Improved release stability and developer velocity by strengthening platform workflows and operational feedback loops. Worked at the intersection of infrastructure, telemetry, and service health, building systems that made production behavior easier to understand and act on.
Led modernization of API and data platforms, delivering more than 100 next-generation REST APIs. Helped decompose a legacy monolith into service-oriented systems aligned with long-term platform evolution. Defined platform capabilities, service boundaries, and rollout patterns that reduced coupling and operational risk. Partnered closely with engineering on foundational backend and data-platform investments that improved scalability and delivery velocity.