New York City Metropolitan Area
I'm a Lead Site Reliability Engineer at MongoDB, where I've spent the last 8 years scaling our observability infrastructure to handle 2 billion active time series and process 100TB+ of telemetry data daily across AWS, GCP, and Azure. Most recently, I architected a unified telemetry pipeline that reduced our observability infrastructure costs by 80% while significantly improving data quality and reliability. I've built and mentored engineering teams, growing our observability group from 4 to 8 engineers and championing multiple promotions to Staff Engineer level.

Before MongoDB, I spent two years managing Columbia University's 30-rack private data center, where I first implemented large-scale observability using the ELK Stack across 200+ servers. That experience taught me the fundamentals of infrastructure operations and sparked my focus on making complex distributed systems observable and reliable.

TECHNICAL EXPERTISE
Observability: Prometheus, VictoriaMetrics, Thanos, Grafana, ELK Stack, OpenTelemetry, Jaeger, distributed tracing
Infrastructure: Kubernetes, Docker, Terraform, AWS/GCP/Azure, Linux systems, Kafka, Envoy Proxy
Languages: Go, Python, C, Bash
SRE Practices: SLI/SLO frameworks, incident management, on-call rotation design, telemetry pipeline architecture

Let's connect if you're building something interesting in the observability or infrastructure space.
Lead for the Observability team, interim lead for the Fabric (networking) team
Site Reliability Engineer with a focus on observability and edge load balancing.