Miguel David

Engineering Manager | SRE & Product Teams | $240K/yr Cloud Savings | 99.95% Uptime | Led Teams Through Acquisition | People + Systems + Outcomes

Brussels, Brussels Region, Belgium

About

Engineering Manager who led 2 distributed teams (SRE + Product) at Harvest through acquisition.

Recent impact:
→ Directed 3-year zero-downtime Vitess migration (7.7K QPS, peaks 12K)
→ $240K/year cloud cost savings through systematic optimization
→ 99.95% uptime across 11 Kubernetes clusters, 144+ nodes
→ Stabilized $3.6M ARR product and reduced churn from 4% to 1.5%
→ Promoted 2 engineers; built an async-first culture across multiple time zones and continents

I manage through clarity, curiosity, and care. I make technical work visible, translate across functions, and connect the dots early to keep people aligned. I'm pragmatic, action-oriented, and serious about sustainable pace as a core enabler of long-term performance.

Core beliefs:
- Continuous learning is the baseline, not optional
- Psychological safety drives better results
- Impact matters, but always with people in mind

Open to: Engineering Manager roles (SRE, Platform, Product Engineering), remote or Brussels-based.

Specialties: Kubernetes, Vitess, GCP, Terraform, distributed systems, incident management, cost optimization, engineering leadership

Experience

Engineering Manager at Buffer
Apr 2026 - Present · 3 mos
Harvest (Remote)
- Engineering Manager, Site Reliability Engineering & Forecast Teams
  Jan 2023 - Sep 2025 · 2 yrs 9 mos
  ** Harvest was acquired in July 2025 **
  Led two globally distributed engineering teams (8 engineers total) across Site Reliability Engineering (SRE) and Product (Forecast), supporting business-critical infrastructure and product development for a platform serving 65,000+ seats and generating over $3.6M in Annual Recurring Revenue (ARR).
  Key Contributions:
  * Led a 3-year, zero-downtime migration to a horizontally scalable Vitess database architecture across critical applications.
  * Reduced infrastructure costs by $240K annually through GKE labelling, Kubecost, GCS lifecycle rules, and committed use discounts (CUDs).
  * Led and mentored a global team of 8 engineers (Brazil, EU, US), fostering a high-trust, high-autonomy environment that achieved top team engagement scores.
  * Coached and facilitated the promotion of 2 engineers (to Senior SRE and SWE II), supporting their career growth and public contributions.
  * Maintained 99.95%+ uptime across all customer-facing applications, consistently exceeding the 99.90% Service Level Objective (SLO).
  * Achieved a 78% proactive detection rate for production incidents and a median time to resolution (MTTR) of under 35 minutes.
  * Coordinated critical incident responses, including a multi-service outage recovery (Feb 2025) and a large DDoS attack response (Jan 2024).
  * Managed and optimized an infrastructure comprising 11 Google Kubernetes Engine (GKE) clusters, 144+ nodes, 1,746+ CPUs, and 8+ TiB RAM.
  * Stabilized the Forecast product after a multi-year MRR decline, maintaining revenue at ~$310K/month and reducing churn from ~4% to ~1.5%.
- Site Reliability Engineer
  May 2019 - Jan 2023 · 3 yrs 9 mos
  Key Contributions:
  * Core contributor to the complete infrastructure migration from bare metal to Google Kubernetes Engine (GKE) across 5 production applications with zero downtime.
  * Drove cloud cost optimization by replacing inefficient CPU-based HPAs with KEDA-based autoscaling on custom Prometheus metrics, cutting compute costs by 61% for Harvestapp and 6% for Forecast.
  * Designed and deployed high-availability Redis and Memcached architectures in GKE, replacing single points of failure (like Heroku Redis) with a 3-pod GKE deployment and Sentinel configuration, achieving 99.9% uptime for Redis.
  * Co-built a comprehensive Service Level Objective (SLO) monitoring framework using Prometheus, Stackdriver, Grafana, and Pingdom, and architected a full observability stack with Elasticsearch for logs and distributed tracing. This foundation enabled 99.95%+ uptime across all services and supported p99 query times of 47.5 ms at 10K MySQL QPS.
  * Managed critical security operations, including zero-disruption secrets rotation across 50+ services and controlled production access for support teams.
  * Streamlined development workflows by building robust CI/CD pipelines on Buildkite with Docker layer caching and automated testing, and by implementing Dependabot for automated dependency management across Ruby, Rails, and Node.js ecosystems.
Co-organizer at DevOps Porto Community
Oct 2016 - Feb 2024 · 7 yrs 5 mos
Co-organizing monthly meetups for people interested in the DevOps movement.
Co-Founder at Taste Porto
May 2013 - Sep 2019 · 6 yrs 5 mos
Co-founder of a food tour company in Porto, Portugal, where I was born and raised. http://www.tasteporto.com
Linux, Cloud and Automation Engineer with a DevOps focus at Curious Ellie
Jan 2016 - May 2019 · 3 yrs 5 mos
Designed infrastructure, led cloud migrations, and automated CI/CD for startups across travel, news, fashion, and supply chain sectors.
Key clients: Nezasa (travel tech), WikiTribune (Jimmy Wales' news platform), Chic by Choice (fashion marketplace), Foursource (B2B supply chain), LTPLabs (big data consulting), ViGIE Solutions (medical sensors)
Stack: Terraform, Ansible, Packer, AWS, GCP, Kubernetes, PostgreSQL, GitLab