Jan Rous

Tinkerer, thinker, wrangler of machines

Boulder, Colorado, United States

About

* Planet-scale, highly reliable systems: experience running these systems and maintaining their reliability.
* Large-scale distributed system design: designing robust, highly reliable distributed systems. Able to identify potential weak points, SPOFs, and hot spots, and to perform cost-benefit and tradeoff analysis.
* Debugging and problem solving: root-cause analysis, damage control, incident response and management.
* Production monitoring: experience designing and maintaining monitoring systems for large-scale distributed systems. Able to design and implement robust monitoring with a low rate of noisy alerts.
* Infrastructure engineering: building and maintaining cloud-based and distributed software systems across the full lifecycle (feature engineering, testing and validation, builds and deployment).

Experience

Senior Site Reliability Engineer at MongoDB
Jul 2025 - Present · 1 yr
Senior Software Engineer at RelationalAI
Mar 2021 - Feb 2023 · 2 yrs
Member of the infrastructure team. Initiated and completed the migration of the entire production system to k8s, improved monitoring, handled numerous outages, contributed to the post-mortem culture, and implemented numerous performance and diagnostic features in a complex distributed system. Always focused on reducing engineering friction.
Thru-Hiking the Pacific Crest Trail
Apr 2019 - Sep 2019 · 6 mos
I walked ~2,500 miles of the PCT, from the Mexican border to the Canadian border, in 140 days. I marveled at the beauty of the wild places, slept in the woods, and enjoyed life to the fullest.
Google (9 yrs 4 mos)
- Senior Software Engineer
  Jun 2017 - Apr 2019 · 1 yr 11 mos
  Search Infrastructure. Responsible for building and maintaining a scalable, user-facing content recommendation system. Other responsibilities included incident response, monitoring, elimination and retirement of complex legacy systems, managing cross-team refactoring efforts, designing and implementing features for improved debugging and robustness, design reviews, and promotion of knowledge sharing.
- Site Reliability Engineer
  Sep 2014 - Jun 2017 · 2 yrs 10 mos
  Member of the Abuse SRE team in San Francisco. Responsible for running, maintaining, and scaling the systems that detect malicious content and activity on Google properties. Responsibilities included performance and scalability testing, weak point identification, maintaining monitoring, and operations automation.
- Site Reliability Engineer
  Jan 2013 - Sep 2014 · 1 yr 9 mos
  Member of the Search Ads Serving SRE team, keeping critical infrastructure up and running. Responsible for scalability and reliability of large-scale, low-latency distributed systems. Capacity planning and performance testing; developing tools and automation for various tasks.
Senior programmer at Koukaam
2005 - Jan 2010 · 5 yrs 1 mo
Design and implementation of the "IPCorder" surveillance system running on embedded ARM and PowerPC devices.