Jan Rous

Tinkerer, thinker, wrangler of machines

Boulder, Colorado, United States

About

* Planet-scale, highly reliable systems: experience running these systems and maintaining their reliability.
* Large-scale distributed system design: designing robust, highly reliable distributed systems. Able to identify potential weak points, SPOFs, and hot spots, and to perform cost-benefit and tradeoff analysis.
* Debugging and problem solving: root-cause analysis, damage control, incident response and management.
* Production monitoring: experience designing and maintaining monitoring systems for large-scale distributed systems. Able to design and implement robust monitoring with a low rate of noisy alerts.
* Infrastructure engineering: building and maintaining cloud-based and distributed software systems across the full lifecycle (feature engineering, testing and validation, builds and deployment).

Experience

  • Senior Site Reliability Engineer at MongoDB
    Jul 2025 - Present · 1 yr

  • Senior Software Engineer at RelationalAI
    Mar 2021 - Feb 2023 · 2 yrs

    Member of the infrastructure team. Initiated and completed the migration of the entire production system to k8s, improved monitoring, handled numerous outages, contributed to the post-mortem culture, and implemented numerous performance and diagnostic features in a complex distributed system. Always focused on reducing engineering friction.

  • Thru-hiking the Pacific Crest Trail
    Apr 2019 - Sep 2019 · 6 mos

    I walked ~2,500 miles of the PCT from the Mexican border to the Canadian border in 140 days. I marveled at the beauty of wild places, slept in the woods, and enjoyed life to the fullest.

  • Google (9 yrs 4 mos)
    • Senior Software Engineer
      Jun 2017 - Apr 2019 · 1 yr 11 mos

      Search Infrastructure. Responsible for building and maintaining a scalable, user-facing content recommendation system. Other responsibilities included incident response, monitoring, elimination and retirement of complex legacy systems, managing cross-team refactoring efforts, designing and implementing features for improved debugging and robustness, design reviews, and promotion of knowledge sharing.

    • Site Reliability Engineer
      Sep 2014 - Jun 2017 · 2 yrs 10 mos

      Member of the Abuse SRE team in San Francisco. Responsible for running, maintaining, and scaling the systems that detect malicious content and activity on Google properties. Responsibilities included performance and scalability testing, weak point identification, maintaining monitoring, and operations automation.

    • Site Reliability Engineer
      Jan 2013 - Sep 2014 · 1 yr 9 mos

      Member of the Search Ads Serving SRE team, keeping critical infrastructure up and running. Responsible for the scalability and reliability of large-scale, low-latency distributed systems. Capacity planning and performance testing, developing tools and automation for various tasks.

  • Senior programmer at Koukaam
    2005 - Jan 2010 · 5 yrs 1 mo

    Design and implementation of the "IPCorder" surveillance system running on embedded ARM and PowerPC devices.