Max Michaels

Site Reliability Engineer

New York City Metropolitan Area

About

Site Reliability Engineer and curious problem solver skilled at identifying areas for improvement in technology and processes, automating tasks, quickly learning new technologies, and delivering custom tools that simplify the day-to-day work of engineering teams. Effective communicator with strong writing and documentation capabilities.

Experience

  • Site Reliability Engineer at Netflix
    Apr 2024 - Present · 2 yrs 3 mos

  • Cloud Infrastructure Engineer at Grammarly
    Jun 2023 - Apr 2024 · 11 mos

  • Twitter (11 yrs 4 mos)
    • Staff SRE
      May 2012 - Jun 2023 · 11 yrs 2 mos

      - Built real-time data pipelines using Google Dataflow to enable teams to identify user traffic, errors, and site success rate in one-second intervals.
      - Wrote front-end tools in Python, React, and JavaScript to enable users to create dynamic charts and dashboards for specific KPIs.
      - Authored blog post "When Seconds Really Do Matter" for Twitter’s blog.
      - Developed a UI using Python, React, and PagerDuty API that enabled teams to communicate with service owners and broader teams when service issues required fetching on-calls.
      - Built several Slack chatbots to announce important service health changes (e.g. service degradations, traffic redistribution).
      - Wrote a chatbot to tie Slack channels to the PagerDuty API to enable users to simply mention an alias in Slack to mention the current on-call team members, resulting in a simple and repeatable process for summoning on-calls.
      - Created a web UI and CI-based framework for stress testing that enabled teams to target their services for synthetic traffic, avoid code performance regressions, and ensure capacity for high-traffic events.
      - Contributed to definition of KPIs, metrics, and best practices across teams for monitoring, running, and configuring services to ensure consistency and durability. Provided feedback for new service and feature launches based on established best practices.
      - Contributed to developing the incident postmortem processes and drove postmortems to identify root causes and mitigate issues.
      - Interviewed for 2016 Vice article "Inside Twitter’s Command Center, Where the Threat Model Is Boy Band Tweets".

    • Linux Engineer, Twitter Command Center
      Mar 2012 - May 2022 · 10 yrs 3 mos

  • Director of System Operations at adMarketplace
    Nov 2010 - Feb 2012 · 1 yr 4 mos

    • Hired/trained/managed a team of five to allow for quicker growth.
    • Took a somewhat chaotic server environment of 200+ servers and streamlined the release, monitoring, and installation processes via automation tools (Fabric, Puppet) while building out the company's first self-managed data center installation. Managed all facets of activation, from product/vendor selection and procurement to installation of hardware and network.
    • Implemented new data solutions for new products based on MongoDB and Vertica. These implementations allowed real-time insight into data that was previously unavailable.

  • Systems Administrator (Consultant) at iVillage
    2010 - 2010 · Less than a year

    • Rebuilt a complex, aging environment with modern software principles.
    • Redesigned entire data serving architecture with a MySQL-based master/slave architecture.