Ireland
An experienced Unix-adjacent Site Reliability Engineer with a long history of designing, maintaining and debugging distributed systems of a planetary size. Having an interest in all things related to monitoring of production systems, enabling organisations to make informed data-driven decisions.
I was taking care of the world’s largest (>10EB, ~5Bqps) key-value store, used by multiple Google services. - Ensured strict enforcement of data deletion deadlines for GDPR compliance, leading the project, defining technical constraints based on the legal requirements, implementing monitoring. - Re-wrote the monitoring stack used for operational and analytical work of both the team and Bigtable customers. - Improved accuracy and observability of the backup facilities of Bigtable used by internal and external Bigtable customers.
I was involved in long-term activities spanning the entire organization. - Educated all new SRE hires in EU on monitoring philosophy and systems, providing 1/1 consultations for the same, worked on Site Reliability Engineering book. - Recruited SRE team members, performing interviews (120+), evaluating candidates and calibrating other interviewers. - Mentored colleagues in both technical and soft skills.
I've maintained and developed Googlebot, a system used by all of Google systems to fetch web pages (100s of Gb/s fetch throughput) for web indexing or other purposes. - Migrated underlying storage system from GFS to Colossus - this was done in tandem with the identical migration performed for the logs environment (see below), with the dataset size of 100s of PB. - Ensured system stability and availability by rewriting a legacy launch-framework to a modern declarative environment. - Increased reliability of an N+0.5 service despite physical constraints, migrated it away from custom hardware to standard Google production hardware.
I’ve started the technical support team, responsible for a 24/7 test environments used by a group of 200+ telecom software developers. The scope of responsibilities changed rapidly over the years, as I became a go-to person for any and all Unix expertise in the regional office. Significant achievements: - Developede streamlined test hardware configurations, allowing for rapid iterations between different versions of tested software. - Bootstrapped a third level technical support team handling external customer queries. - Feveloped curriculum and performed training for the external customers’ technical staff. - Hired and trained 7 people for my team as the scope and area of responsibilities grew.
I've set up and administered a Debian-based community server for other students.