Boston, Massachusetts, United States
Scaling traffic infrastructure @ Temporal Cloud
On the Hail (hail.is) team, I contributed to an open source analytical database system for petabyte-scale genomic data. Specifically, I worked on the compute infrastructure to provide scientists with a serverless, multi-tenant platform for analysis that is fast, cheap and easy to use. Some of the fun projects I got to work on include:
• Digging into container runtime internals to achieve <80ms container startup times for interactive pipeline prototyping
• Developing infrastructure as code to allow anyone to deploy the Hail system on both GCP and Azure
• Adding a monitoring stack and establishing practices for performance work on the system
• Integrating the Hail system into data platforms like Terra for easier deployment and broader adoption
• Mentoring co-op students and computational biologists in contributing to our project at all levels of the stack, from monitoring and performance to adding GPUs
• Contributed Online Certificate Status Protocol (OCSP) Stapling to Envoy, an open source proxy and load balancer, to improve security for Slack client connections and the broader Envoy community
• Developed an Envoy filter to detect and standardize error responses across the Slack architecture
• Developed efficient algorithms for processing genetic variation data in tskit, an open source toolkit that leverages tree sequences to infer species trees at record speeds
• Optimized low-level routines in the genetic simulator msprime
• Conducted performance experiments against state-of-the-art genetic simulators for use in the msprime 1.0 paper
• Developed distributed linear algebra infrastructure in Python/Scala/C++ for the Hail compiler to enable scalable machine learning in genomic research and other big data applications
• Implemented efficient compilation strategies in Scala/C++ to improve distributed query engine performance
• Interfaced with computational biologists for user support and planning of new features