San Francisco, California, United States
I have over 20 years of professional experience as a software developer. I studied database algorithms for my PhD and then spent several years working with biological data in two postdoctoral positions. Afterwards, I moved into biotech, where I learned the latest in cloud computing, microservice architectures, API development, and collaborative software development practices such as Agile. I most recently worked at Google, applying my database and cloud skills.
Worked as part of a team on a PostgreSQL extension to extract database metrics. Created an algorithm to efficiently extract only the metrics for long-running queries when the system is under load. Led the design and implementation of the system that transfers metrics from read replicas. The solution used containerized agents to manage the data and send it to the primary for writing. The work included a partition manager for the replica data to efficiently purge expired data.
Built and maintained AWS microservices to support biosample accessioning, report generation, and postlab processing. Older services were container-based (ECS), while newer ones used an API Gateway/Lambda pattern. Queues (SQS) and messaging (SNS) were used for inter-service communication. DynamoDB, relational databases (Postgres or MySQL), and S3 were used for data storage.
Supported geneticists by creating web-based tools to improve their workflow and by collecting and managing relevant biological datasets for efficient access. The main app used Python/Django with some React components. Back-end support used PostgreSQL, containers (ECS), and Elasticsearch running in AWS. A Luigi pipeline was used for input processing.
Developer on the GA4GH reference server project and team lead for the Toil pipeline management system (containerized workflows on high-performance computing clusters and in the cloud).