Danville, California, United States
Creating quality and efficiency through data-driven software architecture, engineering, and automation
Created a new API backend to aggregate and report on company-wide cloud infrastructure spending. Developed custom API endpoints to serve data to front-end services. Designed database tables to efficiently track usage, spend, and ownership of cloud applications over time. The application replaces a legacy system that contained outdated and inaccurate historical data. It sources current usage and spend data from primary sources such as AWS and Azure, and standardizes format and reporting using Airflow, FastAPI, SQL Server, and Azure. Standardized secure authentication and authorization, input validation, error handling, rate limits, logging and alerting, and data privacy and security. Mentored team members in API development best practices and created guidelines for secure and scalable data ingestion, processing, and web services. Worked with developers of downstream systems to transition smoothly to the new API. Supported data engineers with code and database automation for data ingestion and processing. Configured firewall and Nginx services to enable Airflow and FastAPI to run on a single host. Created daily system monitoring to alert on outages and system performance. (Python, FastAPI, Airflow, SQL Server)
Led a team of 3-5 engineers in developing and deploying the backend treasury system as an immutable record of transactions built on the AWS Quantum Ledger Database. Took technical leadership of a fledgling service and drove it to production by prioritizing core features and DevOps integration. Revamped the reporting architecture to asynchronously write results into S3 and leverage S3Query instead of continuously querying large sets of data from production databases. This abstraction layer enables a highly interactive UI that does not impact the transaction database when dynamically filtering and displaying results. Worked across teams to simplify architectural designs and focus on providing the highest quality service for our customers' primary needs, including API interface simplification, bug triage, test infrastructure, DevOps integration, and external developer API documentation.
Led a redesign of the Hum data processing backend that reduced the projected implementation time from 6 months to 2 months. Simplified the auto-labeling data framework, from ingestion through processing, to embed generated data directly into the JSON schema for downstream consumption. This design largely eliminates the need to normalize and denormalize all input into a SQL database and instead moves towards a NoSQL document store. Provided training and mentoring in NoSQL design patterns, improved bug triage and handling, and designed better pull request review and approval flows.
As the primary software engineer on the Cloud Security team, I took over an application that filed tickets for vulnerabilities found by Snyk. I extended it beyond filing bugs to managing the full ticket lifecycle for engineering teams by detecting and reconciling changes from both engineers and the Snyk scans. This reduced unnecessary manual interventions from tens of tickets per day to fewer than 10 per month and enabled engineers to trust that tickets opened for them are valid. Ported the application to Kubernetes/Docker to move it off a centralized shared server, which reduced risk and shortened code deployments from over a week to 1 day. Interfaced with engineering teams and enabled the security team to work with them during project planning and scoping to avoid vulnerabilities before they are introduced into new code. Mentored members of the Security team in good software engineering practices through pair programming and code reviews.
Enhanced API-driven automation frameworks that enable the Maps ML data scientists to run their workloads and data ingestion jobs. This application enabled users to run jobs through Airflow on multiple Spark installations without managing platform details, and to define their own custom runtime containers while seamlessly chaining job segments together across any of these platforms. Adapted job configuration to validate input parameters so that most errors are detected before the job runs. Increased test automation and coverage to enable quick development cycles while maintaining high quality in our services. Developed API enhancements to search for and download log files from Amazon S3. Added YAML config file input to API methods in addition to existing JSON payloads. Developed tools to efficiently manage file retention to stay under HDFS space and file-count quotas.