San Francisco Bay Area
SRE/HPC Engineer at NASA and Computer Science graduate at Boston University
-Spearhead system architecture, provisioning, and software deployment pipelines for the NASA Hyperwall Visualization System, ensuring high availability for mission-critical HPC environments and public demos.
-Engineered robust automation and security tooling using Python, Go, and Bash, fully integrating Infrastructure as Code (IaC) and CI/CD methodologies to streamline deployments and enforce system reliability.
-Architected and deployed a scalable, facility-wide observability stack using RHEL, Prometheus, and Grafana, significantly improving system telemetry, proactive alerting, and incident response capabilities.
-Partnered cross-functionally with scientific and engineering stakeholders, translating complex user requirements into optimized, scalable infrastructure solutions.
-Works under the NASA Advanced Supercomputing team on the Athena, Aitken, and Electra petascale supercomputers, as well as the NVIDIA Grace Hopper-based Cabeus GPU Supercluster and archival storage systems, implementing infrastructure as code.
-Took initiative to refactor Debian-based Linux automation scripts in Python and Bash for macOS compatibility, enabling a seamless team-wide platform migration and ensuring uninterrupted workflow.
-Collaborates with engineers to diagnose bug symptoms and compile data on automated software and hardware testing clusters, ensuring rapid project delivery and enhancing team cohesion.
-Enhances and optimizes high-concurrency rack-mounted servers and large-scale testing infrastructure for Android, ChromeOS, Android Auto, and wearables.
-Sets up and debugs terabit-scale networking, including TCP/IP and DNS configuration.
-Manages automation for 50,000+ devices with tools like SSH and Ansible.
-Created a simulation of a Lidar and live video feed processing chip and integrated it into QEMU using C and Assembly in a Linux environment, enhancing simulation accuracy and performance.
-Assisted in migrating a Point Cloud visualization application codebase from Electron to Chromium, resulting in compatibility improvements.
-Conducted regression testing with Jenkins and GitHub, identifying and resolving build issues.