Greater Chicago Area
11 YoE senior sysadmin/infra engineer/DevOps -- spent 8 years smashing atoms at Fermilab, now working on distributed FinTech systems at DRW
> L3 Project Manager for Kubernetes Compute Infrastructure @ ACORN (Accelerator Controls Research Network)
> Kubernetes Architect and lead cluster administrator for AD/Accelerator Controls (SRE oriented)
> Service lead for Harbor Container Registry, serving the entire Accelerator Division at Fermilab
> Service lead for Keycloak IAM for Accelerator Controls
> Technical lead for Applications at Fermilab's Elastic Analysis Facility, a JupyterHub and Kubernetes-based scientific analysis platform for High Energy, Accelerator and Astro physics
Computing Services Infrastructure operations and development:
> SRE-Sysadmin for the CMS Tier 1, Neutrino/Frontier, HPC and opportunistic compute farms at Fermilab, powered by the HTCondor distributed batch system
> HEPCloud project operations, research and development. Theta supercomputer integration.
> Operate the US/FNAL-hosted Submission Infrastructure services on the CMS Global Pool, a GlideIn-based distributed computing pool of over 250k cores.
> Service developer/admin for various Kubernetes grid services at Fermilab, including on-demand HTCondor worker nodes, specialized Spark clusters and JupyterHub.
> Infrastructure-as-code Kubernetes developer/admin for the LHC Big Data Project, providing industry-level columnar analysis platforms for HEP such as Apache Spark, COFFEA, Dask, Parsl and others.
> JupyterHub@Kubernetes service administrator and support for Fermilab users
> AILab (Artificial Intelligence Laboratory) computing support: deployment of hybrid infrastructure (GPUs, FPGAs, CPUs) on bare-metal machines, standalone Docker containers and cloud engines (AWS + GCP)
> Monitoring several services via Elasticsearch, Prometheus, Graphite, InfluxDB and other backends, along with Grafana and Kibana dashboards
> Working on R&D initiatives for next-generation computing for HEP
> Maintain and operate the globally distributed computing infrastructure of the CERN CMS experiment, comprising more than 70 sites running >200k cores in more than 20 countries.
> Monitor site performance from different angles: data transfers, production jobs, independent functionality tests, load and performance tests.
> Commission and decommission globally distributed computing sites joining CMS, and activate them for production.
> Assist site system administrators in debugging and solving infrastructure/software issues effectively in order to reduce out-of-production time.
> Prepare plots, summaries and reports for weekly meetings to keep other teams and liaisons informed about the behavior and performance of the sites.
> Understand, debug and analyze containerized physics jobs in Singularity and Docker
> Collaborate on full-stack monitoring projects with Graphite, ELK (Elasticsearch + Logstash + Kibana) and TICK (Telegraf + InfluxDB + Chronograf + Kapacitor)
> Train and prepare site sysadmins to detect, report and trace security breaches/incidents in the USCMS VO.
GRID technologies: Scientific Linux (6-7), HTCondor for high throughput computing (HTC) and GlideInWMS, Hadoop and VC3 for HPC, data transfer protocols (GridFTP, SRM, XrootD), data transfer infrastructure (PhEDEx & FTS) and FroNtier/squid servers
> Manager and maintainer of the virtual environment and infrastructure using VMware Virtual Center Server v6.1.7 with a mixture of ESXi 5.7-ESXi 6.0 hypervisors.
> Supported 5 enclosures of 18 servers each, for a total of 90 physical servers hosting 1350 virtual machines (Windows 7, 8, 10, Server 2008, 2012, 2016, Ubuntu 12.04 - 16.04, CentOS 6-7, Debian 6-Squeeze, 7-Wheezy, 8-Jessie, Fedora 23/26).
> Private cloud project leader and architect. System was fully implemented.
> Management, allocation and load balancing of host/datastore and VM resources.
> Proposed and implemented infrastructure tuning and enhancements according to changing needs
> Incident and user request response and management under the ITIL model + ticketing system.
> Managed and distributed 12 VLANs (8 private, 3 public) via IPAM as information system.
> DHCP server administrator for dynamic IP address assignment and DNS administrator (Windows Server 2012 DNS, then migrated to BlueCat DNS).
> Administrator and maintainer of remote access services (Microsoft Remote Gateway for Windows RDP and Squid proxy for Linux SSH tunneling).
> Virtual machine OS installation, configuration, administration, patching, and monitoring.
> Automated administration tasks by developing custom VMware Orchestrator workflows combining JavaScript, Bash/Shell, PowerShell and Batch scripts with VMware PowerCLI scripts for easily cloning, creating and deleting VMs.
> Designed and implemented virtual infrastructure backup policies and automated periodic snapshots and machine clones to avoid loss of information due to hardware failures.
> Laboratory teaching assistant for Object Oriented Programming and Algorithms I and II (Java introductory courses) for 4 years of my bachelor's degree
> Laboratory teaching assistant and project jury member for Data Structures (Java advanced course) for 2 years
> Laboratory teaching assistant and project jury member for Mobile Application Software Development (Android, iOS advanced course) during my last semester
> Member of the Women in Computing group and GeekGirlsLATAM
> Teaching assistant and project jury for Enterprise Architecture
> Oracle WebLogic SOA Suite/OAS junior administrator for Telefonica and Avianca. Incident resolution, client requests and overall administration and tuning of middleware infrastructure under the ITIL model.
> Developed a Java servlet-based frontend for port and service monitoring on the SOA as well as IBM WebSphere and IIS servers
> Coordinated and managed workforce distribution within the WME group, coordinated middleware changes requested by the client, managed the Mantis bug tracker and covered stand-by duty.