Micah Franklin

Cloud Architect | AWS & GCP | Kubernetes (EKS/GKE) | SRE | Splunk | Terraform | Multi-Account Cloud | Observability

Greater Chicago Area

About

I design enterprise-grade cloud architectures engineered for resilience, scalability, and operational excellence. As a Cloud Architect specializing in Site Reliability Engineering (SRE) and observability platforms, I build multi-account AWS and GCP environments that are secure, automated, and production-ready.

My expertise includes:
• Designing multi-region, highly available architectures
• Implementing SRE frameworks (SLOs, SLIs, error budgets)
• Building Kubernetes platforms (EKS/GKE) with autoscaling and network policies
• Engineering observability stacks using Splunk, CloudWatch, and centralized logging pipelines
• Infrastructure as Code with Terraform (modular, cross-account provisioning)
• Implementing distributed system monitoring and incident response frameworks
• Designing secure network architectures (VPC, Transit Gateway, VPN, hybrid connectivity)

I focus on designing systems that are:
• Failure-tolerant
• Cost-optimized
• Observable by default
• Secure by architecture
• Automated end-to-end

I don’t just deploy cloud infrastructure; I design cloud platforms that withstand real-world production traffic and failure scenarios.

Currently open to Cloud Architect or Platform Architect roles focused on reliability, observability, and cloud transformation initiatives.

Experience

Site Reliability Engineer at MOTM
Sep 2022 - Present · 3 yrs 10 mos
- Design and maintain scalable cloud infrastructure supporting high-traffic streaming services using AWS, Kubernetes, and containerized microservices architectures.
- Developed automation tooling using Python and Go to streamline deployment, monitoring, and operational workflows, reducing manual intervention and improving system reliability.
- Implement distributed observability frameworks using Prometheus, Grafana, and centralized logging pipelines, enabling real-time visibility into system performance and service health.
- Built automated alerting and incident response systems integrated with monitoring platforms, reducing mean time to detection (MTTD) and improving incident response efficiency.
- Serve in a 24/7 on-call rotation managed through PagerDuty, rapidly diagnosing and resolving high-severity production incidents across distributed microservices systems.
- Lead post-incident reviews and root cause analysis, identifying failure patterns and implementing long-term reliability improvements across critical systems.
- Collaborate with engineering and product teams to integrate reliability, scalability, and security considerations into the full software development lifecycle.
- Design and implement Infrastructure as Code (IaC) using Terraform, enabling consistent and repeatable cloud environment provisioning.
- Analyze distributed system failure modes, including service dependency failures, network latency issues, and resource bottlenecks, improving platform resilience.
- Develop and maintain runbooks and operational documentation, enabling faster incident response and knowledge sharing across teams.
- Implement progressive delivery strategies (blue-green and canary) in Kubernetes using Argo CD and Argo Rollouts, enabling automated traffic shifting, real-time health analysis, and zero-downtime deployments with rapid rollback capabilities.
DevOps Engineer at R+L Global Logistics
Feb 2018 - Jun 2022 · 4 yrs 5 mos
- Developed cloud-based real-time shipment tracking platforms using AWS API Gateway and Amazon DynamoDB, enabling logistics teams and customers to monitor delivery status across nationwide transportation routes.
- Built event-driven architectures using Amazon SNS and Amazon SQS, allowing logistics applications to process shipment updates and delivery notifications reliably at scale.
- Engineered data ingestion pipelines using AWS Kinesis and Lambda, capturing and processing GPS telemetry from fleet vehicles to support route optimization and operational analytics.
- Optimized cloud compute workloads by implementing container orchestration strategies within Kubernetes clusters, improving performance of logistics applications responsible for inventory and shipment management.
- Implemented secure hybrid connectivity between on-premises warehouse systems and cloud infrastructure using AWS VPN and private networking, enabling seamless data exchange across logistics operations.
- Designed automated infrastructure provisioning workflows using Terraform modules and Git-based workflows, allowing development teams to rapidly deploy consistent cloud environments for logistics services.
- Integrated warehouse management systems (WMS) with cloud-hosted APIs, improving inventory visibility and enabling real-time synchronization between fulfillment centers and transportation platforms.
- Developed cost optimization strategies using cloud usage monitoring and reserved capacity planning, reducing infrastructure costs for high-volume logistics workloads.
- Implemented role-based access control (RBAC) and security policies across cloud environments to protect sensitive logistics and customer shipment data.
- Partnered with data engineering teams to build analytics-ready data environments using cloud-based storage and processing frameworks, enabling leadership teams to analyze delivery performance and operational metrics.
Junior DevOps/AWS Cloud Engineer at Village of Park Forest
Sep 2016 - Jan 2018 · 1 yr 5 mos
- Supported and maintained multi-region AWS infrastructure (EC2, S3, RDS, Lambda), ensuring high availability and system reliability.
- Assisted in deploying applications across environments using CI/CD pipelines, improving deployment consistency and reducing manual errors.
- Managed cloud infrastructure using Terraform, enabling repeatable and scalable infrastructure provisioning.
- Collaborated with development teams to support application releases across PHP, Java, and Python services.
- Contributed to improving CI/CD workflows, optimizing build and deployment processes for faster release cycles.
- Troubleshot infrastructure and application issues across distributed systems, improving system uptime and performance.
- Automated routine operational tasks using Bash and Python scripts, improving efficiency and reducing manual intervention.