Micah Franklin

Cloud Architect | AWS & GCP | Kubernetes (EKS/GKE) | SRE | Splunk | Terraform | Multi-Account Cloud | Observability

Greater Chicago Area

About

I design enterprise-grade cloud architectures engineered for resilience, scalability, and operational excellence. As a Cloud Architect specializing in Site Reliability Engineering (SRE) and observability platforms, I build multi-account AWS and GCP environments that are secure, automated, and production-ready.

My expertise includes:
• Designing multi-region, highly available architectures
• Implementing SRE frameworks (SLOs, SLIs, error budgets)
• Building Kubernetes platforms (EKS/GKE) with autoscaling and network policies
• Engineering observability stacks using Splunk, CloudWatch, and centralized logging pipelines
• Infrastructure as Code with Terraform (modular, cross-account provisioning)
• Implementing distributed system monitoring and incident response frameworks
• Designing secure network architectures (VPC, Transit Gateway, VPN, hybrid connectivity)

I focus on designing systems that are:
• Failure-tolerant
• Cost-optimized
• Observable by default
• Secure by architecture
• Automated end-to-end

I don't just deploy cloud infrastructure; I design cloud platforms that withstand real-world production traffic and failure scenarios.

Currently open to Cloud Architect or Platform Architect roles focused on reliability, observability, and cloud transformation initiatives.

Experience

  • Site Reliability Engineer at MOTM
    Sep 2022 - Present · 3 yrs 10 mos

    - Design and maintain scalable cloud infrastructure supporting high-traffic streaming services using AWS, Kubernetes, and containerized microservices architectures.
    - Develop automation tooling in Python and Go to streamline deployment, monitoring, and operational workflows, reducing manual intervention and improving system reliability.
    - Implement distributed observability frameworks using Prometheus, Grafana, and centralized logging pipelines, enabling real-time visibility into system performance and service health.
    - Build automated alerting and incident response systems integrated with monitoring platforms, reducing mean time to detection (MTTD) and improving incident response efficiency.
    - Serve in a 24/7 PagerDuty on-call rotation, rapidly diagnosing and resolving high-severity production incidents across distributed microservices systems.
    - Lead post-incident reviews and root cause analysis, identifying failure patterns and implementing long-term reliability improvements across critical systems.
    - Collaborate with engineering and product teams to integrate reliability, scalability, and security considerations into the full software development lifecycle.
    - Design and implement Infrastructure as Code (IaC) using Terraform, enabling consistent and repeatable cloud environment provisioning.
    - Analyze distributed system failure modes, including service dependency failures, network latency issues, and resource bottlenecks, to improve platform resilience.
    - Develop and maintain runbooks and operational documentation, enabling faster incident response and knowledge sharing across teams.
    - Implement progressive delivery strategies (blue-green and canary) in Kubernetes using Argo CD and Argo Rollouts, enabling automated traffic shifting, real-time health analysis, and zero-downtime deployments with rapid rollback capabilities.

  • DevOps Engineer at R+L Global Logistics
    Feb 2018 - Jun 2022 · 4 yrs 5 mos

    - Developed cloud-based, real-time shipment tracking platforms using Amazon API Gateway and Amazon DynamoDB, enabling logistics teams and customers to monitor delivery status across nationwide transportation routes.
    - Built event-driven architectures using Amazon SNS and Amazon SQS, allowing logistics applications to process shipment updates and delivery notifications reliably at scale.
    - Engineered data ingestion pipelines using Amazon Kinesis and AWS Lambda, capturing and processing GPS telemetry from fleet vehicles to support route optimization and operational analytics.
    - Optimized compute workloads by implementing container orchestration strategies within Kubernetes clusters, improving the performance of logistics applications responsible for inventory and shipment management.
    - Implemented secure hybrid connectivity between on-premises warehouse systems and cloud infrastructure using AWS VPN and private networking, enabling seamless data exchange across logistics operations.
    - Designed automated infrastructure provisioning using Terraform modules and Git-based workflows, allowing development teams to rapidly deploy consistent cloud environments for logistics services.
    - Integrated warehouse management systems (WMS) with cloud-hosted APIs, improving inventory visibility and enabling real-time synchronization between fulfillment centers and transportation platforms.
    - Developed cost optimization strategies using cloud usage monitoring and reserved capacity planning, reducing infrastructure costs for high-volume logistics workloads.
    - Implemented role-based access control (RBAC) and security policies across cloud environments to protect sensitive logistics and customer shipment data.
    - Partnered with data engineering teams to build analytics-ready data environments using cloud-based storage and processing frameworks, enabling leadership to analyze delivery performance and operational metrics.

  • Junior DevOps/AWS Cloud Engineer at Village of Park Forest
    Sep 2016 - Jan 2018 · 1 yr 5 mos

    - Supported and maintained multi-region AWS infrastructure (EC2, S3, RDS, Lambda), ensuring high availability and system reliability.
    - Assisted in deploying applications across environments using CI/CD pipelines, improving deployment consistency and reducing manual errors.
    - Managed cloud infrastructure using Terraform, enabling repeatable and scalable infrastructure provisioning.
    - Collaborated with development teams to support application releases across PHP, Java, and Python services.
    - Contributed to improving CI/CD workflows, optimizing build and deployment processes for faster release cycles.
    - Troubleshot infrastructure and application issues across distributed systems, improving system uptime and performance.
    - Automated routine operational tasks using Bash and Python scripts, improving efficiency and reducing manual intervention.