Ali Alper Ak

Principal DevOps Engineer | Kubestronaut | CKS | CKA | CKAD | AWS CSAA

Berlin Metropolitan Area

About

I am a Principal DevOps / Site Reliability Engineer with hands-on AWS and Google Cloud experience. I am certified in AWS and Kubernetes (CKA, CKS). I always work with a security-first mindset. I have worked in PCI DSS and SOC 2 Type II certified environments, including during the certification process. I am especially interested in observability and monitoring, and I love to automate everything.

Experience

Entrust (Berlin, Germany)
- Principal Cloud DevOps Engineer
  Sep 2025 - Present · 10 mos
  As Principal Cloud DevOps Engineer and Platform Tech Lead at Entrust, my primary responsibilities are:
  - Architect and champion the platform vision, shaping scalable, secure, and reliable cloud infrastructure across enterprise environments.
  - Lead the development and execution of DevOps strategies, harnessing automation, CI/CD pipelines, and infrastructure-as-code to drive accelerated delivery and operational excellence.
  - Collaborate with cross-functional teams to deliver robust cloud-native solutions aligned with strategic business goals, fostering innovation and best practices in architecture, security, and reliability.
- Staff Cloud DevOps Engineer
  Aug 2024 - Sep 2025 · 1 yr 2 mos
  Following Entrust’s acquisition of Onfido, I transitioned into the role of Staff Cloud DevOps Engineer at Entrust, where I also serve as Platform Tech Lead.
  - Lead platform architecture and DevOps strategy to ensure scalability, security, and reliability across cloud environments.
  - Drive automation, CI/CD pipelines, and infrastructure-as-code practices to accelerate delivery and improve operational efficiency.
  - Collaborate with cross-functional teams to design and implement cloud-native solutions aligned with business objectives.
Onfido (Berlin, Germany)
- Staff DevOps Engineer
  Mar 2024 - Aug 2024 · 6 mos
- Senior DevOps Engineer
  Apr 2023 - Mar 2024 · 1 yr
  - Participant in the regular on-call rotation
  - Designing and implementing various parts of the platform repave initiative
  - Upgrading very old Postgres DBs to the latest version and splitting shared DBs, which both enabled yearly savings of up to 500k and increased the reliability of services
  - Retiring many legacy infrastructure components and tools and moving to current open-source technologies
  - Implementing GitOps with Flux and migrating all infra tools to GitOps in all EKS clusters
  - Received the Onfido Value award for "find a better way" in recognition of my work on infrastructure
Senior Site Reliability Engineer at Trendyol Group
Dec 2021 - Apr 2023 · 1 yr 5 mos
- Member of the Core SRE Team
- Participant in the regular on-call rotation
- Migration of Dolap (Trendyol's C2C second-hand marketplace) from AWS ECS to EKS, orchestrated end to end with IaC/Terraform, with GitOps implemented via Flux v2
- AWS cost optimization
- Cost and Capacity Management: Creating a product for calculating on-premise data center costs and managing capacity (running across 7 DCs with ~7K physical hosts and ~30K VMs) for tribes, teams and workloads
- Service Level Management (SLI/SLO/SLA): Enabling teams to define, visualize and alert on their Service Levels/Error Budgets using Latency, Traffic, Error and Saturation metrics
- Implementation, maintenance and automation of the ArgoCD and Flux v2 GitOps tools running across 350+ k8s clusters
Senior Development Operations Engineer at NewStore, Inc.
May 2019 - Dec 2021 · 2 yrs 8 mos
- Participant in the regular on-call rotation
- Managing GitOps tools and repositories (Flux v1 and v2)
- AWS services used: EC2, ECS, EKS, RDS, Aurora, Lambda, DynamoDB, Kinesis, SNS, SQS, Systems Manager, Secrets Manager
- AWS ECS to EKS migration for 150+ services
- IaC with Terraform
- Developing automation for tenant provisioning: a Step Function running Lambdas and containers to deploy, create and remove AWS infrastructure for tenants
- Developing a Kubernetes-native tenant health-checking tool
- Working with 100+ AWS accounts and using SSO, cross-account permissions and VPC peering
- Supporting the migration of service communication from RPC over RabbitMQ to gRPC
- Implementing Prometheus, Grafana and Alertmanager as a service for tenants
- Implementing a service mesh with Istio on k8s
- Implementing Thanos (for multi-tenant long-term metrics storage)
- Working on efforts toward fully automated end-to-end multi-region disaster recovery
- Working on SOC 2 compliance (e.g. secrets rotation, migration from static tokens/keys to IAM roles, repository scanning for secrets, base image scanning, dependency scanning)
DevOps Engineer at Nurd Innovation Center Turkey
Jul 2018 - May 2019 · 11 mos