Ali Alper Ak

Principal DevOps Engineer | Kubestronaut | CKS | CKA | CKAD | AWS CSAA

Berlin Metropolitan Area

About

I am a Principal DevOps / Site Reliability Engineer with hands-on AWS and Google Cloud experience. I am certified in AWS and Kubernetes (CKA, CKS). I always work with a security-first mindset. I have worked in PCI DSS and SOC 2 Type II certified environments, including during the certification process. I am especially interested in observability and monitoring, and I love to automate everything.

Experience

  • Entrust (Berlin, Germany)
    • Principal Cloud DevOps Engineer
      Sep 2025 - Present · 10 mos

      As Principal Cloud DevOps Engineer and Platform Tech Lead at Entrust, my primary responsibilities are:
      - Architect and champion the platform vision, shaping scalable, secure, and reliable cloud infrastructure across enterprise environments.
      - Lead the development and execution of DevOps strategies, using automation, CI/CD pipelines, and infrastructure-as-code to accelerate delivery and improve operations.
      - Collaborate with cross-functional teams to deliver robust cloud-native solutions aligned with strategic business goals, fostering innovation and best practices in architecture, security, and reliability.

    • Staff Cloud DevOps Engineer
      Aug 2024 - Sep 2025 · 1 yr 2 mos

      Following Entrust’s acquisition of Onfido, I transitioned into the role of Staff Cloud DevOps Engineer at Entrust, where I also served as Platform Tech Lead.
      - Led platform architecture and DevOps strategy to ensure scalability, security, and reliability across cloud environments.
      - Drove automation, CI/CD pipelines, and infrastructure-as-code practices to accelerate delivery and improve operational efficiency.
      - Collaborated with cross-functional teams to design and implement cloud-native solutions aligned with business objectives.

  • Onfido (Berlin, Germany)
    • Staff DevOps Engineer
      Mar 2024 - Aug 2024 · 6 mos

    • Senior DevOps Engineer
      Apr 2023 - Mar 2024 · 1 yr

      - Participant in the regular on-call rotation
      - Designing and implementing various parts of the platform repave initiative
      - Upgrading very old Postgres DBs to the latest version and splitting shared DBs, which together enabled yearly savings of up to 500k and increased the reliability of services
      - Retiring many legacy infrastructure components and tools and moving to current open-source technologies
      - Implementing GitOps with Flux and migrating all infra tools to GitOps in all EKS clusters
      - Received the Onfido Value award for "find a better way" in recognition of my work on infrastructure

  • Senior Site Reliability Engineer at Trendyol Group
    Dec 2021 - Apr 2023 · 1 yr 5 mos

    - Member of the Core SRE Team
    - Participant in the regular on-call rotation
    - Migration of Dolap (Trendyol's C2C secondhand marketplace) from AWS ECS to EKS, orchestrated end to end with Terraform (IaC), with GitOps implemented via Flux v2
    - AWS cost optimization
    - Cost and Capacity Management: Creating a product for calculating on-premise data center costs and managing capacity (running across 7 DCs with ~7K physical hosts and ~30K VMs) for tribes, teams and workloads
    - Service Level Management (SLI/SLO/SLA): Enabling teams to define, visualize and alert on their Service Levels/Error Budgets using Latency, Traffic, Error and Saturation metrics
    - Implementation, maintenance and automation of the ArgoCD and Flux v2 GitOps tools running across 350+ Kubernetes clusters

  • Senior Development Operations Engineer at NewStore, Inc.
    May 2019 - Dec 2021 · 2 yrs 8 mos

    - Participant in the regular on-call rotation
    - Managing GitOps tools and repositories (Flux v1 and v2)
    - AWS services used: EC2, ECS, EKS, RDS, Aurora, Lambda, DynamoDB, Kinesis, SNS, SQS, Systems Manager, Secrets Manager
    - AWS ECS to EKS migration for 150+ services
    - IaC with Terraform
    - Developing automation for tenant provisioning: a Step Function running Lambdas and containers to deploy, create and remove AWS infrastructure for tenants
    - Developing a Kubernetes-native tenant health-checking tool
    - Working with 100+ AWS accounts and using SSO, cross-account permissions and VPC peering
    - Supporting the migration of service communication from RPC over RabbitMQ to gRPC
    - Implementing Prometheus, Grafana and Alertmanager as a service for tenants
    - Implementing a service mesh with Istio on Kubernetes
    - Implementing Thanos (for multi-tenant long-term metrics storage)
    - Working on fully automated end-to-end multi-region disaster recovery
    - Working on SOC 2 compliance (e.g. secrets rotation, migration from static tokens/keys to IAM roles, repository scanning for secrets, base image scanning, dependency scanning)

  • DevOps Engineer at Nurd Innovation Center Turkey
    Jul 2018 - May 2019 · 11 mos