Deploy and maintain secure cloud infrastructure primarily on GCP (Google Cloud Platform) and AWS, ensuring seamless integration between services.
Manage and optimize GKE (Google Kubernetes Engine) clusters for high-availability AI applications and microservices.
Infrastructure as Code: Build and enforce Terraform strategies to provision and manage infrastructure, ensuring environments are reproducible and version-controlled.
CI/CD & Software Verification: Design and implement advanced CI/CD workflows using GitHub Actions, moving beyond simple deployments to create intelligent automation.
Build robust verification pipelines that include automated testing, linting, security scanning, and quality gates before production release.
Streamline the release process for backend and frontend applications so that releases are reliable, "one-click" operations.
Oversee the deployment, maintenance, and backup strategies for databases.
Implement comprehensive monitoring and logging solutions (Prometheus, Grafana, Cloud Ops, Cloud Monitoring) to ensure system health, performance, and rapid incident response.
Implement security best practices (IAM, VPC configuration, encryption) to protect sensitive AI data and intellectual property.
Qualifications
Education & Experience
B.Sc. or M.Sc. in Computer Science, Computer Engineering, or a related technical field.
5+ years of relevant experience in DevOps, Cloud Engineering, or Site Reliability Engineering (SRE).
Proven experience acting as a Senior or Lead engineer, guiding architectural decisions.
Technical Requirements
Advanced hands-on experience with GCP (specifically GKE, VPC, IAM) and a working knowledge of AWS.
Proven ability to design and implement robust, scalable, and secure cloud architectures.
Mastery of Docker and Kubernetes administration.
Strong proficiency in Terraform.
Expert knowledge of GitHub Actions for build, test, and deploy pipelines.
Experience managing PostgreSQL and ClickHouse databases.
Deep understanding of Linux system administration.
Experience with AI/ML lifecycle tools (e.g., Kubeflow, MLflow, Vertex AI).
Strong proficiency in Python and Bash scripting.
Familiarity with DevSecOps tools and practices.
Bonus: Official GCP certifications (e.g., Professional Cloud Architect, Professional Cloud DevOps Engineer) and Kubernetes certifications (e.g., CKA, CKAD, CKS) are highly preferred.
Bonus: Knowledge of JavaScript/Node.js is a strong plus.
Soft Skills & Mindset
Ability to design long-term solutions rather than quick fixes.
Excellent ability to explain complex cloud concepts to Data Scientists and business stakeholders.
Comfortable working in a fast-paced environment with evolving requirements.