Ertuğrul Garipardıç

Site Reliability Engineer (SRE) | DevOps | Kubernetes | Linux | Production Operations

Istanbul, Türkiye

About

Site Reliability Engineer (SRE) with hands-on experience in production operations, Kubernetes, Linux administration, observability platforms, and incident management. Experienced in monitoring production environments, troubleshooting complex infrastructure issues, and supporting high-availability systems. Skilled in automation using Python and Bash, with a strong focus on system reliability, operational excellence, and continuous improvement.

Technical Skills

Monitoring: Grafana, Zabbix, Prometheus, Graylog, ELK Stack
Databases: Oracle, PostgreSQL, MongoDB, Cassandra, MySQL
Operating Systems: Linux (Ubuntu/CentOS), Windows Server
Cloud Platforms: AWS (EC2, IAM, CloudWatch), Azure Fundamentals
Containers & Orchestration: Docker, Kubernetes, OpenShift
CI/CD & Automation: Python, Bash / Shell Scripting, Jenkins
Infrastructure as Code (IaC): Basic Terraform, Basic Ansible
Core Competencies: Incident Management, Root Cause Analysis (RCA), ITIL Processes

Experience

  • Site Reliability Engineer (SRE) at Bilyoner
    Sep 2024 - Present · 1 yr 10 mos

    Monitoring & Alert Management: Monitored 100+ production services using Grafana, Prometheus, Zabbix, Graylog, and ELK Stack. Built and maintained dashboards and optimized alert thresholds to improve detection efficiency and reduce alert noise.

    Ticket Management: Handled daily operational support via internal ticketing systems, resolving ~15–20 production tickets per day while meeting SLA requirements.

    Incident & ITIL Operations: Resolved L2/L3 production incidents in 24/7 environments within SLA/SLO targets. Participated in Incident, Problem, and Change Management processes and contributed to RCA efforts to improve system reliability and reduce recurring issues.

    Database Operations: Supported Oracle, PostgreSQL, MongoDB, and Cassandra environments, handling troubleshooting related to connectivity, performance, and availability.

    Collaboration & Production Support: Worked closely with Development, QA, DevOps, and Infrastructure teams to improve reliability, deployment stability, and operational efficiency.

    Automation & Scripting: Automated routine tasks and health checks using Bash, Python, and CronJobs, reducing manual effort and improving operational consistency. Supported CI/CD workflows with DevOps teams.

    Kubernetes & Cloud Operations: Supported Kubernetes and OpenShift workloads, including pod troubleshooting, log analysis, and service health monitoring. Assisted in AWS production environments to ensure system availability.

    Linux & Infrastructure Operations: Managed Linux servers and virtual machines via SSH, performing service management, log analysis, process monitoring, and troubleshooting to ensure stability.

    Domain & Release Operations: Performed post-release validations and health checks to detect and mitigate issues. Supported multiple business domains, including finance and customer-facing services, during deployments and incidents.

  • Technical Customer Success Specialist at Digybite
    Jun 2023 - Jul 2024 · 1 yr 2 mos

    Technical Support: Provided L1 technical and operational support to B2B customers using SaaS-based e-commerce platforms. Analyzed and resolved customer-reported incidents and platform-related issues to maintain operational continuity. Supported customer onboarding and operational readiness through platform configuration and deployment coordination.

    Collaboration & Agile Operations: Tracked issues and coordinated operations in Jira together with development and infrastructure teams. Took part in Agile Scrum processes, including sprint planning and release cycles.

    Deployment & CI/CD Operations: Supported application deployments and releases through Jenkins-based CI/CD pipelines. Performed deployment validations, release coordination, and operational checks during production rollouts.

    Mobile Application Operations: Performed pre-release configuration and operational setup for iOS and Android mobile applications ahead of production deployments.

    UI/UX & Figma Operations: Prepared mobile application interface designs and customer-specific UI configurations in Figma.