Istanbul, Türkiye
As a Data Engineer, I specialize in building scalable data pipelines and optimizing workflows using big data technologies like Apache Hadoop, Apache Spark, and AWS services (Glue, Redshift, S3, ECS). I’m proficient in Python, Scala, Rust, and C++, with extensive experience in big data processing, real-time analytics, OLAP project pipelines, and concurrent programming. My background enables me to design efficient, high-performance systems that process large-scale data in parallel, ensuring scalability and reliability across diverse applications.

My expertise extends to machine learning and deep learning, where I develop, optimize, and deploy models using PyTorch and TensorFlow. I manage the full ML lifecycle with MLOps tools like MLflow, deploying models on AWS SageMaker and ECS, and have substantial experience with large language models (LLMs) and live translation solutions. Additionally, I leverage Linux systems, Bash scripting, and functional programming (Haskell) to build efficient, distributed systems. My full-stack development skills with Flask and Django help integrate ML models into cloud-based applications, driving robust, data-driven solutions at scale.
- Building data pipelines on AWS with AWS Glue, Amazon Athena, Amazon EC2, and Apache Spark, using Scala for custom data transformations and processing logic in Glue and Spark.
- Migrating data from on-premises systems to AWS with the AWS Database Migration Service (DMS).
- Developing real-time streaming applications with Amazon Kinesis and Apache Flink, with the Flink data processing logic implemented in Scala.
- Building ETL jobs with AWS Glue and integrating applications with S3 buckets and DynamoDB, using Scala for custom data transformations in the Glue ETL jobs.
- Managing identity and access with AWS Identity and Access Management (IAM) and building serverless applications with AWS Lambda.
- Coordinating workflows with AWS Step Functions and managing infrastructure with Terraform.
- Using Amazon Simple Notification Service (SNS) to send notifications and Amazon Simple Queue Service (SQS) to decouple and scale microservices, distributed systems, and serverless applications.
- Using Amazon CloudWatch to monitor and log AWS resources and applications.
- Integrating applications with Amazon Relational Database Service (RDS) for managed relational database hosting.
- Managed and maintained a Cloudera Management System for deploying and managing Apache Hadoop-based big data clusters, including configuration, troubleshooting, and cluster monitoring.
- Administered Apache Hadoop-powered big data clusters: set up and configured the clusters, tuned the Hadoop environment, and monitored and troubleshot the clusters.
- Used Bash scripting to automate tasks and write system-level programs.
- Used Kubeflow, an open-source platform for running and managing machine learning (ML) workflows, together with Bash scripting to automate and manage ML operations.
- Built ETL pipelines with Apache Sqoop and Apache Oozie to extract data from various sources, transform it, and load it into a destination system.
- Worked on data science projects at all stages using Apache Spark, from data preparation and feature engineering to model training, evaluation, and deployment.
- Used machine learning techniques to identify potential cross-sell opportunities (sales of additional products or services to existing customers).
- Used anomaly detection models to identify potentially fraudulent activity in health insurance policies.
- Managed and maintained Linux-based systems, including installing and configuring software and managing users, groups, and permissions.
- Used the SQL-based engines Hive and Impala to query, analyze, and manipulate data stored in a Hadoop cluster or other distributed storage system.
- Provided training and educational materials on big data and Python programming to colleagues within the company.
- Apache Hadoop Administration
- Hive and Impala QL
- Cloudera Management System Administration
- ETL Operations
- Machine Learning for Health Operations
- Bash Scripting
- Apache Spark
- MLlib
- Worked on multi-label classification projects related to architectural data for my graduation project and thesis.
- Developed new solutions for multi-label classification problems.
- Implemented and experimented with different deep learning models, such as Dynamic Graph CNN, Graph Convolutional Networks, and traditional architectures, using PyTorch.
- Applied multi-label classification techniques to classify architecture designs.
- Developed a Flask backend to host and serve the developed models.
- Implemented computer vision tasks, including the development of deep learning models.
- Reviewed scientific articles to inform the design and implementation of the deep learning models.
- Proposed AI solutions for retail problems, with a focus on age and gender detection in stores.
- Trained and deployed the developed models on AWS and Google Cloud.
- Used OpenCV and PyTorch to develop the computer vision tasks.
As an assistant, I worked mostly on network and system solutions and provided technical support to people in the workplace. I also completed network and system courses during this role: Cisco CCNA 1 and 2, and Windows Server 2016.