Warsaw Metropolitan Area
• Over 10 years of professional IT experience • Expertise in software and data engineering with Scala, Python, and cloud technologies (AWS, GCP) • Extensive DevOps skills: monitoring, IaC, Docker, k8s, cloud best practices, etc. • Strong problem-solving, troubleshooting, organisation and communication skills • Experience in consulting and product environments, working with distributed teams
• Enhancing data quality of products • Redesigning pipelines and data models • Communicating with other teams
• Maintained & upgraded k8s clusters • Set up bot traffic monitoring and alerting • Optimised ML deployments in k8s • Enhanced application monitoring • Set up AWS services monitoring • Maintained & standardised application Helm charts • Set up release CI/CD pipelines
• Migrated Spark jobs from EMR to Spark on k8s • Imported AWS infrastructure into Terraform • Optimised Apache Spark data pipeline jobs for cost efficiency • Introduced and integrated Airbyte into the data platform • Stabilised Airflow running on k8s • Implemented cost optimisation strategies for AWS infrastructure • Collaborated with data science and backend teams to build features on top of the data lake
• Building a low-code ETL / data science solution • Supporting customers and building solutions for their business needs • Leading the team on core datrics.ai backend development and infrastructure setup
• Enhancing Airflow Terraform deployment scripts on AWS with monitoring, alerting, and log collection • Supporting stakeholders in deploying pipelines to Airflow • Creating Jenkins CI/CD jobs for deployments to Airflow • Creating Airflow setup scripts for local environments • Refactoring and productionising a Kafka Spark Streaming consumer application on EMR (10x cost reduction), and setting up monitoring, alerting, and log collection • PoC of a Hive Metastore setup with Ranger on EMR over a data lake on S3
• Building, enhancing, and supporting MicroStrategy & Alation infrastructure on AWS • Building ETL pipelines for processing data in the data lake with Scala, Presto, Apache Spark and AWS Data Pipeline • Optimising performance of and troubleshooting Apache Spark data pipelines • Building generic tools for ETL processes on Apache Spark • PoC data science projects (e.g. creating a model to predict the survival of an apartment for rent) • Setting up CI/CD with Jenkins • Setting up monitoring dashboards and alarms with DataDog • Disaster recovery analysis and preparation
• Implementing a fuzzy search algorithm for raw and structured data from scratch using Scala and Apache Spark • Creating Ansible and Terraform scripts for various deployments • Apache Spark job development, refactoring, optimisation and performance tuning • Building ETL frameworks using the Scala / Apache Spark stack • Building data pipelines using AWS Data Pipeline and Apache Oozie • Migrating between databases (e.g. from Couchbase to PostgreSQL) • Migrating from on-premise solutions to the cloud and vice versa (e.g. from Rackspace to AWS) • Building a command-line Python application for calling an API • Administering Apache Spark and Apache Cassandra clusters • PoC of IoT projects in the cloud (e.g. creating an architecture for retrieving data from devices in a streaming fashion, pre-aggregating it, and displaying it on dashboards)