Barcelona, Catalonia, Spain
Data Engineer working on data analytics & machine learning projects across a wide range of industries, using a variety of technologies for data collection, storage, processing & analysis. Specifically interested in the multidisciplinary impact of innovative technologies and their contribution to the evolution of society.
Collection, integration & transformation of user event data into Didomi's data lake & data warehouse, using technologies & frameworks such as:
- AWS Lambda, Kinesis Data Streams & Firehose for ingestion
- PySpark data pipelines
- Airflow for job orchestration
- dbt transformations & models
- Snowflake DWH
- Other AWS services such as ECS (containerized workloads)
- IaC with Terraform
- CI/CD with GitLab CI
Data Engineer working on data analytics & machine learning projects for a variety of industries and clients such as bol.com, Enexis, Randstad and many more. In a typical project we are free to select technologies & tooling within the boundaries defined by the client, for example a specific cloud environment. Main areas of responsibility & expertise across several projects:
- Setting up CI/CD pipelines and git-based development workflows for the team
- Data ingestion for batch & streaming data using tools & services such as Apache NiFi, Apache Airflow, Apache Kafka & similar cloud services (AWS Kinesis, Google Pub/Sub, Azure Data Factory)
- Data storage using relational & non-relational DBs, data lakes & data warehouses (MySQL/Postgres/RDS/Cloud SQL, HDFS/S3, Hive, Redshift/BigQuery, Redis)
- Data pipeline development & orchestration using Apache Airflow or simpler cloud services (Azure Functions, AWS Lambda)
- Data analytics application development using libraries & frameworks such as Python Pandas & Apache Spark
- API development using frameworks such as Python Flask & FastAPI or Java Spring
- Application deployment with Docker & Kubernetes or other container-based cloud services (Google GKE, AWS ECS, AWS/Azure Batch)
- Infrastructure deployment with IaC frameworks such as Terraform & AWS CloudFormation
- Machine learning models for predictive maintenance, forecasting & classification using frameworks such as Python Scikit-learn
- MLOps platform for model experimentation, versioning & results tracking using AWS SageMaker
Working within the ProRail DataLab team:
- Data ingestion from a variety of sources using Apache NiFi
- Data storage in HDFS & Hive
- Data processing & analytics using Hive & Apache Spark infrastructure & processing frameworks
- Predictive maintenance ML models using Scikit-learn
Member of the team responsible for implementing Data Governance components within the organization:
• Communicating with cross-functional stakeholder groups
• Identifying & cooperating with Data Owners
• Designing AS-IS data flows & architecture
• Initiating & participating in actions for improvement
Member of a Proof of Concept team with the goal of identifying use cases and becoming familiar with technologies of the Big Data/Hadoop toolset (HDFS, Hive, Ranger etc.)
Design of a Data Governance framework and implementation plan for organizations of the financial services industry