Cristobal Rodriguez

Data Engineer | Data Architect

Spain

About

Computer science engineer with knowledge of Machine Learning, AI, Data Mining, and Big Data. Highly skilled in programming, problem-solving, and leadership.

Experience

  • Bluetab, an IBM Company (Full-time · 3 yrs 9 mos)
    • Data Architect | Data Engineer
      Jul 2025 - Present · 1 yr

      Client: Banco Santander
      Data Architecture and Engineering:
        • Designed, defined, and built the project’s initial data architecture, establishing technical standards and structural foundations for data pipelines developed with PySpark.
        • Created and maintained modular and reproducible pip environments to ensure consistent deployments across development, integration, and production environments.
        • Developed advanced SQL queries on the bank’s Data Lake to enable data ingestion, transformation, and modeling processes.
        • Implemented optimized Delta tables to enhance efficiency, governance, and scalability of data processing workflows.
      CI/CD and Automation:
        • Designed and implemented the automated deployment pipeline using Jenkins, enabling continuous integration and continuous delivery into Databricks.
        • Established best practices for versioning, validation, dependency management, and automated testing to ensure reliability and performance of data workflows.
      Machine Learning Enablement:
        • Supported the development of a machine learning solution using clustering algorithms and decision trees to help the bank’s commercial team identify and target potential future customers.
        • Ensured that data pipelines, feature engineering processes, and model-ready datasets were scalable, reliable, and optimized for downstream ML workloads.
      Leadership and Collaboration:
        • Acted as one of the most experienced Data Engineers on the team, guiding architectural decisions and mentoring junior engineers.
        • Collaborated closely with data scientists, business stakeholders, and platform teams to align data architecture with strategic business objectives.
        • Drove best practices around data quality, performance, and maintainability across the entire engineering team.

    • Data Engineer
      Apr 2025 - Jul 2025 · 4 mos

      Client: Moeve
      Data Pipeline Modernization:
        • Led the migration of critical data pipelines from SAP legacy systems to SAP HANA, improving data latency, structure, and analytical readiness.
        • Designed and implemented automated ETL workflows using AWS Lambda, Step Functions, and Glue to deliver scalable, cloud-native ingestion.
      Cloud Data Engineering:
        • Built and maintained cloud data workflows with CloudFormation, ECR, EC2, Batch, and DynamoDB, ensuring consistency and reliability across environments.
        • Optimized storage and analytics using Apache Iceberg and Parquet, enabling high-performance querying and schema evolution.
      Analytics Enablement:
        • Delivered refined datasets and automated flows to support Power BI reporting, enabling faster and more accurate insights for business stakeholders.
      Operational Excellence:
        • Improved data quality, performance, and maintainability through standardized transformations, validation processes, and infrastructure-as-code practices.

    • Data Engineer
      Jan 2024 - Apr 2025 · 1 yr 4 mos

      Client: Naturgy
      ETL Pipeline Design and Automation:
        • Designed and automated ETL workflows using AWS Lambda, Step Functions, and Glue to process large-scale customer and billing datasets.
        • Integrated S3 as the central data lake, streamlining ingestion and improving data availability for analytics teams.
      Data Processing and Optimization:
        • Optimized data storage and retrieval using Hudi, Parquet, and Delta formats to support scalable and efficient processing.
        • Enhanced SQL transformations and dataset structures to ensure high data quality and performance for large-volume energy data.
      Analytics and Reporting Enablement:
        • Improved business intelligence capabilities by optimizing data models and query performance for Amazon QuickSight, delivering faster and more accurate insights for decision-makers.
      Infrastructure and Automation:
        • Developed and maintained Terraform templates to enable reproducible, automated, and scalable AWS infrastructure deployments.

  • Pentaquark Consulting (3 yrs 9 mos)
    • Data Architect | Backend Lead
      Jul 2021 - Oct 2022 · 1 yr 4 mos

      CLIENT - Cryptocurrency trading platform
      ACHIEVEMENTS
        - Led a team of 6 engineers in the development of a cryptocurrency trading platform, overseeing the creation of a robust backend architecture and the overall project structure.
        - Designed and implemented the AWS architecture for the project, utilizing EC2, Lambda, Step Functions, S3, and Amplify for scalable and serverless infrastructure.
        - Managed the backend development using Spring Boot and Java, ensuring efficient integration with the frontend and secure transaction processing.
        - Implemented data processing pipelines using PySpark to handle large-scale cryptocurrency data and improve transaction speed and reliability.
        - Introduced CI/CD practices with Jenkins and SonarQube, ensuring code quality and consistent deployment across the system.
        - Architected a secure wallet system to manage individual cryptocurrency portfolios, ensuring data security and user privacy.
        - Worked closely with cross-functional teams to define technical requirements and ensure the system met business and user needs.

    • Machine Learning Engineer | AI Developer
      Jan 2021 - Jul 2021 · 7 mos

      CLIENT - Private Hospital
      ACHIEVEMENTS
        - Developed a Convolutional Neural Network (CNN) in Python to predict whether a patient had COVID-19 based on chest X-ray images, identifying signs such as pneumonia.
        - Conducted extensive research to create a dataset of healthy lung images for training the algorithm, enabling accurate differentiation between healthy lungs and potential COVID cases.
        - Leveraged AWS services, including S3 for image storage and EC2 for model training, implementing AWS Lambda functions for efficient data processing.
        - Used scikit-learn, Seaborn, Pandas, NumPy, and Keras to build, train, and evaluate the machine learning model, ensuring high accuracy in predictions.
        - Implemented batch processing to handle large volumes of medical images, optimizing the workflow for scalability and performance.

    • Backend Developer | Scrum Master
      Sep 2019 - Mar 2021 · 1 yr 7 mos

      CLIENT - Major international financial bank
      ACHIEVEMENTS
        - Developed and implemented a real-time messaging system for sending and receiving Avro-formatted messages, enabling communication between different projects.
        - Built and maintained the backend for a data visualization team, ensuring efficient data flow using Kafka, Flink, and Hazelcast.
        - Used Hadoop for processing large-scale data and ensuring reliable data distribution across systems. Integrated Spring Boot for building scalable backend services and APIs to support various internal data projects.
        - Worked with Jira, Git, Jenkins, and SonarQube to ensure high-quality, maintainable code and efficient CI/CD workflows.
        - Acted as Scrum Master, overseeing task estimation and management, while tracking team progress and ensuring the project met its milestones.
        - Communicated with clients to gather and translate requirements, ensuring the team delivered relevant metrics and features aligned with business goals.

  • Data Engineer | Digital Marketing Data Analyst at Accenture
    Sep 2018 - Feb 2019 · 6 mos

    MARKETING DATA ENGINEERING
      - Sole technical member in a Digital Marketing team supporting a major hotel chain in Brussels.
      - Developed Python and Java scripts to automate and optimize reporting workflows.
      - Extracted and processed marketing performance data from Adobe Analytics.
      - Designed and migrated reports from Excel to Google Data Studio and Power BI for better visualization and insights.
    ACHIEVEMENTS
      - Automated data extraction and reporting processes, reducing manual effort and improving efficiency.
      - Researched and implemented dashboard solutions in Google Data Studio and Power BI, enhancing data accessibility.
      - Improved data processing workflows, ensuring more accurate and timely reports for the marketing team.
      - Facilitated data-driven decision-making by integrating real-time analytics into reporting pipelines.

  • Junior Data Engineer | ETL Developer | Database Engineer at Audi Konfuzius-Institut Ingolstadt Microlab
    Jan 2018 - Jul 2018 · 7 mos

    RESEARCH & DATA ENGINEERING
      - Developed data pipelines for human motion analysis using AI techniques (research on inverse kinematics learning and recurrent neural networks).
      - Designed and maintained a PostgreSQL database to store and process motion data.
      - Collected and preprocessed movement data from a Kinect camera, storing structured motion data in JSON format.
    ACHIEVEMENTS
      - Built an automated ETL pipeline in Python to clean, transform, and store movement data.
      - Optimized PostgreSQL queries to enhance data retrieval and storage efficiency.
      - Ensured data integrity and consistency by implementing validation scripts and automated checks.
      - Developed custom indexing and partitioning strategies to improve query performance.

  • Data Governance Analyst Intern at Universidad de Jaén
    Mar 2017 - Jul 2017 · 5 mos

    • Corrected inconsistencies in official final-thesis documents. (Microsoft Office)
    • Analyzed data for quality and validity issues. (MySQL)
    • Logged data issues and user requests accurately. (Java)
    • Provided user support for data inconsistencies and related problems. (Verbal and written communication skills)