Samuel Chinta

Data Engineer | Databricks • Snowflake • AWS • Microsoft Fabric • PySpark • Delta Lake • Unity Catalog • AI/GenAI • Data Governance

United States

About

I build governed, cloud-native data platforms that transform terabytes of complex enterprise data into trusted products for analytics, AI/ML, and GenAI without sacrificing governance, security, or compliance. In 10+ years across retail, manufacturing, healthcare, and SaaS, I've led data platform and engineering initiatives at Starbucks, Milwaukee Tool, Siemens Healthineers, Atlassian, Honeywell, and PepsiCo.

What I've shipped recently:
• Rolled out an enterprise governance framework at Starbucks using Databricks, Unity Catalog, Privacera, and Collibra, reducing audit cycle times by nearly 50% through automated lineage and centralized metadata management.
• Led the Dynamics AX → Oracle Fusion ERP migration at Milwaukee Tool using Data Vault 2.0, delivering a zero-reconciliation-defect cutover.
• Built Medallion Lakehouse architectures (Bronze/Silver/Gold) on Delta Lake across multiple enterprises, standardizing multi-terabyte pipelines with ACID guarantees, schema evolution, and scalable ingestion patterns.
• Delivered RAG-based semantic search solutions using Azure OpenAI and Cognitive Search, enabling analysts to discover governed datasets through natural language.
• Optimized Spark and Snowflake workloads through partitioning, Z-Ordering, adaptive query execution, and caching strategies, reducing pipeline runtimes by ~30% and lowering compute costs.
• Built and modernized cloud-native data platforms across Azure and AWS using serverless and distributed architectures.

Core Technologies: Databricks • Unity Catalog • Snowflake • Microsoft Fabric (OneLake, Dataflows Gen2) • AWS (Glue, S3, Lambda, Step Functions, EMR, Redshift, Kinesis) • Azure Data Factory • Azure Synapse • Delta Lake • PySpark • Spark SQL • Airflow • Kafka • Event Hubs • dbt • Data Vault 2.0 • Medallion Architecture • Privacera • Collibra • Alation • Azure OpenAI • RAG • Copilot Studio • Terraform • GitHub Actions • Azure DevOps

Open to Senior Data Engineer, Staff Data Engineer, and Data Platform Engineer opportunities (Remote, Hybrid, or Onsite).

Email: [email protected]

Experience

  • Data Platform Engineer at Starbucks
    Aug 2025 - Present · 11 mos

    Leading enterprise data platform and governance rollout across Databricks, Unity Catalog, Privacera, and Collibra - serving marketing, store operations, and digital analytics teams across Starbucks' cloud data ecosystem.

    Key impact:
    - Rolled out an enterprise data governance operating model (metadata standards, lineage, stewardship workflows, automated DQ enforcement) - cut compliance and audit cycle times ~50%.
    - Architected Microsoft Fabric Lakehouse (OneLake, Dataflows Gen2) unifying Databricks, Snowflake, and Azure sources for Power BI consumers across marketing and store ops.
    - Shipped RAG-powered semantic data discovery on Azure OpenAI + Cognitive Search, letting analysts find governed datasets through natural language instead of tribal knowledge.
    - Built high-volume ETL/ELT pipelines in Databricks (PySpark, Spark SQL, Delta Lake) processing millions of records daily with embedded validation, reconciliation, and monitoring.
    - Implemented fine-grained security (RBAC, row-level filtering, column-level masking) using Unity Catalog + Privacera for sensitive customer and operational datasets.
    - Orchestrated end-to-end workflows with Airflow DAGs across Databricks, Snowflake, and Azure services: dependency management, dynamic task generation, retries, and SLA alerts.
    - Operationalized ML models (batch + real-time) using Databricks ML and Spark Structured Streaming, with Unity Catalog providing model governance and lineage.
    - Tuned Databricks workloads via partitioning, Z-ordering, and cluster optimization - reduced compute spend while holding SLA adherence.
    - Integrated LLM-driven workflows (Azure OpenAI, Copilot Studio) to automate metadata enrichment, documentation, and intelligent data discovery.
    - Defined SLAs, runbooks, and incident processes that reduced MTTR on critical pipelines; mentored engineers on Databricks and Unity Catalog governance patterns.

  • Senior Data Engineer at Milwaukee Tool
    Jul 2024 - Jul 2025 · 1 yr 1 mo

    Designed enterprise-scale data architectures to modernize ERP analytics and reporting. Built metadata-driven pipelines with orchestration frameworks, YAML, and SQL metadata tables, enabling faster onboarding of 50+ datasets with reusable components. Applied Medallion Architecture & Data Vault 2.0 on Delta Lake, ensuring governed, auditable, and high-quality datasets. Migrated ERP data from multiple transactional systems into modern ERP platforms and data warehouses with automated validation and reconciliation. Automated CI/CD deployments with version-controlled infrastructure and pipelines, reducing release time and improving reliability. Delivered dashboards & machine learning pipelines for real-time ERP insights, anomaly detection, and operational forecasting.

  • Senior Data Engineer at Siemens Healthineers
    Apr 2023 - Jun 2024 · 1 yr 3 mos

    Designed scalable enterprise data architectures enabling ERP modernization and cross-functional analytics. Built metadata-driven ETL pipelines with orchestration frameworks and parameterized configurations, accelerating onboarding of 50+ datasets. Applied Medallion Architecture & Data Vault 2.0 on Delta Lake for standardized, auditable datasets. Developed ML pipelines with PySpark and PyTorch for anomaly detection, forecasting, and real-time inference. Automated CI/CD pipelines ensuring secure, version-controlled deployments across multiple data environments. Delivered real-time dashboards providing leadership with supply chain and ERP insights.

  • Data Engineer at Atlassian
    Jan 2022 - Mar 2023 · 1 yr 3 mos

    Led migration of legacy ETL workloads from on-prem databases into modern cloud-native data platforms while integrating with hybrid analytics environments. Built secure, large-scale pipelines using orchestration frameworks, metadata-driven jobs, and centralized data lakes. Developed PySpark and Spark SQL jobs for ingestion, cleansing, and transformation across distributed data platforms. Enabled fraud detection and anomaly monitoring with ML pipelines using PyTorch and Spark Structured Streaming, delivering real-time inference. Automated CI/CD pipelines for consistent multi-environment deployments. Delivered dashboards and BI solutions backed by curated datasets, providing KPI-driven insights.

  • Data Engineer at Honeywell
    Apr 2016 - Mar 2021 · 5 yrs

    Built scalable ETL pipelines integrating data from multiple relational and enterprise systems into centralized data warehouses. Migrated legacy workloads into modern warehouse solutions with automated validation and zero-downtime cutovers. Developed PySpark and Scala jobs for large-scale transformation of multi-terabyte datasets in Delta Lake. Enabled real-time streaming analytics by integrating event-driven ingestion pipelines with Kafka and REST APIs. Delivered BI dashboards for financial and operational KPIs, supporting enterprise-wide decision-making. Automated CI/CD deployments for secure, version-controlled releases.