Prateek Dubey

Data & AI Engineering, Full Stack Development, Data Management, Data & AI Governance, Data & AI Solution Architecture | Building and deploying Data & AI Platforms and Agentic solutions in Production

Singapore, Singapore

About

Follow me on prateekdubey.com and medium.com/@prateek.dubey

Data & AI Engineering Leader with 14 years of experience in architecting and delivering enterprise-scale Data & ML platforms and AI solutions. Proven track record of leading high-performing teams and driving digital transformation initiatives that resulted in significant cost savings and efficiency improvements. Expert in Data & AI Engineering, Cloud-native architectures, ML-Ops, and Data Management & Governance with deep expertise in Spark, Hadoop, Kafka, Airflow, Databricks, Snowflake, AWS, Azure, GCP, CDH, & Kubernetes.

Skills & Competencies
- Big Data processing frameworks: Hadoop, Spark, Flink, Kafka
- Cloud Data Platforms: Databricks, Snowflake, Azure Synapse, Redshift
- Data formats: Parquet, Delta Lake, Iceberg, Avro, JSON, YAML
- Data Management & Governance: Apache Ranger, Apache Atlas, Amundsen, DataHub, OpenMetadata, Great Expectations
- Cloud: Amazon Web Services, Azure, Google Cloud Platform
- On-Premise: Cloudera CDH, Ceph, Bind9, MetalLB, Rancher Kubernetes (RKE and K3s)
- ML-Ops: Kubeflow, KServe, MLflow, Seldon, Feast
- Programming & IDE: Python, Pandas, Polars, Unix, Shell scripting, SAS, dbt, SQLMesh, Jupyter, VSCode, PyCharm
- Observability: Grafana, Prometheus, Loki, Thanos, Promtail, Elasticsearch, Kibana, Logstash, SigNoz, OpenTelemetry
- Orchestration: Control-M, Apache Oozie, Apache Airflow, Dagster
- Relational, NoSQL, Graph Databases: DB2, Informix, MySQL, Postgres, Teradata, HBase, Trino, Druid, DuckDB, Presto
- AI tools: Llama, Qwen, Ollama, Mistral, GPT-4o, Claude Code, Chroma DB, Langfuse, LangChain, LangGraph, etc.
- CI/CD and DevOps: Git, Ansible, GitLab CI, GitHub Actions, Spinnaker, Argo CD, Tekton
- IaC and Containers: Terraform, CloudFormation, Azure ARM, Docker, Kubernetes, Helm

Experience

  • Director - AI & Data Engineering, Technology Consulting at EY
    Sep 2025 - Present · 10 mos

    - Principal AI & Data Engineer in Technology Consulting division
    - Working with clients in ASEAN across Gen AI, Agentic AI and Real-time data solutions
    - Architecting Data & AI solutions for our Public and Private sector clients
    - Building AI Agents to automate complex workflows, processes and Data & Analytics lifecycle
    - Building Agentic AI driven Data Engineering solutions
    - Co-leading our Agentic AI Accelerator Platform - Decision OS to bring Agents to our enterprise customers with end-to-end Agent Observability, Governance, Guardrails
    - Co-leading our Agentic Coding Harness and Agent Teams for Data Engineering to automate the entire end-to-end Data Engineering lifecycle

  • Head of Data Engineering & Architecture - Chubb Life at Chubb
    Jun 2025 - Sep 2025 · 4 mos

    - Principal Data Engineer for Chubb Life's Data Engineering Track
    - Led and mentored a strong team of 50+ Engineers
    - Worked closely with Global Data leadership to spearhead Chubb Life's Data Transformation program and initiatives

  • Data Engineering Manager - AI & Data at Temus
    Apr 2024 - Jun 2025 · 1 yr 3 mos

    Team Management & Leadership
    - Managed, mentored and coached a team of 20+ Data Engineers across multiple specializations. Oversaw the team's performance, appraisals, 1:1s and professional development
    - Guided data team members specializing in Data Engineering, MLOps, AIOps, Data & AI Architecture, and Data Governance

    Technical Architecture & Design
    - Served as Lead Data Engineer & Data Architect for enterprise-level data solutions across Public and Private sector clients
    - Designed and implemented scalable, reusable data engineering solutions
    - Established consistency in design approaches across multiple projects and client engagements
    - Drove adoption of modern data stack technologies and best practices

    Client Project Delivery
    - Worked as a Data & Solution Architect for a public-sector education client. Deployed an LLM chatbot solution using Text2SQL on the Azure GCC platform for an internal use-case serving 50+ users
    - Designed and scaled LLM-powered voice-to-voice conversational AI avatars for a private-sector insurance client for sales agent training. Served 100+ users across Southeast Asia on EKS infrastructure
    - Consulted for clients adopting modern data stack technologies across Azure and AWS cloud

    Business Development & Pre-Sales
    - Led pre-sales activities including proposal development and client presentations. Successfully closed deals worth 500k SGD
    - Engaged in business development initiatives for both public and private sector opportunities
    - Conducted technical discovery sessions and solution design workshops with prospective clients

    Innovation & Automation
    - Co-developed an internal solution to automate ETL processes using LLMs with the help of dbt
    - Co-developed an internal solution to power enhanced analytics using Text2SQL for structured data
    - Researched and prototyped solutions to automate data governance strategies using LLMs with the help of Great Expectations and OpenMetadata

  • Global Fashion Group (Singapore)
    • Data Engineering Manager - GSF Data
      Apr 2022 - Apr 2024 · 2 yrs 1 mo

      - Worked with Dafiti (LATAM) on data strategy, restructuring, prioritisation, roadmap planning, resolving customer pain points, and adopting data mesh architecture by treating data as a product for business deliverables.
      - Managed a team of 8 Engineers based out of Vietnam (FTE) and India (Contractual) covering Data Engineering, DevOps, Solution Architecture, Data Governance, Observability and Cloud for GSF Data.
      - Responsible for setting quarterly OKRs, weekly 1:1s, technical delivery of projects, setting up the roadmap for the Data Platform, etc.
      - Participated in deep technical and architectural design discussions and code-reviewed PRs.
      - Helped Zalora, Iconic, Dafiti migrate their Seller Dashboards to GDP for global outlook. Helped generate thousands of dollars of revenue for the company.
      - Led the efforts to shut down Snowflake and Redshift Cloud DWHs to cut cost and migrate to a Trino-backed Lakehouse.
      - Ingested GA3 (The Iconic, Dafiti) and Segments (Zalora) events into GDP for behavioural data recommendation.
      - Improved GDP observability and availability using Loki, Thanos, Grafana, and Prometheus with Slack and Email alerts.
      - Incorporated GitOps methodology and set up CI/CD processes using GitHub Actions, Argo CD, Ansible and Terraform.
      - Set up Data Governance for our Iceberg Lakehouse backed by Trino using Apache Ranger and DataHub.
      - Set up Data Quality checks by leveraging Great Expectations.
      - Set up an ML-Ops solution using MLflow (Experimentation), Feast (Feature Store), KServe (Model Serving), Kong (API Gateway).

    • Lead Data Engineer - GSF Data
      Nov 2021 - Apr 2022 · 6 mos

      - Principal Engineer/Architect for Global Data Platform using OSS CNCF, Linux & AI, and ASF tech stack on Kubernetes (AWS EKS). GDP is a Central Data Platform for GFG, Zalora, The Iconic and Dafiti to run their workloads including Model Serving.
      - Technical Leader for GDP (Infrastructure as a Service), DaaS (Data as a Service) projects. Worked closely with regional teams from Zalora (SEA), and The Iconic (ANZ) to deliver data and ML solutions for Pricing Engine, Recommendation, Personalisation.

  • Singtel (Singapore)
    • Data & Solution Engineering Manager - Emerging Technologies
      Nov 2020 - Nov 2021 · 1 yr 1 mo

      - Managed and led efforts in Data Engineering, DevOps, Solution Architecture, Data Governance, Observability and Cloud.
      - Built Telco Cyber Data Lake & Platform using Hybrid Cloud for Cyber Analytics and Threat Intelligence.
      - Lead Architect and Engineer for Telco CDL Kubernetes Platform running On-Premise using Rancher and on AWS using EKS.
      - Processed TBs of data using Spark on K8s stored in Kafka, Ceph and S3 using our Data and ML Platform on AWS and On-Premise.
      - Set up requisite logging and monitoring for our system and services using Elasticsearch, Fluentd, Grafana, and Prometheus.
      - Set up DevOps, CI/CD processes using GitLab CI, Argo CD, Ansible and Terraform.
      - Set up Data Governance for Kafka, Ceph, Hive, & S3 using Apache Atlas, Apache Ranger and Amundsen.
      - Set up an ML-Ops solution for Telco CDL by leveraging MLflow to track Experiments, log metrics & parameters.

    • Lead Data Engineer - Emerging Technologies
      Aug 2020 - Nov 2020 · 4 mos

      - Built Streaming & Batch data pipelines using Apache Kafka and Spark Streaming. Orchestrated data pipelines using Airflow.