Anurag Srivastava

Senior Data Engineer | Databricks Certified Professional | PySpark • Snowflake • ADF • ETL | Founder – DataX Bootcamp | Mentored 3200+ Learners | 260K+ YouTube • 240K+ IG

Bengaluru, Karnataka, India

About

🚀 Certified Databricks Professional Data Engineer | 5+ Yrs Experience | Databricks • PySpark • Snowflake • GCP • Azure • AWS | 262K+ on YouTube | 200K+ on Instagram

I’m Anurag Srivastava, a self-taught Data Engineer with 5+ years of experience delivering end-to-end data engineering solutions, from full-scale data migrations (975+ PB) to real-time pipeline implementations across multi-cloud environments.

✅ Databricks Certified Professional Data Engineer
✅ Hands-on with Databricks, PySpark, Snowflake, SQL, Python
✅ Worked across GCP, Azure, and AWS
✅ Delivered solutions for clients across diverse domains
✅ Taken 60+ Data Engineering interviews and guided technical teams
✅ Trained 3200+ professionals to start or switch into the Data Domain

🎓 I’m also the Founder of DataX Bootcamp, a 90-day intensive program designed to make learners job-ready Data Engineers, covering Databricks, PySpark, SQL, and real-world projects.
👉 Join here: https://login.datawithanurag.com/services/dataxbootcamp

📺 YouTube: https://youtube.com/c/beaprogrammer (262K+ Subscribers)
📸 Instagram: https://instagram.com/data_with_anurag (200K+ Followers)

I share practical insights on SQL, PySpark, Cloud (Azure/GCP), Interview Prep, and Career Growth, all built from real industry experience.

💡 Let’s connect if you:
• Are preparing for Data Engineering or Analytics roles
• Need help with projects, resumes, or interviews
• Want to collaborate, learn, or invite me as a guest speaker

📬 Email: [email protected]

Experience

Koantek (Remote)
- Senior Data Engineer
  Jul 2025 - Present · 1 yr
  • Built a multi-agent chatbot in Azure Databricks using LangChain + LangGraph
  • Designed a Supervisor–Worker agent flow with Genie for business data
  • Chatbot delivers summaries, trends, and insights from structured datasets
  • Deployed as an MLflow Model Serving endpoint for secure, scalable use
- Data Engineer
  Feb 2024 - Jul 2025 · 1 yr 6 mos
  Successfully migrated 15+ complex data pipelines across Snowflake, Azure Databricks, and AWS Databricks ecosystems, including ML model portability via MLflow.
  1- Spearheaded Snowflake-to-Databricks and Azure-to-AWS Databricks migrations, including end-to-end data pipeline development, validation, and automation.
  2- Developed Python- and SQL-based workflows for seamless data migration across multiple clients, ensuring accuracy and high performance.
  3- Automated data pipeline scripting and validation processes using PySpark, SQL, and Python, optimizing workflows for large-scale data transitions.
  4- Built regex-based automation scripts to transpile Snowflake SQL queries into Databricks-compatible SQL, reducing manual effort and errors.
  5- Worked extensively on Databricks Medallion Architecture, Delta Lake best practices, and performance tuning of Spark jobs.
  6- Engineered data validation frameworks between Azure and AWS Databricks platforms, ensuring 100% accuracy during critical migration projects.
  7- Executed end-to-end MLflow model and experiment migrations between Azure Databricks and AWS Databricks environments.
  8- Led weekly production deployments for data pipeline automation using GCP tools such as BigQuery, Dataproc, and Dataflow (Charles Schwab project).
  9- Collaborated closely with data scientists and business stakeholders to ensure migration success while maintaining governance and compliance standards.
  10- Delivered scalable, maintainable ETL frameworks for clients in the Retail, Gaming, and AI Assistant domains, driving data readiness for analytics and ML.
Content Creator at YouTube
May 2020 - Present · 6 yrs 2 mos
Mphasis (Full-time · 3 yrs 1 mo)
- Data Engineer
  Apr 2023 - Feb 2024 · 11 mos
  1- Worked for about 3 years on the CDW Project, where I was part of two different teams (the DDL Team and the Data Migration Team).
  2- Part of the team that designed the Data Migration Utility, an ETL tool for migrating data from Teradata to GCP. It provided an end-to-end solution: extracting data from Teradata using TDCH/Sqoop, transforming it with PySpark, and loading it into BigQuery.
     Technologies Used: Python, Teradata, GCP, Unix, PySpark
  3- Worked with cross-functional teams to ensure smooth data loading through ETLs and helped ETL teams resolve data discrepancies.
  4- Worked on the conversion of 40,000 objects (tables, views, and procedures) and worked with cross-functional teams to make the DDL compatible with GCP.
  5- Deployed objects to BigQuery environments using Liquibase through Bamboo builds.
- Associate Software Engineer
  Apr 2022 - Apr 2023 · 1 yr 1 mo
- Trainee Associate Software Engineer
  Feb 2021 - Apr 2022 · 1 yr 3 mos
  • Part of the Core Data Engineering Team.
  • Worked on Teradata, Google Cloud Platform (GCP), and BigQuery.