Bengaluru, Karnataka, India
🚀 Certified Databricks Professional Data Engineer | 5+ Yrs Experience | Databricks • PySpark • Snowflake • GCP • Azure • AWS | 262K+ on YouTube | 200K+ on Instagram

I’m Anurag Srivastava, a self-taught Data Engineer with 5+ years of experience delivering end-to-end data engineering solutions, from full-scale data migrations (975+ PB) to real-time pipeline implementations across multi-cloud environments.

✅ Databricks Certified Professional Data Engineer
✅ Hands-on with Databricks, PySpark, Snowflake, SQL, Python
✅ Worked across GCP, Azure, and AWS
✅ Delivered solutions for clients across diverse domains
✅ Taken 60+ Data Engineering interviews and guided technical teams
✅ Trained 3200+ professionals to start or switch into the Data Domain

🎓 I’m also the Founder of DataX Bootcamp, a 90-day intensive program designed to make learners job-ready Data Engineers, covering Databricks, PySpark, SQL, and real-world projects.
👉 Join here: https://login.datawithanurag.com/services/dataxbootcamp

📺 YouTube: https://youtube.com/c/beaprogrammer (262K+ Subscribers)
📸 Instagram: https://instagram.com/data_with_anurag (200K+ Followers)

I share practical insights on SQL, PySpark, Cloud (Azure/GCP), Interview Prep, and Career Growth, all built from real industry experience.

💡 Let’s connect if you:
• Are preparing for Data Engineering or Analytics roles
• Need help with projects, resumes, or interviews
• Want to collaborate, learn, or invite me as a guest speaker

📬 Email: [email protected]
• Built a multi-agent chatbot in Azure Databricks using LangChain + LangGraph
• Designed a Supervisor–Worker agent flow with Genie for business data
• Chatbot delivers summaries, trends, and insights from structured datasets
• Deployed as an MLflow Model Serving endpoint for secure, scalable use
Successfully migrated 15+ complex data pipelines across Snowflake, Azure Databricks, and AWS Databricks ecosystems, including ML model portability via MLflow.
1- Spearheaded Snowflake to Databricks and Azure to AWS Databricks migrations, including end-to-end data pipeline development, validation, and automation.
2- Developed Python- and SQL-based workflows for seamless data migration across multiple clients, ensuring accuracy and high performance.
3- Automated data pipeline scripting and validation processes using PySpark, SQL, and Python, optimizing workflows for large-scale data transitions.
4- Built regex-based automation scripts to transpile Snowflake SQL queries to Databricks-compatible SQL, reducing manual effort and errors.
5- Worked extensively on Databricks Medallion Architecture, Delta Lake best practices, and performance tuning of Spark jobs.
6- Engineered data validation frameworks between Azure and AWS Databricks platforms, ensuring 100% accuracy during critical migration projects.
7- Executed end-to-end MLflow model and experiment migrations between Azure DBX and AWS DBX environments.
8- Led weekly production deployments for data pipeline automation using GCP tools like BigQuery, Dataproc, and Dataflow (Charles Schwab Project).
9- Collaborated closely with data scientists and business stakeholders to ensure migration success, while maintaining governance and compliance standards.
10- Delivered scalable and maintainable ETL frameworks for Retail, Gaming, and AI Assistant industries, driving data readiness for analytics and ML.
1- I worked for about 3 years on the CDW project, where I gained experience with 2 different teams (DDL Team and Data Migration Team).
2- Part of the team that designed the Data Migration Utility, an ETL tool that provides an end-to-end solution for migrating data from Teradata to GCP: extracting data from Teradata using TDCH/Sqoop, transforming it with PySpark, and loading it into GCP BigQuery. Technologies used: Python, Teradata, GCP, Unix, PySpark.
3- Worked with cross-functional teams to ensure smooth loading of data through ETLs and helped ETL teams resolve data discrepancies.
4- Worked on the conversion of 40,000 objects (tables, views, and procedures) and worked with cross-functional teams to make the DDL compatible with what GCP accepts.
5- Worked on deploying objects to BigQuery environments through Liquibase Bamboo builds.
- Part of the Core Data Engineering Team.
- Working on Teradata, Google Cloud Platform (GCP), and BigQuery.