Gurugram, Haryana, India
I've spent years turning raw, messy data into systems that actually work at scale, and I've made every mistake along the way so you don't have to.

Currently a Data Engineer III at Expedia Group, where I work with Apache Spark, Python, Airflow, and cloud platforms to build pipelines handling millions of events daily.

I'm a LinkedIn Top Data Engineering Voice and Top Icon of India. But more than the badges, what I care about is making data engineering less intimidating and more accessible for engineers at every level.

Here's what you'll find on my feed:
→ Real talk about Spark optimizations (the stuff that actually works)
→ Career lessons from climbing from DE I to DE III
→ Hot takes on data tools, trends, and what the hype misses
→ Proof that you can love data and also disappear into the mountains on weekends

If you're a data engineer, aspiring DE, or just obsessed with building things that scale, hit Follow. I post 4–5x a week. Let's build something great.

📩 Open to speaking, collaborations, and mentoring. DM me anytime.
Lead and mentor a team of 5 data engineers, driving engineering best practices and improving delivery predictability and code quality.
Collaborate closely with product, analytics, and business stakeholders to translate requirements into scalable data architectures supporting multiple downstream applications and dashboards.
Architect and own end-to-end batch data pipelines, processing TB-scale datasets on a daily basis.
Design and implement high-performance ETL/ELT pipelines using Python, Spark, and SQL to optimize data processing latency and reliability.
Lead the development of a centralized, cloud-based data lake, improving data accessibility and accelerating data onboarding for new use cases.
Serve as the primary point of contact for production stability, driving root-cause analysis and long-term reliability improvements.
Implement and manage Airflow workflows with retries, SLAs, and alerting to reduce manual interventions and improve operational efficiency.
Establish and enforce engineering best practices, including code reviews, data quality checks, and documentation, to improve team velocity and onboarding.
Influence cross-functional stakeholders by clearly communicating technical trade-offs and roadmap decisions aligned with business priorities.
Lead the development of an AI-driven chat solution, leveraging LLMs, RAG, and agent-based workflows for enterprise-grade question answering.
Design scalable data ingestion, vectorization, and retrieval pipelines integrating data lakes, embeddings, and LLM orchestration for secure, low-latency AI experiences.
Designed and built scalable, resilient cloud-native data solutions using PySpark, SQL, and modern internal platforms, significantly improving system efficiency and performance.
Developed 5 reusable platform products adopted across multiple projects, accelerating delivery and reducing overall development effort.
Architected end-to-end data platforms from scratch, covering raw data ingestion, multi-layer storage, and reporting layers.
Implemented robust data ingestion mechanisms supporting Parquet, Hive tables, APIs, and CSV sources for diverse upstream systems.
Led DevOps initiatives focused on automation and CI/CD using GitHub Actions and Spinnaker, improving deployment reliability and reducing errors.
Designed and implemented 15+ Airflow DAGs for data ingestion and cleansing into staging layers, significantly improving processing efficiency.
Mentored and coached 3 new team members, accelerating their onboarding and improving overall team productivity.
Prototyped and delivered innovative data solutions, driving experimentation and continuous improvement that resulted in production-ready enhancements.
Architected a reusable common component framework using Azure Data Factory, significantly improving data processing efficiency and pipeline scalability.
Designed and delivered end-to-end data ingestion and orchestration pipelines leveraging ADF, Databricks (PySpark, Spark SQL), and ADLS, optimizing overall processing performance.
Developed dimension, fact, and reporting layers in Databricks and Snowflake, enhancing query performance and analytics readiness.
Implemented 15+ automated data transfer pipelines and CI/CD workflows across SIT, UAT, and Production, streamlining deployments and improving release efficiency.
Led production support and root-cause analysis, collaborating with stakeholders to resolve issues within SLA and improve delivery quality and requirement clarity.
Developed and maintained 30+ Databricks notebooks using PySpark, Spark SQL, and SQL to enable scalable data processing and analytics.
Designed and deployed 15+ Azure Data Factory pipelines, ensuring reliable and efficient data movement across multiple source and target systems.
Built a Transformation Engine integrating data from 10+ source interfaces, enhancing data standardization and downstream system compatibility.
Created analytics-ready datasets to support reporting and machine learning use cases, improving data-driven insights and decision-making.
Owned production support and pipeline optimization, resolving issues across 15+ jobs and 100+ data transfers, significantly reducing errors and improving operational reliability.
Data Engineer
ETL Developer