Gurugram, Haryana, India
Data Engineer | ~2 years of experience (including internship) | Python • SQL • GCP • Databricks

I’m a Data Engineer focused on building data systems that are not just functional, but reliable and easy to work with. For me, data engineering is less about moving data from one place to another and more about creating pipelines that people can trust without constantly questioning the output.

In my current role, I work extensively with Python, SQL, and GCP to design and maintain ETL pipelines and manage large-scale datasets. A big part of my work involves ensuring data quality: validating schemas, handling inconsistencies, and making sure downstream systems receive clean and consistent data. I’ve also worked on optimizing data workflows to improve performance and reduce latency, making insights more accessible for business teams.

Alongside core engineering work, I’ve contributed to building dashboards that help stakeholders track key metrics and understand business performance more clearly. This exposure has helped me think beyond pipelines and understand how data is actually consumed.

I also spend time exploring tools like Databricks and PySpark through hands-on projects, which helps me stay adaptable and think about scalability from a practical perspective rather than just theory.

I’m particularly interested in solving real-world data challenges, improving existing systems, and working in environments where I can continue learning while contributing meaningfully. If you're working on data problems that need thoughtful engineering, I’d be glad to connect.
• Built and maintained ETL pipelines to migrate large-scale data from legacy systems to BigQuery on GCP, following the Medallion architecture.
• Managed data ingestion and transformation using GCS and BigQuery, ensuring a consistent schema and smooth data flow across layers.
• Performed data quality checks, including schema validation, row count verification, and duplicate handling, to maintain reliable datasets.
• Implemented BigQuery MERGE operations to handle incremental data loads and keep datasets up to date.
• Identified and resolved upstream data dependencies, enabling uninterrupted downstream processing.
• Contributed to improving pipeline efficiency and reducing data latency, making data more accessible for reporting and analysis.
• Developed Power BI dashboards by integrating data from Snowflake, enabling stakeholders to track key business metrics.
• Helped business users monitor KPIs, trends, and performance through interactive and easy-to-understand reports.