Gurugram, Haryana, India
Results-driven Data Architect with 4+ years of experience in cloud data platforms, telemetry analytics, and data engineering, currently working on Atlas telemetry and agentic AI use cases at MongoDB. I design and orchestrate scalable ETL/ELT pipelines in MARS and Airflow, build performant data models on modern lakehouse technologies like Delta Lake and Apache Iceberg, and convert raw telemetry into reliable, self-serve insights for product and business teams. My recent work includes high-volume ingestion of pings data from Amazon Kinesis streams using dynamic Python-based DAG frameworks, and vector-search-powered prompt management for AI agents, with a focus on improving data quality, reducing maintenance overhead, and enabling faster decision-making at scale.
• Ingested high-volume Atlas telemetry (≈550k M0 clusters/day) from Amazon Kinesis into the warehouse and built delta-based pipelines to compute key serverStatus metrics for cluster health and usage.
• Designed and refined Cluster Active data models using reconciliation reports and statistical analysis, significantly improving data quality and trust in downstream telemetry analytics and reporting.
• Implemented dynamic Python-based DAGs in Airflow to orchestrate large-scale telemetry transformations, reducing maintenance effort and simplifying onboarding for new data workflows.
• Migrated sub-optimal Apache Hive tables to Apache Iceberg, enabling time-travel queries and more efficient processing for large telemetry datasets.
• Built a custom MCP server backed by MongoDB and VoyageAI vector search to store and retrieve dynamic prompt libraries, enabling developers to provide richer context to AI agents and generate higher-quality artifacts.
• Created automated Airflow pipelines to collect table-level metadata and statistics across Hive and Iceberg, and to drop/archive objects via configuration-driven rules, saving ~$300/day in infrastructure costs.
• Performed concurrent, incremental data ingestion from multiple API sources into Azure Blob Storage using asyncio/ThreadPoolExecutor via Async Durable Functions, reducing load time by around 80%.
• Developed efficient masked datasets for Gen AI models.
• Determined the size of the Spark cluster and the number of executors and cores required according to the velocity and volume of data being ingested.
• Integrated data into a medallion architecture in Delta Lake through optimized Databricks notebooks and created Logic Apps flows to monitor pipeline failures.
• Created several orchestration pipelines in ADF to schedule, monitor, and track the entire data lifecycle.
• Migrated from all-purpose clusters to job clusters to run production-ready workflows and optimize costs.
• Developed multiple logical data models and Excel simulation working files to showcase several KPIs and metrics, which helped the business reduce its delinquent tenants and keep effective track of its vacant inventory.
• Integrated Azure Key Vault into ADF and ADB using system-assigned managed identities for data security.
• Developed business logic transformations using PySpark and SQL.
• Developed complex DAX logic to support BI analytics.
• Optimized the data ingestion process to provide analytics on around 1 lakh (100,000) XML POS files in near real time.
• Provided an end-to-end architecture, built on Logic Apps, to monitor and send alerts for incorrectly generated POS files and ensure data consistency.
IT Services (Jun 2021 - Dec 2022)
• Migrated the enterprise data warehouse from SQL Server to Azure Synapse Analytics.
• Implemented data archival, encryption, and approval processes.
• Reduced load time for ingesting large Parquet files by around 80% by using multiprocessing.
• Developed logging and audit tracking scripts using Python and Azure Data Factory.
Estimation of COP (Coefficient of Performance) for a refrigeration compressor
Worked in the Hydrogen Generation Unit (HGU); studied appropriate flow parameters for the production of approximately 99.99% pure hydrogen
Computed and charted trends in the chemical composition (determined by an optical spectrometer) of coal procured from different mines.