Subhra Kundu

Senior Analytics Engineer 1 @ MongoDB 🍃

Gurugram, Haryana, India

About

Results-driven Data Architect with 4+ years of experience in cloud data platforms, telemetry analytics, and data engineering, currently working on Atlas telemetry and agentic AI use cases at MongoDB. I design and orchestrate scalable ETL/ELT pipelines in MARS and Airflow, build performant data models on modern lakehouse technologies like Delta Lake and Apache Iceberg, and convert raw telemetry into reliable, self-serve insights for product and business teams. My recent work includes high-volume ingestion of pings data from Amazon Kinesis streams using dynamic Python-based DAG frameworks, and vector-search powered prompt management for AI agents, with a focus on improving data quality, reducing maintenance overhead, and enabling faster decision-making at scale.

Experience

  • MongoDB (Gurugram · On-site)
    • Senior Analytics Engineer 1
      Apr 2026 - Present · 3 mos

    • Data Architect
      Jan 2025 - Apr 2026 · 1 yr 4 mos

      Ingested high-volume Atlas telemetry (≈550k M0 clusters/day) from Amazon Kinesis into the warehouse and built delta-based pipelines to compute key serverStatus metrics for cluster health and usage. Designed and refined Cluster Active data models using reconciliation reports and statistical analysis, significantly improving data quality and trust in downstream telemetry analytics and reporting. Implemented dynamic Python-based DAGs in Airflow to orchestrate large-scale telemetry transformations, reducing maintenance effort and simplifying onboarding for new data workflows. Migrated sub-optimal Apache Hive tables to Apache Iceberg, enabling time-travel queries and more efficient processing for large telemetry datasets. Built a custom MCP server backed by MongoDB and VoyageAI vector search to store and retrieve dynamic prompt libraries, enabling developers to provide richer context to AI agents and generate higher-quality artifacts. Created automated Airflow pipelines to collect table-level metadata and statistics across Hive and Iceberg, and to drop/archive objects via configuration-driven rules, saving ~$300/day in infrastructure costs.

  • Polestar Solutions & Services (Kolkata, West Bengal, India · Hybrid)
    • Senior Analyst
      Jul 2022 - Dec 2024 · 2 yrs 6 mos

      • Performed concurrent and incremental data ingestion from multiple API sources into Azure Blob Storage using asyncio/ThreadPoolExecutor and async Durable Functions, reducing load time by around 80%.
      • Developed efficient masked datasets for Gen AI models.
      • Sized the Spark cluster, including the number of executors and cores, according to the velocity and volume of data being ingested.
      • Integrated data into a medallion architecture in Delta Lake through optimized Databricks notebooks and created Logic Apps flows to monitor pipeline failures.
      • Created several orchestration pipelines in ADF to schedule, monitor, and track the entire data lifecycle.
      • Migrated from all-purpose clusters to job clusters to run production-ready workflows and optimize costs.
      • Developed multiple logical data models and Excel simulation working files to showcase KPIs and metrics, which helped the business reduce its delinquent tenants and keep effective track of its vacant inventory.
      • Integrated Azure Key Vault into ADF and ADB using system-assigned managed identities for data security.
      • Developed business logic transformations using PySpark and SQL.
      • Developed complex DAX logic to support BI analytics.
      • Optimized the data ingestion process to provide analytics on around 1 lakh XML POS files in near real time.
      • Provided a complete architecture, built on Logic Apps, to monitor and send alerts for incorrectly generated POS files and ensure data consistency.

    • Analyst
      Jun 2021 - Jun 2022 · 1 yr 1 mo

      IT Services (Jun 2021 - Dec 2022)
      • Migrated the enterprise data warehouse from SQL Server to Azure Synapse Analytics.
      • Implemented data archival, encryption, and approval processes.
      • Reduced load time for ingesting large Parquet files by around 80% by using multiprocessing.
      • Developed logging and audit tracking scripts using Python and Azure Data Factory.

  • Summer Intern at Reliance Industries Limited
    Jul 2020 - Jun 2021 · 1 yr

    Estimated the COP (Coefficient of Performance) of a refrigeration compressor.

  • Industrial Trainee at Indian Oil Corporation Limited
    Dec 2019 - Jan 2020 · 2 mos

    Worked in the Hydrogen Generation Unit (HGU), studying the flow parameters appropriate for producing approximately 99.99% pure hydrogen.

  • Graduate Engineering Trainee at Tata Metaliks Ltd
    Jun 2018 - Jul 2018 · 2 mos

    Prepared a chart depicting trends in the chemical composition (determined by an optical spectrometer) of coal procured from different mines.