Harihara Narayanan M

Senior Data Engineer @Walmart | GCP & Azure | Spark | Airflow | BigQuery | Databricks

Chennai, Tamil Nadu, India

About

Data Engineer with 5+ years of experience building scalable, cloud-native data pipelines across the Azure and GCP ecosystems. Currently at Walmart, I design and develop end-to-end data pipelines using Scala Spark, Hive, Dataproc, BigQuery, and Airflow on Google Cloud Platform. I’ve also contributed one-click CI/CD deployment pipelines that make releases faster and more reliable. Previously at Tiger Analytics, I worked on the Azure stack: developing ELT pipelines in Databricks, orchestrating workflows with ADF, and implementing alerting and automation using Logic Apps and Azure Functions.

Core Skills: GCP, Azure, Scala Spark, PySpark, Hive, BigQuery, Airflow, Dataproc, Databricks, Delta Lake, Azure Data Factory, CI/CD, SQL, Python

Experience

Walmart Global Tech India (Hybrid)
- Senior Data Engineer
  May 2026 - Present · 2 mos
- Data Engineer
  May 2024 - May 2026 · 2 yrs 1 mo
  ▪ Optimized performance of key Scala Spark jobs, reducing execution time by over 50%, improving overall data pipeline efficiency
  ▪ Implemented one-click CI/CD deployment using a Jenkins-like framework with custom shell scripts, accelerating delivery cycles
  ▪ Developed custom Airflow operators to submit Spark jobs to Dataproc Serverless, enabling scalable and flexible orchestration
  ▪ Built and maintained data pipelines for TCO (Total Cost of Ownership) models, embedding complex business rules
  ▪ Designed a custom data quality utility on BigQuery with alerting mechanisms to monitor and ensure pipeline data integrity
  ▪ Increased project test coverage to 90% by implementing unit tests, enhancing code reliability and maintainability
Tiger Analytics (Chennai, Tamil Nadu, India)
- Data Engineer
  Jan 2023 - May 2024 · 1 yr 5 mos
  ▪ Optimized PySpark scripts to reduce execution time by 50%, improving data processing performance
  ▪ Implemented SCD Type 1, Type 2, and bulk load mechanisms using Delta Lake and PySpark
  ▪ Configured Azure Event Grid topics and integrated them as custom triggers in Azure Data Factory (ADF)
  ▪ Participated in Agile ceremonies and contributed to user story creation, task tracking, and technical mentoring within the team
- Senior Software Engineer
  Oct 2021 - Feb 2023 · 1 yr 5 mos
  ▪ Built pipelines to ingest data from multiple sources including APIs, Blob Storage, and SQL Server
  ▪ Implemented Medallion architecture in Databricks using Delta Lake to organize and streamline data flow
  ▪ Created custom alerting solutions using Azure Logic Apps for email and Microsoft Teams notifications
  ▪ Added unit testing and logging to Databricks notebooks to enhance reliability and traceability
  ▪ Automated data extraction workflows using Azure Functions to fetch and upload reports into Azure Data Lake
  ▪ Developed and deployed pipelines through Azure DevOps CI/CD, including Databricks notebooks and ADF components
Mindtree (Full-time · 2 yrs 4 mos)
- Senior Engineer
  Jul 2021 - Oct 2021 · 4 mos
  ▪ Built Delta Lake tables based on complex business logic using PySpark and SQL in Azure Databricks
  ▪ Orchestrated Databricks notebooks using Azure Data Factory (ADF) for scalable data pipelines
  ▪ Developed a custom Azure Logic Apps connector to trigger and monitor Azure Databricks jobs
  ▪ Created KPI dashboards and reports for metrics like CLTV and churn rate using SQL queries on processed data
  ▪ Conducted a successful POC on Apache Pinot to evaluate it as a cost-effective alternative to Cosmos DB for real-time analytics
- Engineer
  Jul 2019 - Oct 2021 · 2 yrs 4 mos
  ▪ Developed a Flask-based Python API to serve recommendation data from Cosmos DB to frontend systems
  ▪ Improved API performance using Redis caching and payload optimization techniques
  ▪ Containerized and deployed the backend service on Azure Kubernetes Service (AKS) using Docker
  ▪ Calculated evaluation metrics like MAP@K and Recall for recommendation models using PySpark
  ▪ Enhanced performance of existing recommendation models using thread pooling in Python (NumPy, Pandas, scikit-learn)