Girish Dukare

Data Engineer | AI | Python | Spark | Azure Synapse | AWS Glue | Hive | Azure Stream Analytics | Kafka | Airflow | ADF | AWS Athena | Tableau | SAP HANA | Azure SQL | SQL

Pune District, Maharashtra, India

About

As a dedicated Data Engineer with over five years of expertise in architecting scalable cloud-native solutions, I specialize in transforming raw data into actionable insights through robust pipelines and innovative engineering. My work spans AWS and Azure ecosystems, where I design systems that balance efficiency, scalability, and governance for industries like banking and telecommunications.

Career Highlights
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
🔹 Bridgenext (Senior Software Development Engineer):
Designing and deploying AWS-based data pipelines (Glue, EMR, S3) to aggregate and process high-volume data from diverse sources, enabling seamless integration for analytics and decision-making. Developing batch and streaming frameworks to support large-scale data workflows, optimizing performance for enterprise-level applications. Collaborating cross-functionally to enhance feature stores and refine data models, ensuring alignment with cloud-native best practices.

🔹 Reliance Jio (Big Data Developer):
Engineered end-to-end data solutions using Azure Synapse, Spark, and Kafka, processing terabytes of daily data for real-time analytics. Automated reporting systems and built Python-based tools to generate PySpark code, reducing dependency on specialized technical skills. Optimized storage strategies through partitioning and bucketing, improving query efficiency across hybrid cloud environments.

Technical Expertise
‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
☁️ Cloud Platforms: AWS (Glue, Athena, EMR), Azure (Synapse, ADF, Stream Analytics)
📊 Data Engineering: Spark, Kafka, Hive, Airflow, Real-time/Batch Pipelines
💡 Tools & Languages: Python, SQL, Tableau, SAP HANA, Azure SQL

Recognized for driving impactful outcomes in cloud migrations and data governance, I thrive on solving complex challenges with cutting-edge tools. Let’s connect to explore opportunities in scalable data engineering or collaborative innovation!

Experience

  • Data Engineer at Citi India
    Aug 2025 - Present · 11 mos

  • Senior Software Development Engineer at Bridgenext
    Aug 2024 - Aug 2025 · 1 yr 1 mo

    - Spearheading AWS-based data solutions for a US banking client, leveraging Glue, Athena, S3, EC2, EMR, and CloudWatch to design scalable pipelines for fraud detection and risk mitigation.
    - Developing batch and streaming pipelines to ingest customer onboarding and activity data (logins, transactions) from multiple vendors, supporting fraud teams in combating account takeover (ATO), identity theft, and money laundering.
    - Collaborating with fraud analytics teams to identify patterns that distinguish labeled fraudulent accounts from legitimate customers, enabling data-driven decision-making.
    - Contributing to feature store enhancements by integrating new data attributes, enabling AI/ML teams to refine fraud prediction models.
    - Gaining expertise in banking compliance, transactional risk frameworks, and fraud analytics workflows.

  • Data Engineer at Jio
    Apr 2020 - Aug 2024 · 4 yrs 5 mos

    - Designed and implemented generic data pipelines ingesting from HDFS, APIs, and cloud storage into Hive, Kafka, and Kudu using PySpark.
    - Architected an Azure Synapse pipeline to extract data from Azure Data Lake Storage (ADLS), Event Hub, and on-premises Kafka, leveraging a Spark pool for data processing and loading into a dedicated SQL pool table.
    - Developed Azure Stream Analytics-based applications for real-time data flow, employing CosmosDB for storage and Azure Data Explorer for insights.
    - Implemented an application to trigger email alerts for Tableau dashboards using Python and the Tableau REST API, eliminating licensed tool costs and enhancing reporting efficiency.
    - Engineered a Python application to autogenerate PySpark code from JSON input, empowering business analysts to craft KPIs with minimal need for Spark technical expertise.
    - Handled large volumes of data daily (> 1 billion records / 1 TB), ingesting into cloud sources and loading into on-premises Hive tables, utilizing data partitioning and bucketing for efficient data storage.