Mumbai, Maharashtra, India
As an Associate Data Engineer at Jio Platforms Limited, I manage and administer 6–7 Hadoop clusters running technologies such as HDFS, YARN, Hive, and Apache Ranger to ensure performance, availability, and secure access control. My responsibilities also include overseeing Presto clusters through the organization’s in-house Rover application, delivering query execution and monitoring solutions. Additionally, I handle 5–6 Apache Airflow instances, focusing on DAG creation, scheduling, and secure user access management. I hold a Bachelor of Engineering in Information Technology from St. Francis Institute of Technology and bring expertise in automating Apache Ranger policies via REST APIs and developing shell-based monitoring tools to enhance data platform reliability. My continuous learning is complemented by certifications in AWS Cloud Technical Essentials and Microsoft Azure, allowing me to contribute effectively to scalable and secure data infrastructure projects.
Big Data Engineering & Cloud: Administered and optimized 6–7 Hadoop clusters using Cloudera Manager, supporting HDFS, YARN, Hive, Kudu, Iceberg, and Apache Ranger with a focus on performance, high availability, and secure access. Managed 6–7 Presto clusters via the Rover platform, improving query performance and monitoring and troubleshooting production issues. Handled 5–6 Apache Airflow instances, including DAG development, scheduling, monitoring, and RBAC. Automated Apache Ranger policies using REST APIs to enable scalable governance. Built shell-based monitoring solutions for cluster health and job tracking. Integrated Presto with Hive Metastore (HMS) for metadata consistency and optimized execution. Performed Hadoop migrations using SSH and edge nodes, ensuring data integrity with minimal downtime. Reduced Presto query failures and improved efficiency. Deployed Trino and Hive with Tableau (Dockerized) for BI reporting. Managed YARN queues and optimized workloads. Developed and tuned Spark ETL pipelines using partitioning, caching, and memory optimization. Implemented Medallion Architecture on Azure Databricks (Dev & Prod).

Cloud & Data Engineering: Built cloud-native pipelines on AWS and Azure, improving scalability and reliability. Worked with Azure Data Factory, Azure Synapse Analytics, ADLS, Azure ML Studio, and VMs. Developed Databricks workloads with cluster tuning and orchestration. Processed large-scale data using Python, PySpark, and Spark. Used Snowflake features such as Time Travel, Zero-Copy Cloning, and Streams & Tasks for data operations.

DevOps & Automation: Managed CI/CD using GitLab, Jenkins, Kubernetes, Docker, Argo CD, Azure DevOps, and Terraform. Improved release stability by fixing pipeline failures. Automated deployments to accelerate delivery. Enhanced monitoring, logging, and alerting, reducing recovery time and improving platform reliability. Collaborated in Agile teams to deliver scalable, governed data solutions and continuous improvements.
The internship program includes data visualization skills, framing business scenarios, visualization using Tableau and Power BI, and data analysis on datasets.