Singapore
AWS Data Engineer with 4+ years of experience in software development, proficient in the design and development of Hadoop and Spark applications following the SDLC process. Extensive experience building efficient data pipelines using Big Data Hadoop frameworks (HDFS, Hive, and Oozie), Spark ecosystem tools (Spark Core, Spark SQL), PySpark, and Python. Strong experience with AWS cloud services: EMR, S3, Glue, Athena, and RDS. Worked in product-based companies, with solid experience in product development. Worked in an Agile software development model. Collaborated with other technology teams to ingest, transform, and load structured data from multiple data sources. Built and maintained robust automated data pipelines to support data solutions across BI and analytics use cases. Implemented trustworthy and efficient data transformations via ETL and ELT. Built and improved data pipelines and services through CI/CD. Worked closely with Product Owners during requirements analysis and the UAT phase, and received appreciation from them. Executed production deployments smoothly by coordinating with multiple teams. Able to work with business users to explain concepts and understand requirements. Worked closely with testers to verify the functional and non-functional behavior of the pipelines.
· Tech Stack: Spark Core, Spark SQL, PySpark, Python, ETL, Cloudera Cluster, Hadoop, Apache Hive, Spark with Scala
· Cloud: AWS Glue, EMR, S3, RDS, Athena
· Migrated a project from Ab Initio code to Spark with Scala
· Built and optimized batch data pipelines using PySpark and Glue
· Migrated Spark jobs to EMR and improved performance
· Developed standardized data pipelines and utilities
· Documented processes and executed unit tests
· Provided production support
Tech Stack: Hadoop, Spark Core, Spark SQL, PySpark, Python, Cloudera Cluster, and Apache Hive
Cloud: AWS Glue, EMR, S3, RDS, and Athena
Actively involved in the analysis of technical specifications. Built efficient batch data pipelines using PySpark. Optimized the performance of Spark jobs. Built data pipelines using Glue. Stored the transformed data in RDS (PostgreSQL). Migrated Spark jobs to EMR using S3 as storage. Tuned Spark jobs to run efficiently on EMR. Built common utilities and standardized data pipelines to ensure consistency across the organization. Built and maintained operational runbooks, and streamlined procedures and ETL jobs. Developed documentation to assist users. Executed unit test cases. Actively participated in production support.
Tech Stack: Hadoop, Spark Core, Spark SQL, PySpark, Python, Cloudera Cluster, Apache Hive, and Oozie
Involved in the analysis of technical specifications. Processed customers' banking transactions. Used Spark SQL to process large volumes of structured data stored in Hive tables. Optimized the performance of Spark jobs. Automated jobs using Oozie.
· Tech Stack: Hadoop, Spark Core, Spark SQL, PySpark, Python, Cloudera Cluster, Apache Hive, Oozie, ETL
· Processed banking transactions and optimized Spark jobs
· Automated jobs using Oozie
Tech Stack: Ab Initio, Spark (Scala), Hadoop, SQL, Shell Scripting, Airflow, Cloudera, Jenkins, GitHub
• Worked on the Ab Initio to Spark (Scala) migration, converting legacy ETL graphs into Spark-based batch processing jobs.
• Developed and optimized Spark (Scala) applications on Hadoop (Cloudera) for large-scale data processing.
• Wrote complex SQL queries for data transformation, validation, and reconciliation during migration.
• Scheduled and monitored workflows using Apache Airflow, and automated deployments through Jenkins CI/CD pipelines.
• Used shell scripting and GitHub version control, and provided production support to ensure smooth job execution.