India
Experienced Data Engineer with a strong focus on developing robust data pipelines that handle data at TB scale. My expertise includes building Delta Lake solutions with AWS/Databricks and designing and developing critical metrics and Customer 360 features for data science and business use cases. Proficient in SQL and Spark/PySpark, with a strong emphasis on clean, reusable code.
● Tools/Platforms Used: AWS (S3, Athena, Glue, Redshift, etc.), Databricks, Spark, PySpark, Delta Lake, SQL, Scala, Python, Shell Script (Unix), Hadoop, Hive, Airflow, Git, Jira/Confluence.
● Engineered data pipelines to process data incrementally at gigabyte scale using Databricks (PySpark).
● Applied Spark optimization strategies to reduce job run time and compute costs.
● Used AWS Redshift to expose data to business reporting tools and applied best practices to improve Redshift performance.
● Orchestrated Databricks and AWS Batch jobs together using Airflow.
● Contributed significantly to the migration team's transition of Toyota's important data pipelines from on-premise (Cloudera) to cloud (AWS + Databricks).
● During this migration, contributed to shared code bases for generic, reusable jobs that facilitate the ETL process, adopting the Medallion Architecture and Databricks best practices with Delta Lake.
● Developed logic to process data incrementally and efficiently using partition pruning.
● Used Spark to parse JSON and ingest data from a REST API into HDFS. Used Spark Structured Streaming (Auto Loader) for near-real-time processing of cloud files.
● Managed the orchestration of multiple Spark jobs with Oozie, automating their execution using cron expressions.