Harikrishnan M

Data Engineer | 2× Databricks Certified

India

About

Experienced Data Engineer with a strong focus on developing robust data pipelines that handle data at terabyte scale. My expertise includes building Delta Lake solutions with AWS and Databricks, and designing and developing critical metrics and Customer 360 features for data science and business use cases. Proficient in SQL and Spark/PySpark, with a strong emphasis on clean, reusable code.

Tools/Platforms used: AWS (S3, Athena, Glue, Redshift, etc.), Databricks, Spark, PySpark, Delta Lake, SQL, Scala, Python, Shell Script (Unix), Hadoop, Hive, Airflow, Git, Jira/Confluence.

Experience

Cognizant (Full-time · 8 yrs 5 mos)
- Senior Associate
  Oct 2024 - Present · 1 yr 9 mos
  ● Engineered data pipelines that process data incrementally at gigabyte scale using Databricks (PySpark).
  ● Applied Spark optimization strategies to reduce job run time and compute costs.
  ● Used AWS Redshift to expose data to business reporting tools, and applied best practices to improve Redshift performance.
  ● Orchestrated Databricks and AWS Batch jobs together using Airflow.
- Associate
  Jan 2021 - Sep 2024 · 3 yrs 9 mos
  ● Contributed significantly to the migration team that moved Toyota's important data pipelines from on-premise (Cloudera) to the cloud (AWS + Databricks).
  ● During the migration, contributed to shared code bases for generic, reusable ETL jobs, adopting the Medallion Architecture and Databricks best practices with Delta Lake.
  ● Developed logic to process data incrementally and efficiently using partition pruning.
- Program Analyst
  Jan 2019 - Dec 2020 · 2 yrs
  ● Used Spark to parse JSON from a REST API and ingest the data into HDFS. Used Spark Structured Streaming (Auto Loader) for near-real-time processing of cloud files.
  ● Managed the orchestration of multiple Spark jobs with Oozie, scheduling their execution with cron expressions.