Vasanth Kumar

Data Engineer | Databricks | AWS | Snowflake | GenAI | RAG | Agentic AI | Data Analytics | DevOps Practices | Architect | IoT/IIoT | CI/CD | Cloud Engineer | ETL/ELT | Python | SQL | Pandas | PySpark | Kafka | Data Warehouse | Redshift

Coimbatore, Tamil Nadu, India

About

I don't just move data; I build the systems that make data useful. With 6+ years as a Data Engineer, I specialize in designing scalable data pipelines, modern lakehouse architectures, and GenAI-powered analytics solutions across AWS, Databricks, and Snowflake. I've worked across industries including IoT/IIoT, EV, Manufacturing, Automotive, and Banking, turning messy, high-volume data into fast, reliable, and intelligent platforms.

What makes me different? I close the gap between raw data and real business value, fast. I've built POCs in days that attracted nearly ₹1 Cr in project opportunities, and I've consistently delivered 40–70% cost and performance improvements across storage, compute, and query optimization.

Here's what I build:

  • Real-time & batch pipelines using Kafka, Kinesis, AWS IoT Core, and PySpark
  • Lakehouse architectures on Databricks with Delta Lake on S3
  • GenAI chatbots using Amazon Bedrock + Athena for natural language data queries
  • RAG & Agentic AI solutions that bring intelligence into data platforms
  • End-to-end CI/CD-driven deployments on AWS ECS, ECR & Fargate

My core stack: AWS (Bedrock, Glue, Redshift, IoT Core, Kinesis) · Databricks · Snowflake · PySpark · Python · SQL · Kafka · Airflow · Docker · GitHub Actions

I hold the AWS Certified Solutions Architect certification and stay hands-on with the latest in GenAI, LLMs, and agentic frameworks. Right now, I'm open to Senior Data Engineer, Lead Data Engineer, or AI Data Platform roles where I can architect impactful solutions at scale.

📩 Reach me at [email protected] or connect with me here on LinkedIn. I respond fast.

Experience

  • Lead Software Engineer at Petrus Technologies
    Nov 2024 - Present · 1 yr 8 mos

    Designing and building enterprise-scale cloud data platforms and lakehouse architectures for industrial analytics, IIoT data processing, and AI-driven insights. Responsible for developing scalable real-time and batch data pipelines, integrating enterprise systems, and enabling advanced analytics using AWS, Databricks, and modern data engineering practices.

    Designed and implemented a Databricks Lakehouse architecture on Amazon S3 using Delta Lake for scalable analytics and high-performance data processing. Built automated pipelines using Databricks Lakeflow Connect, Declarative Pipelines, and Asset Bundles to manage data ingestion, transformation, and orchestration.

    Developed API-driven data ingestion frameworks to collect data from internal and external platforms. Implemented ETL/ELT pipelines using Python, PySpark, Databricks, AWS Lambda, and AWS Glue to process structured and unstructured datasets for analytics.

    Built real-time IIoT data pipelines using AWS IoT Core, Kinesis Data Streams, Kinesis Firehose, DynamoDB, and Amazon S3 for live machine monitoring and operational insights. Optimized storage using Parquet and Delta formats with partitioning for efficient analytics in Athena and Databricks SQL.

    Developed GenAI-powered applications integrating LLMs through OpenAI APIs, Azure OpenAI, and Amazon Bedrock. Implemented RAG architectures using vector databases and enterprise data stored in S3 and lakehouse environments to enable natural language querying.

    Designed secure data APIs using AWS API Gateway and Lambda, and implemented Single Sign-On (SSO) and access security using AWS IAM, Cognito, and VPC.

    Worked extensively with AWS services including S3, Glue, Athena, Redshift, DynamoDB, EC2, Lambda, API Gateway, Kinesis, IoT Core, Bedrock, OpenSearch, CloudWatch, VPC, and DMS to build scalable and cost-efficient data platforms. Focused on performance optimization, cost reduction, and scalable architecture supporting advanced analytics and AI-driven decision making.

  • CloudThat (Full-time · 3 yrs 3 mos)
    • Lead Subject Matter Expert - IoT
      Nov 2023 - Nov 2024 · 1 yr 1 mo

      Worked on designing and implementing cloud-based data engineering and analytics solutions using AWS, enabling scalable platforms for IoT, EV analytics, and enterprise data workloads. Built real-time and batch data pipelines, optimized data processing frameworks, and delivered analytics solutions for business intelligence and operational insights.

      Developed real-time EV data pipelines using AWS MSK (Kafka), Logstash, Lambda, OpenSearch, and Amazon S3, enabling live data ingestion, processing, and visualization through Grafana dashboards. Implemented ETL frameworks to handle high-volume streaming IoT data efficiently.

      Built batch data pipelines using Amazon S3, AWS Glue, and Amazon Redshift, enabling structured analytics and reporting through Amazon QuickSight dashboards. Automated workflows using SQL, stored procedures, and schedulers to improve data processing efficiency.

      Designed event-driven ETL pipelines using AWS Lambda, S3 triggers, and Kinesis Firehose to automate data ingestion and transformation. Optimized storage using Parquet and Gzip compression, improving query performance and reducing storage costs.

      Developed data analytics and visualization solutions with Amazon QuickSight, creating dashboards to track business KPIs and operational metrics with row-level security and performance optimization.

      Built API-based data ingestion pipelines integrating enterprise and third-party data sources for scalable structured and semi-structured data processing. Implemented document processing automation using AWS Lambda, Amazon S3, and Amazon Textract to extract data from documents and handwritten forms.

      Collaborated on GenAI-enabled solutions, exploring RAG architectures and LLM integrations to enable intelligent querying and insights from enterprise datasets.

    • Sr. Research Associate
      Jun 2022 - Nov 2023 · 1 yr 6 mos

    • Research Associate - IoT
      Sep 2021 - Jun 2022 · 10 mos

  • IT & IoT Engineer at FutureFarms
    Jul 2019 - Sep 2021 · 2 yrs 3 mos