Lead Data Engineer

Chargebee

Chennai

Description

About the Role

The Data Platform team at Chargebee builds and maintains scalable data systems that power internal analytics, business intelligence, and customer-facing data features.

As a Lead Data Engineer, you will play a key role in shaping the architecture, scalability, and reliability of Chargebee’s data platform. You will lead the design and development of large-scale data systems, mentor engineers on the team, and drive best practices across data engineering workflows.

You will work closely with product engineers, analysts, platform teams, and leadership to ensure that data is ingested, processed, and made available efficiently for analytics and product use cases. This role involves designing robust data pipelines, optimizing distributed data processing systems, and guiding the evolution of the data platform to support Chargebee’s growing data needs.

The team operates in a fast-paced and collaborative environment, building reliable and scalable infrastructure that powers data-driven decision-making across the company.

What You Will Work On

As a Lead Data Engineer, you will lead the development and evolution of Chargebee’s data platform. This includes designing scalable data architectures, building robust ingestion and processing pipelines, and ensuring data systems operate reliably at scale.

You will also guide the technical direction of the platform, mentor engineers, and collaborate across teams to enable efficient and scalable data workflows.

The role provides exposure to:

  • Large-scale data ingestion and processing pipelines
  • Streaming and event-driven architectures
  • Distributed data processing frameworks
  • Cloud-based data infrastructure
  • Building and maintaining data lake and data warehouse architectures
  • Designing scalable data platforms powering both internal analytics and customer-facing products
  • Leading architectural decisions and platform evolution for large-scale data systems

Key Responsibilities

  • Design and architect scalable, reliable data ingestion and processing pipelines across the data platform.
  • Lead the development and optimization of ETL/ELT workflows to support high-volume and scalable data processing.
  • Build and maintain distributed data processing systems using frameworks such as Apache Spark.
  • Design and implement event-driven data architectures using streaming systems such as Kafka.
  • Define data modeling standards and transformation strategies to support analytics and product use cases.
  • Ensure high standards of data reliability, integrity, scalability, and performance across the data platform.
  • Lead troubleshooting and debugging efforts for complex production data pipelines and distributed systems.
  • Collaborate with data analysts, product teams, and engineering teams to design scalable data solutions.
  • Mentor and guide data engineers, providing technical leadership and promoting engineering best practices.
  • Lead design discussions, architecture reviews, and technical decision-making within the data platform team.
  • Participate in and drive code reviews, technical design reviews, and agile development processes.
  • Document architecture decisions, platform standards, and data engineering workflows.

Minimum Qualifications

  • Bachelor’s degree in Computer Science, Mathematics, Engineering, or a related technical field, or equivalent practical experience.
  • 6+ years of experience building and maintaining large-scale data processing systems and pipelines.
  • Strong experience designing and operating production-grade distributed data systems.
  • Hands-on experience with distributed computing frameworks such as Apache Spark.
  • Strong proficiency in SQL and data modeling.
  • Proficiency in at least one programming language such as Java, Python, or Scala.
  • Strong understanding of data structures, distributed systems, and data platform architecture.
  • Experience working with relational databases such as PostgreSQL, MySQL, or similar systems.
  • Experience designing and building ETL/ELT data pipelines at scale.
  • Experience working with Git workflows in collaborative development environments.
  • Experience working in an Agile development environment.

Good-to-Have Qualifications

  • Experience working within the AWS ecosystem.
  • Experience building and operating large-scale Apache Spark-based data pipelines.
  • Experience with streaming systems such as Kafka or similar event-driven platforms.
  • Strong understanding of data lake and data warehouse architectures.
  • Experience designing scalable cloud-native data platforms.
  • Knowledge of, or prior experience with, open table formats such as Delta Lake, Apache Iceberg, or Apache Hudi.
  • Strong technical leadership, communication, and problem-solving skills.
  • Ability to investigate and debug issues across large-scale distributed systems.