Jakarta, Indonesia
● Implemented Change Data Capture (CDC) using Debezium to capture every change from PostgreSQL and MongoDB in real time, ensuring accurate data synchronization across systems.
● Researched and implemented Polars for ETL transformations, achieving an 85% reduction in execution time compared to the existing Pandas pipeline and improving resource efficiency.
● Designed and built data warehousing infrastructure and pipelines from the ground up for a subsidiary, enabling advanced analytics and data-driven decision-making to enhance operational performance.
● Developed a Scala Spark data transformation job to replace an existing Dataflow job, achieving a 33% reduction in job completion time.
● Optimized a heavy SQL transformation, reducing resource usage by 68% and job completion time by 48%.
● Applied Slowly Changing Dimension (SCD) Type 2 to a campaign dimension table to track historical data changes, maintaining a complete history for analysis of trends and data evolution. Reduced storage size by 72% compared to periodic historical snapshots.
● Set up and orchestrated dbt Core in Airflow, enabling data analysts to manage data transformations more efficiently and eliminating the need for manual execution scheduling.
● Developed a self-service, codeless data pipeline generator web application using Django (backend) and Vue.js (frontend), now used by all data analysts to create data pipelines for building data marts without writing code.
● Evaluated Data Lakehouse technologies, using the Delta Lake and Apache Iceberg table formats with Apache Spark as the compute engine, and created ETL process simulations to assess performance.
● Designed and developed a self-service ETL script generator web application using Django (backend) and Vue.js (frontend) to automate ETL script creation for Data Engineers.
● Implemented incremental ETL/ELT pipelines using Airflow to populate data in the data warehouse from sources such as PostgreSQL, MongoDB, ArangoDB, and third-party APIs.
● Developed data pipelines with BigQuery SQL transformations to clean, standardize, and structure data, enhancing reporting and analytics efficiency.
● Contributed to the migration of Airflow from version 1.x to 2.x by refactoring deprecated operators and ensuring compatibility with existing ETL pipelines.
Represented BINUS University in various national and regional competitive programming competitions.