Yu-Chi Chen

Data Engineer @ Cathay | ex-Data Analyst @ TSMC | ex-Business Intelligence Analyst @ DBS Bank

Singapore

About

I am a Data Engineer with a strong foundation in building scalable data architectures and a Master’s degree from the University of Arizona. Currently, I specialize in architecting Medallion Lakehouse structures on Databricks, where I have successfully migrated 500+ tables and optimized pipelines to handle 200M+ daily records, achieving a 25% boost in processing efficiency.

With a unique blend of experiences, ranging from large-scale data ingestion at TSMC to research in Graph Foundation Models, I bridge the gap between robust data engineering and advanced machine learning. My background as a Graduate Teaching Assistant in DBMS and Algorithms has honed my ability to communicate complex technical concepts clearly in high-pressure, international environments.

Core Expertise:
● Data Engineering: Databricks, Apache Spark (PySpark), ETL/ELT, CDC, Data Modeling.
● Infrastructure & DevOps: Cloud Migration, Docker, CI/CD, Pipeline Monitoring.
● Advanced Analytics: Graph AI, Multimodal Data, NLP, Machine Learning.

I am passionate about building data-driven solutions that scale and am always open to connecting with fellow data professionals and exploring global opportunities in the tech space.

Experience

Data Engineer at Cathay Life Insurance Co., Ltd.
May 2025 - Present · 1 yr 2 mos
● Architected a Medallion Architecture (Bronze/Silver/Gold) on Databricks to migrate 500+ tables; unified disparate data formats from DB2, PostgreSQL, and SQL Server into a high-performance cloud lakehouse.
● Developed custom Spark-based modules to handle complex data inconsistencies, including cross-system time zone normalization and rigorous null-handling strategies (managing dummy values vs. literal strings) to ensure downstream data accuracy.
● Engineered robust ETL/ELT pipelines using PySpark & Spark Streaming; implemented CDC (Change Data Capture) logic to process 200M+ daily records, maintaining 100% data integrity during schema transitions.
● Built a centralized monitoring framework with categorized error reporting; cut manual troubleshooting time from 1 hour to minutes, enabling instant identification of table-level failures.
● Optimized resource utilization by refactoring logic for parallel execution and leveraging Databricks' OPTIMIZE and VACUUM commands, resulting in a 25% faster runtime and reduced cloud infrastructure costs.
University of Arizona, Eller College of Management (Tucson, Arizona, United States)
- Graduate Teaching Assistant
  Jan 2025 - May 2025 · 5 mos
  MIS 301 Data Structure & Algorithm
  ● Conducted problem-solving sessions for Python-based Data Structures and Algorithms; provided real-time debugging support and code reviews for students to reinforce fundamental programming logic.
  ● Addressed diverse technical inquiries regarding algorithm complexity and data manipulation, ensuring students' mastery of efficient coding practices.
- Research Assistant at Artificial Intelligence Laboratory
  Aug 2023 - May 2025 · 1 yr 10 mos
  ● Developed a Graph Foundation Model integrating LLMs with graph-text contrastive learning to analyze 60K+ supply chain nodes; automated supplier identification for semiconductor manufacturers under shifting policy frameworks.
  ● Implemented a Multimodal LLM framework to process large-scale unstructured text and satellite imagery; built a pipeline to detect ESG risks in the EV supply chain, providing actionable insights for revenue loss mitigation.
  ● Streamlined data ingestion from public records using Python APIs and Scikit-learn to curate training datasets for LLM-based ESG reporting; reduced data preparation time by 40% through automated cleaning and standardization.
- Graduate Teaching Assistant
  Aug 2024 - Dec 2024 · 5 mos
  MIS 331 Database Management Systems
  ● Facilitated weekly technical labs focusing on Advanced SQL and Database Design (ERD); provided hands-on guidance for 100+ students in solving complex query optimization and schema normalization problems.
  ● Demonstrated technical leadership by bridging the gap between theoretical database concepts and practical implementation, maintaining high teaching standards in a high-pressure, English-speaking academic environment.
Data Analyst Intern at TSMC
Jun 2024 - Aug 2024 · 3 mos
● Automated competitive data collection by designing robust Python-based scraping pipelines with Selenium and Scrapy; processed 1.5M+ records weekly, reducing manual effort by 80% and enhancing strategic planning accuracy.
● Refined raw data structures by implementing custom cleaning and normalization logic to ensure high data fidelity for competitive intelligence analysis.
● Optimized text preprocessing workflows for 10K+ news articles using NLTK; automated tokenization and labeling processes, which halved manual labeling time and improved market analysis efficiency.
Business Intelligence Analyst Intern at DBS Bank
Jul 2022 - Aug 2022 · 2 mos
● Developed Power BI dashboards using SQL and regression models (Statsmodels) to analyze customer feedback, identifying key service drivers and improving satisfaction scores by 10%.
● Automated data pipelines using Python and Excel VBA for cleaning and aggregating interaction data, reducing manual errors by 30% and significantly driving digital banking adoption rates.
Research Assistant at Market Intelligence & Consulting Institute (MIC)
Jan 2022 - Apr 2022 · 4 mos
Authored bi-weekly digital transformation reports for government agencies, providing strategic business insights across 5+ industries.