Taiwan
Data Scientist with a business background and a track record of solving diverse problems across public safety, healthcare, and finance. Adept at building scalable data pipelines, training machine learning models, and delivering actionable insights. Quick to adapt to new domains and to collaborate with stakeholders to turn data into impact.
Built and delivered a functional product prototype within two months, presented it to funders, and secured ongoing support and funding approval.
• Cut data preparation time for bounding-box labeling from 50+ hours to under 30 seconds (a roughly 6,000-fold speedup) by building a semi-automated ETL pipeline in Python.
• Improved the accuracy of two YOLOv8 object detection models that capture potential crime characteristics for the AI surveillance system, achieving 92.4% precision and 90.1% recall, by redefining the detected classes to address model weaknesses while staying aligned with stakeholder requirements.
Optimized the COVID-19 classification model.
• Implemented a COVID-19 cough classification model on Google Cloud Platform by converting audio data into spectrograms and training a Convolutional Neural Network (CNN) on them.
• Increased model accuracy from 63% to 90% by identifying key features through exploratory data analysis, feature selection, and medical research insights, using the Python libraries Matplotlib and Seaborn.
Led a team of three researchers analyzing the impact of public sentiment on stock prices.
• Saved $35,000 annually on a Twitter API premium subscription and collected over 1,000,000 tweets by building a daily automated data-collection pipeline using Python and the standard Twitter API.
• Boosted sentiment analysis accuracy from 53% to 91% by designing and implementing tweet preprocessing steps using natural language processing (NLP) and regular expressions, and validated the improvement through an A/B experiment.
• Identified two key trading factors significantly influenced by public sentiment through a Python-based time series regression analysis and presented the research at the 2021 INFORMS Annual Meeting.
Information Security Management Course
• Taught a hands-on lesson for 47 students on implementing an email security solution (Actalis SSL certificates) and a self-signed SSL certificate for Apache on Windows and Ubuntu.
• Designed teaching materials and tutored students, drawing on personal experience of learning the subject.
Developed an emergency response platform for coordinating earthquake relief and delivering routine medical services.
• Integrated medical and geographical data from 143 hospitals and 205 rescue units into a MySQL database by building an automated ETL pipeline using Selenium, Beautiful Soup, and the Google Maps API.
• Developed a web platform with interactive data dashboards, using Tableau, SQL, PHP, and HTML, that lets stakeholders track historical emergency response events and geographical data to optimize rescue strategies and routes.
• Collaborated with stakeholders to develop strategies for emergency resource allocation and road safety enhancements, identifying high-accident locations on street maps by analyzing 770,000 spatial data points on the platform.