Aaron Wu

Data Engineer at Coherent Path

Toronto, Ontario, Canada

Experience

  • Senior Data Engineer at Narvar
    May 2026 - Present · 2 mos

  • Analytics Engineer at RXNT
    Apr 2025 - May 2026 · 1 yr 2 mos

    ELTL, Hevo + dbt, data modeling

  • Data Engineer at Movable Ink
    Feb 2022 - Apr 2025 · 3 yrs 3 mos

    Built data pipelines on GCP

  • Data Engineer at Coherent Path
    Feb 2021 - Feb 2022 · 1 yr 1 mo

    • Collected raw data on an SFTP site and used replicator rules to relocate it
    • Processed data with stable Spark pipelines that filter, clean, join related files, and aggregate at different levels, caching the output for later use
    • Loaded processed data for downstream consumers such as ML models, reporting, and analytics
    • Used Python libraries such as NumPy and math, plus custom (lambda) functions, to generate mathematical distributions of the data
    • Designed and executed A/B tests targeting a 5% conversion lift; reported results to clients with recommendations and guidelines
    • Connected to cloud engines (AWS and GCP) and built reliable, rerunnable Spark pipelines
    • Used Jenkins and Airflow to automate pipeline jobs

  • Beam Data (Toronto, Ontario, Canada)
    • Data Science Consultant
      Jan 2020 - Feb 2021 · 1 yr 2 mos

      Wearable Optimizations
      Built an interactive Tableau dashboard to understand sales of Samsung wearable products and a dynamic model to increase sell-out and keep Samsung competitive in the wearables market
      • Used Tableau to capture the big picture of overall wearable device sales performance by region, product, time, and customer purchasing behavior over the past 5 years
      • Clustered customers by customer value into 3 classes to identify potential wearables buyers, giving the marketing team business insights for promotion decisions
      • Analyzed product life cycles and past sales trends to predict future sell-out and help save promotion budget
      • Improved the understanding of sell-out distribution by identifying the top-selling locations at the FSA level
      • Built a better understanding of the customer base using demographic information (age, gender, and ethnicity) to support tailored promotion strategies

    • Data Science Consultant
      Jun 2020 - Jul 2020 · 2 mos

      Beam Data | Client: WeCloudData | Data Science Consultant
      Learning Experience Optimizations
      Built an end-to-end data pipeline that uses Stack Overflow data to return relevant posts for students' questions and improve their learning experience
      • Collected question and answer text and its labels (Python, SQL, etc.) from Stack Overflow
      • Applied regular expressions to clean the text and used a scikit-learn multiclass classification algorithm to predict the labels
      • Created explicit sub-labels for the text with unsupervised topic modeling using LDA (Latent Dirichlet Allocation) to better categorize it

    • Data Scientist
      Mar 2020 - Jul 2020 · 5 mos

      Customer Propensity Model
      Built a complex customer propensity ML model to understand Samsung customers' purchasing and upgrading behavior
      • Wrote SQL queries to extract purchase history, geolocation, and price promotion data from SQL Server
      • Reduced the data imbalance rate by about 5% by identifying target customers instead of using the raw data
      • Segmented customers more likely to upgrade their device in the next 3 months, with analytics focused on marketing campaign optimization
      • Implemented tree-based classification models (Random Forest and XGBoost) as the baseline and evaluated model performance with precision and recall
      • Engineered new features and used a correlation map and Graphviz to determine the key features with the most impact on customers' upgrading decisions
      • Reported project progress to the Data Science team manager and documented feedback and changes in JIRA and Confluence weekly
      • Identified potential upgrading users by deploying the model, then narrowed the list by 49% with additional filters to fit the budget