Michelle Lin

Senior Data Engineer at Yahoo

Boston, Massachusetts, United States

About

I’m a Senior Data Engineer with 7+ years of experience building scalable, reliable data systems that drive measurable business impact. At Wayfair, I built and scaled production-grade pipelines supporting homepage, search, and product page experiences, contributing to $100M+ in annual revenue impact. I led logging architecture redesigns that reduced infrastructure costs by $300K annually and improved SLA adherence from 60% to 100%, and I developed analytics frameworks used widely across engineering, product, and data science.

Currently at Yahoo, I’m focused on modernizing data infrastructure, leading on-prem to GCP migration and GA4 adoption initiatives. My work centers on transitioning legacy systems to cloud-native architectures while maintaining data integrity, reliability, and analytics continuity.

Core areas of focus:

  • Data architecture & scalable ETL design
  • Cloud migration & GCP modernization
  • BigQuery & distributed data systems
  • Analytics engineering (LookML, KPI standardization, GA4)
  • Data quality, observability & cost optimization
  • Experimentation & metrics infrastructure

I enjoy operating at the intersection of engineering and analytics, building platforms that empower teams to move faster with trusted data. I’m also open to select project-based advisory engagements around cloud migration, data architecture, pipeline optimization, and analytics enablement.

Experience

  • Senior Data Engineer at Yahoo
    Nov 2025 - Present · 8 mos

  • Wayfair (Full-time · 7 yrs 5 mos)
    • Senior Data Engineer
      Nov 2021 - Nov 2025 · 4 yrs 1 mo

    • Manager / Tech Lead
      Aug 2021 - Nov 2021 · 4 mos

    • Senior Data Scientist
      Aug 2019 - Jul 2021 · 2 yrs

  • Data Science / AI Co-op at Fidelity Investments
    Jan 2018 - Jun 2018 · 6 mos

    • Partnered with business stakeholders to identify top AI opportunities and transformed large volumes of data into AI-driven solutions using natural language processing (NLP) techniques, delivering a project that drove significant value for Fidelity
    • Developed a supervised deep learning model that outperformed state-of-the-art methods, using Keras in Python on GPU
    • Extracted data (50 million rows) from Hive, then preprocessed and analyzed it with PySpark
    • Identified business insights by analyzing text data with keyword extraction (TextRank and RAKE)
    • Communicated effectively with manager and teammates and collaborated on GitLab

  • Data Scientist Intern (Graduate Qualifying Project) at OSRAM
    Sep 2017 - Dec 2017 · 4 mos

    • Collaborated with Xi Liu, Yalei Peng, and Congyang Wang to add value with machine learning in the building Internet of Things (IoT)
    • Predicted the usage status (occupied or unoccupied) of unreservable spaces with classification models such as logistic regression, decision tree, random forest, and support vector machine
    • Improved the accuracy of predictive statistical models by around 20% using ensemble models
    • Forecasted the usage duration of unreservable spaces with a long short-term memory (LSTM) network, which outperformed time series models, improving accuracy by 35%

  • Research Assistant at Worcester Polytechnic Institute
    Oct 2016 - Dec 2017 · 1 yr 3 mos

    • Worked with Prof. Renata Konrad, Prof. Andrew C. Trapp, and Kayse Lee Maass on using data science to fight human trafficking
    • Built a prioritization framework to categorize states based on the prevalence of human trafficking, the legislative environment regarding human trafficking, and the current number of dedicated human trafficking shelters per million residents within the state
    • Visualized human trafficking statistics with Python (Plotly and Matplotlib) and R