Quentin Lhoest

Open Source & ML/Data Engineer at Hugging Face

France

About

Democratizing good Machine Learning at Hugging Face 🤗 (open source + dataset platform). Previously ML Engineer at Feedly. Master of Engineering at CentraleSupélec, MVA Master at ENS Paris-Saclay, exchange at EPFL. #AI #Tech #MachineLearning #NLP #DataProcessing

Experience

  • Hugging Face (Paris, Île-de-France, France)
    • Open Source & ML/Data Engineer
      Oct 2020 - Present · 5 yrs 9 mos

      Head of datasets (Hugging Face Hub + open source)
      - Back-end implementation of the Dataset Viewer: data pipelines and APIs that support 500k datasets for text, audio, image, video...
      - Integration with data tools: DuckDB, PySpark, Polars, Pandas, Dask, WebDataset...
      - Open source development and maintenance of the Hugging Face Datasets library: data loading, processing, streaming
      - Leading the Datasets team on the open source and platform projects
      - Help with additions to the Parquet ecosystem: content-defined chunking and deduplication
      - Support of open source efforts in training, evaluation and robotics libraries
      - Auto-tagging of datasets: size, format, modality, compatible libraries
      - Work on synthetic data applications in Hugging Face Spaces: generate, modify, filter datasets
      - Collaborations with research labs and other organizations on dataset publication: AllenAI, Cohere for AI, LAION, Wikimedia, Mozilla, Apple, NASA, ESA...
      - Support for dataset preparation for LLM pretraining, as well as for computer vision and audio/speech processing

    • Machine Learning Intern
      Apr 2020 - Oct 2020 · 7 mos

      - Implementation of the first version of the Hugging Face Datasets library
      - Addition of Deep Learning models to the Transformers library: DPR (Dense Passage Retrieval) and RAG (Retrieval Augmented Generation), in collaboration with Facebook AI Research
      - Dense Retrieval Optimization (HNSW, IVF, quantization) on Open Domain Question Answering tasks

  • Feedly (San Francisco Bay Area)
    • Part Time Machine Learning Engineer
      Feb 2019 - Sep 2019 · 8 mos

      - Implementation of Named Entity Recognition + Linking algorithms to detect and disambiguate company names in news articles
      - Work on classification of financial articles, to allow users to filter financial/stock/market updates in their news feed

    • Machine Learning Intern
      Aug 2018 - Jan 2019 · 6 mos

      - Development of a Machine Learning framework to easily move ML experiments to production
      - Topic classification of news articles (800 topics) at scale (10M+ users, 40M+ sources)

  • Full Stack Software Engineer at Paris Digital Lab
    Jan 2018 - Jul 2018 · 7 mos

    Worked for @ParisDigitalLab partners at Innovation Factory Paris on three innovative projects:
    - Prototype of BCG's Data & Digital Platform: Full Stack development of a web platform that uses Big Data technologies for The Boston Consulting Group.
    - Data Science for RATP: Use of Machine/Deep Learning techniques to leverage daily metro data.
    - Mobile dev for Randstad: Prototyping of iOS applications using Augmented and Virtual Reality.

  • Communication Manager at Forum CentraleSupélec
    Jan 2017 - Jan 2018 · 1 yr 1 mo

    Forum CentraleSupélec creates the opportunity for 3000 students, doctoral candidates, and alumni to connect with 1000 representatives of 200 global companies at the Palais des Congrès in Paris.