France
Democratizing good Machine Learning at Hugging Face 🤗 (open source + dataset platform). Previously ML Engineer at Feedly. Master of Engineering at CentraleSupélec, MVA Master at ENS Paris-Saclay, exchange at EPFL. #AI #Tech #MachineLearning #NLP #DataProcessing
Head of datasets (Hugging Face Hub + open source)
- Back-end implementation of the Dataset Viewer: data pipelines and APIs that support 500k datasets for text, audio, image, video...
- Integration with data tools: DuckDB, PySpark, Polars, Pandas, Dask, WebDataset...
- Open source development and maintenance of the Hugging Face Datasets library: data loading, processing, streaming
- Leading the Datasets team on the open source and platform projects
- Help with additions to the Parquet ecosystem: content-defined chunking and deduplication
- Support of open source efforts in training, evaluation and robotics libraries
- Auto-tagging of datasets: size, format, modality, compatible libraries
- Work on synthetic data applications in Hugging Face Spaces: generate, modify, filter datasets
- Collaborations with research labs and other organizations on dataset publication: AllenAI, Cohere for AI, LAION, Wikimedia, Mozilla, Apple, NASA, ESA...
- Support for dataset preparation for LLM pretraining, as well as for computer vision and audio/speech processing
- Implementation of the first version of the Hugging Face Datasets library
- Addition of Deep Learning models to the Transformers library: DPR (Dense Passage Retrieval), RAG (Retrieval Augmented Generation) in collaboration with Facebook AI Research
- Dense Retrieval Optimization (HNSW, IVF, quantization) on Open Domain Question Answering tasks
- Implementation of Named Entity Recognition + Linking algorithms for company name detection and disambiguation in news articles
- Work on financial article classification, to allow users to filter financial/stock/market updates in their news feed
- Development of a Machine Learning framework to easily move ML experiments to production
- Topic classification of news articles (800 topics) at scale (10M+ users, 40M+ sources)
Worked for @ParisDigitalLab partners at Innovation Factory Paris on three innovative projects:
- Prototype of BCG's Data & Digital Platform: full-stack development of a web platform that uses Big Data technologies for The Boston Consulting Group
- Data Science for RATP: use of Machine/Deep Learning techniques to leverage daily metro data
- Mobile dev for Randstad: prototyping of iOS applications using Augmented and Virtual Reality
Forum CentraleSupélec creates the opportunity for 3000 students, doctoral candidates, and alumni to connect with 1000 representatives of 200 global companies at the Palais des Congrès, Paris.