Mary Newhauser

Member of Technical Staff @ Fastino Labs

Berlin, Berlin, Germany

About

Building the future of SLMs at Fastino Labs. Interested in multi-task encoder models (e.g. GLiNER, GLiNER2) and small decoder models with emergent capabilities that are pushing the Pareto frontier. Also interested in the clever optimizations and fine-tuning techniques that get us small language models in the first place (e.g. knowledge distillation, LoRA/QLoRA, quantization, SFT, and RL). I think the future is small.

Experience

  • Member of Technical Staff at Fastino Labs
    Jan 2026 - Present · 6 mos

  • Machine Learning Engineer at Weaviate
    Sep 2024 - Sep 2025 · 1 yr 1 mo

    • Specialize in LLM fine-tuning for embedding models and generative models
    • Write notebooks integrating Weaviate with other open source tools
    • Implement agentic RAG, advanced RAG, and vanilla RAG across multiple domains
    • Fine-tune Gemma3 on RunPod with 🦥 Unsloth
    • Create agentic workflow templates for n8n leveraging vector stores
    • Write blog posts consistently ranking on page 1 of Google search results for their target keywords
    • Create viral, high-engagement social media content

  • Wiley-VCH (5 yrs 8 mos)
    • Senior Data Scientist
      May 2021 - Aug 2024 · 3 yrs 4 mos

      • Leverage weak supervision and few-shot classification for training extreme multi-label classifiers
      • Generate and cluster SPECTER embeddings to identify emerging trends in scientific research
      • Maintain and build NLP models to match scientific content to Wiley journals
      • Leverage spaCy and FastAPI technology to build text-tagging APIs
      • Mentor and train data analysts and emerging data scientists

    • Data Scientist
      Jan 2019 - May 2021 · 2 yrs 5 mos

      • Generate and leverage SPECTER (BERT) embeddings using Transformers and PyTorch
      • Build hundreds of highly accurate journal classification models based on title/abstract text
      • Perform entity resolution and record deduplication using active learning

  • Behavioral Science Analyst at Mattersight Corporation
    Jan 2015 - Feb 2017 · 2 yrs 2 mos

    • Employed Natural Language Processing techniques to analyze thousands of recorded customer service and sales call transcripts
    • Used supervised and unsupervised machine learning techniques in R and Python (e.g. Naïve Bayes Classifier, LDA, k-means clustering models) to create linguistic rules based on regular expressions to perform content analysis
    • Conceptualized, tested, and refined definitions of linguistic metrics based on quality of available data and client needs
    • Used MS SQL Server and R (reshape2, plyr packages) in tandem for data wrangling
    • Leveraged linguistic data and customer data to create logistic regression and CART models in R and Python to predict customer behaviour and make recommendations to improve caller experience
    • Presented advanced statistical concepts and actionable insights to commercial and tech audiences alike
    • Worked closely with our BI team to reveal hidden patterns in data to improve agent efficiency and customer satisfaction
    • Led deployments from start to finish by communicating goals, deadlines, and project scope to teammates

  • Research Assistant at Northern Illinois University
    Jun 2011 - Aug 2011 · 3 mos