Radhika Gaonkar

Applied Scientist @ Microsoft | LLM Post-Training, Agentic Evaluation, Reasoning Models | M365 Copilot

San Francisco, California, United States

About

I am a Tech Lead Applied Scientist at Microsoft working on LLM post-training, reasoning models, agentic evaluation, and M365 Copilot. My current work focuses on RLVR, on-policy distillation, synthetic data generation, domain adaptation, and long-horizon agents. I have led post-training and evaluation work for Copilot systems deployed at enterprise scale, including reasoning model adaptation, solver-verified multi-turn agents, Agent-as-a-Judge evaluation, domain expert SLMs, and personalization/memory systems. Recent work: reduced chain-of-thought tokens by ~85% while preserving hard-task reasoning quality, led RLVR post-training for a multi-turn enterprise optimization agent, built agent-loop-based evaluators for Copilot RL pipelines, and scaled synthetic data 100x for domain expert models. I am interested in post-training, RL for agents, evals as training signal, synthetic data, and building reliable reasoning systems in production. I write technical deep-dives on LLM training and evaluation on Substack (substack.com/@gaonkarradhika). For a full list of my publications, please visit my Google Scholar page: https://scholar.google.com/citations?user=or203g8AAAAJ&hl=en

Experience

  • Applied Scientist / Tech Lead at Microsoft
    Feb 2020 - Present · 6 yrs 5 mos

    Applied Scientist Tech Lead on Microsoft 365 Copilot Research, building LLM post-training systems for reasoning models, agents, personalization, and enterprise AI. Current work: leading post-training for reasoning models and multi-turn agents in M365 Copilot. Built RLVR and on-policy distillation pipelines that teach models when to reason, reducing chain-of-thought tokens by ~85% and latency by ~50% while preserving hard-task reasoning. Post-training lead for Copilot Tuning optimization agents that solve enterprise planning tasks using solver-verified rewards, synthetic data, and LLM-as-a-Judge evaluation. Led agentic evaluation work for Copilot RL pipelines, replacing static LLM-as-a-Judge scoring with tool-grounded agent evaluators. Previously led M365 Copilot personalization and domain adaptation through synthetic data, mid-training, SFT, DPO, and sparse-evidence style alignment. Designed entity-relationship synthetic data generation for domain expert SLMs, scaling data 100x without labeled data and improving enterprise knowledge QA; these SLMs shipped as domain expert sub-agents in Microsoft Researcher. Earlier M365 work: built a cascading LLM router with confidence calibration for groundedness evaluation, reducing cost by ~60% and latency by ~27%; shipped transformer-based user embeddings for SharePoint recommendations; led LoRA fine-tuning, pruning, compression, and diversity improvements for Smart Replies across Outlook and Teams. Selected public work: Copilot Tuning, Microsoft Researcher, invited talks at WomenWhoCode Summit 2026 and Agentic AI Summit 2026, M365 Copilot, TMLR 2025, ICLR 2022, and ACL 2020.

  • Stony Brook University (1 yr 8 mos)
    • Graduate Teaching Assistant - Natural Language Processing CSE538-01
      Aug 2019 - Dec 2019 · 5 mos

    • Graduate Research Assistant
      May 2018 - Dec 2019 · 1 yr 8 mos

      Graduate research in NLP and narrative understanding, focused on modeling how events in stories trigger emotional reactions in characters. My thesis work treated emotion prediction not as independent binary classification, but as a structured multi-label problem where emotion labels carry semantic meaning and correlations. Developed models that incorporated label embeddings, label-aware attention, and label-label correlation constraints into BERT-based emotion inference over ROCStories. Introduced a semi-supervised learning strategy that used unlabeled story data to regularize predictions based on emotion-label relationships. This work achieved a new state-of-the-art result on the emotion reaction prediction task and was published at ACL 2020. Paper: Modeling Label Semantics for Predicting Emotional Reactions https://aclanthology.org/2020.acl-main.426/

    • Graduate Teaching Assistant - Data Mining
      Jan 2019 - May 2019 · 5 mos

      Graded assignments, exams, and the final project for graduate Computer Science students. The course covered a breadth of topics in data mining and machine learning, including data preprocessing, classification algorithms such as decision trees, deep learning, clustering algorithms, and genetic algorithms. Also held TA office hours to answer students' questions on these topics as they arose in assignments, projects, and exams.

  • Applied Scientist Intern at Microsoft
    Jun 2019 - Aug 2019 · 3 mos

    I was an Applied Scientist intern on the Smart Reply team, where I worked on quantifying the diversity of the suggested replies in the Smart Reply pipeline. This involved experimenting with classification models for response intents and with lexical diversity measures, in order to analyze reply diversity along both semantic and lexical axes.

  • Machine Learning Scientist at Fractal
    May 2016 - Dec 2017 · 1 yr 8 mos

    Built and deployed applied ML systems for enterprise clients across NLP and computer vision, translating ambiguous business problems into trainable modeling tasks and production-ready AI solutions. Worked on sentiment analysis, topic modeling, OCR, and object detection systems for Fortune 500 clients across financial services, consumer goods, insurance, and technology. Led model selection, feature engineering, training, evaluation, and deployment workflows, with emphasis on adapting ML methods to noisy, domain-specific enterprise data. This role gave me early experience in the full applied ML research loop: problem formulation, data quality analysis, model experimentation, error analysis, stakeholder-facing evaluation, and production deployment across multiple domains.

  • Technische Universität Darmstadt (Darmstadt, Hesse, Germany)
    • Research Assistant - Ubiquitous Knowledge Processing Lab
      Jun 2014 - Apr 2016 · 1 yr 11 mos

      I worked as a research assistant at the UKP Lab at TU Darmstadt: https://www.informatik.tu-darmstadt.de/ukp/ukp_home/index.en.jsp

    • Research Internship
      Jan 2014 - Jun 2014 · 6 mos

      Developed a reinforcement-learning-based travel recommender system that predicts the next best places for tourists to visit. This end-to-end system was built using open data sources from the web and publicly available user photograph data from the Flickr API. This work was published at the International Symposium on Intelligent Data Analysis.