United States
Worked in foundations research for search with a focus on language model post-training and architecture, with some work in data, pre-training, evaluation, and retrieval.
Alignment of language models using reinforcement learning for downstream and auxiliary tasks (e.g., recommendations), exploration of generative retrieval (reinforcement and meta-learning, slate-level ranking, outcome conditioning), and use of heterogeneous graphs for unified representation learning.
– Developed a unified LLM alignment approach, leveraging both reinforcement learning and direct preference optimization, with improved stability and efficiency over RL and a 10% improvement in performance over DPO [1] (accepted at TMLR)
– Implemented PinRec, an efficient outcome-conditioned, multi-token generative retrieval technique, with gains of over +0.5% in online sitewide time spent and +0.3% in sitewide fulfilled sessions [2] (accepted at KDD ADS 2026)
– Trained and integrated OmniSage, multi-task node representations of heterogeneous graphs with 60B edges, leveraging improved feature-level retrieval and user sequence objectives, lifting online sitewide repins by 2.5-3% [3] (accepted at KDD ADS 2025)
[1] Anirudhan Badrinath, Prabhat Agarwal, and Jiajing Xu. Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier. arXiv preprint arXiv:2405.17956, 2024.
[2] Anirudhan Badrinath, Prabhat Agarwal, Laksh Bhasin, Jaewon Yang, Jiajing Xu, and Charles Rosenberg. PinRec: Outcome-Conditioned, Multi-Token Generative Retrieval for Industry-Scale Recommendation Systems. arXiv preprint arXiv:2504.10507, 2025.
[3] Anirudhan Badrinath, Alex Yang, Kousik Rajesh, Prabhat Agarwal, Jaewon Yang, Haoyu Chen, Jiajing Xu, and Charles Rosenberg. OmniSage: Large Scale, Multi-Entity Heterogeneous Graph Representation Learning. arXiv preprint arXiv:2504.17811, 2025.
Performed research under Profs. Emma Brunskill and Chris Piech in the SAIL Lab @ Stanford at the intersection of offline reinforcement learning (RL) and applications to education.
– Developed a variant of the decision transformer using intermediate waypoint generation, outperforming state-of-the-art RL methods (IQL, CQL), often by 30-80%; presented at NeurIPS 2023 [1] and the Goal-Conditioned RL Workshop (NeurIPS 2023)
– Evaluated reinforcement learning tasks offline using an ensemble-based off-policy evaluation method, OPERA, accepted to NeurIPS 2024 [2]
– Developed an RL technique with Prof. Chris Piech for chess puzzle recommendation on chess.com with 1 billion interactions; outperforms production in off-policy and LLM evaluation, accepted at RLC 2026 and RLC Journal [3]
– Developed a preliminary system leveraging assessments of cognitive mastery via knowledge tracing, educational indicators, and LLMs to assist Carnegie Learning tutors in prioritizing student support
[1] Anirudhan Badrinath, Yannis Flet-Berliac, Allen Nie, and Emma Brunskill. Waypoint transformer: reinforcement learning via supervised learning with intermediate targets. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NeurIPS ’23, Red Hook, NY, USA, 2024. Curran Associates Inc.
[2] Allen Nie, Yash Chandak, Christina Yuan, Anirudhan Badrinath, Yannis Flet-Berliac, and Emma Brunskill. OPERA: Offline policy evaluation with re-weighted aggregates of multiple estimators. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NeurIPS ’24, 2024.
[3] Anirudhan Badrinath*, Allen Nie*, Nicholas Tomlin, Timothy Dai, Carissa Yip, Rose E Wang, Emma Brunskill, and Christopher J Piech. Discovering high-quality chess puzzles through one billion plays with offline reinforcement learning. In Reinforcement Learning Conference, RLC ’26, 2026. To be published in Reinforcement Learning Journal.
Implemented end-to-end SSL/TLS certificate tracking framework on EC2 instances, with fully integrated support on AWS Console; migrated AWS OpsInsights to Athena backend with > 50x cost reduction.