Saul Ramirez, Ph.D.

Head of Research @ Subquadratic | Next Gen LLMs

Salt Lake City Metropolitan Area

About

Hi, I'm Saul (sah-OOL). I lead Language Modeling Research at Subquadratic, where we build systems that scale beyond the limits of conventional Transformer architectures. My work focuses on sequence modeling, long-context reasoning, and the training systems required to make them practical. I'm particularly interested in how architecture, data, and experimentation interact to produce capabilities that emerge only at large context lengths and long horizons.

Before working in AI, I came from hydrology, where I spent years modeling complex temporal systems and long-range dependencies. That background continues to shape how I think about machine learning: as a sequence modeling problem first and a software problem second.

Over the course of my career I've worked across technology, climate science, healthcare, and entertainment, building teams and systems that bridge research and production. I care less about whether an idea is novel than whether it survives contact with reality.

I write and speak about sequence modeling, long-context architectures, training dynamics, research leadership, and building organizations that can both discover and ship.

→ Feel free to connect if you're interested in long-context AI, scalable sequence modeling, or the future of language and speech systems.

Experience

Subquadratic (San Francisco, CA · Hybrid)
- Head of Research
  May 2025 - Present · 1 yr 2 mos
  - Leading technical investigations into alternatives to the standard Transformer architecture to improve scaling behavior, focusing on Sparse Attention, Linear Attention, and State Space Models (SSMs).
  - Architecting novel pre-training and fine-tuning strategies for decoder-only architectures to ensure high-fidelity performance across modalities.
  - Directing the technical vision for Speech (STT, TTS, Speech-to-Speech) and LLM teams, authoring foundational white papers that translate architectural breakthroughs into investor-ready technical narratives.
- Founding Engineer
  Jan 2025 - Apr 2025 · 4 mos
  - Built core ML infrastructure and conversational AI systems.
  - Solved the character drift problem, enabling 45+ minute coherent conversations through novel data generation techniques.
  - Built custom training infrastructure from scratch supporting QLoRA, full fine-tuning, and curated synthetic data generation pipelines.
  - Trained and aligned conversational models ranging from 1B to 120B parameters, managing the full lifecycle from data curation to production deployment.
  - Implemented scalable inference systems using vLLM and SGLang to serve complex conversational AI applications in real time.
Data Scientist at Experian
Sep 2024 - Jan 2025 · 5 mos
- Rebuilt and refreshed production models for Social Determinants of Health (SDoH), predicting patient medication adherence and physical mobility (e.g., 500m walking ability).
- Designed and executed a cloud migration roadmap for 70+ production models, establishing automated monitoring and retraining loops to replace legacy on-prem systems.
Machine Learning Engineer at BENlabs
Jan 2023 - Jul 2024 · 1 yr 7 mos
- Built a recommendation system for product placement in high-profile film projects, including the Barbie movie, contributing to a 7% sales lift and an 80% reduction in agency research time.
- Pioneered the use of transformer architectures for continuous value prediction (age/demographics), creating a recommendation engine that matched products with film contexts.
- Built a production RAG system and LLM-powered analytics serving 5,000+ YouTube creators, processing 1M+ videos for content insights and trend analysis.
- Developed an early agentic AI system for web scraping, search, and summarization for social media tracking.
Research Scientist at Brigham Young University
Aug 2020 - Apr 2023 · 2 yrs 9 mos
- Developed groundwater monitoring models integrated into the U.S. National Water Model, providing early flood warnings through coupled groundwater-surface water simulations.
- Created "Controlled Leakage" and "Iterative Refinement" for sparse spatio-temporal datasets, achieving 25%–40% accuracy gains and publishing 5 peer-reviewed papers.
Data Engineer at Amazon
May 2022 - Jul 2022 · 3 mos
- Architected Spark-SQL pipelines that processed 30 TB of data in 6 hours, reducing legacy runtime by 75%.