San Jose, California, United States
The title “data scientist” is changing under our feet. When AI can write the model, the thinking becomes the job, and the people who thrive won’t be the best coders but the best orchestrators of AI agents.

I am a Data Scientist at Google, specializing in the architecture, evaluation, and automated guardrails of next-generation AI systems. My work focuses on building scalable evaluation infrastructure, leveraging LLM-as-a-judge (Autoraters), human-in-the-loop (HITL) frameworks, and agentic workflows to measure and optimize ambient intelligence and voice experiences.

Before tackling generative AI quality at Google, I spent years navigating data velocity and algorithmic complexity at scale:
• At Walmart: Scaled the Walmart Cash program from <$1M to $12M/week (a 12x increase in weekly program turnover) and consulted on the Walmart Plus subscription model.
• At Twitter: Led product analytics and causal inference for user onboarding, activation, and retention.
• Causal Inference & Graph Analytics: Applied causal inference to massive networks and engineered geospatial graph analytics powering the transition of a million households to solar energy.

I also write "AI-First Data Scientist", a weekly newsletter and upcoming book helping data scientists, engineers, and product builders navigate this transition. If you are building in this new AI-first world or trying to make sense of agentic orchestration and evaluation, let's connect.
👉 Subscribe to the newsletter: www.aifirstdatascientist.com
Lead data science methodology for Generative AI evaluation and quality of Gemini on Home devices.
• Design and deploy scalable automated evaluation systems (Autoraters / LLM-as-a-judge) to measure conversation quality, voice experiences, and ambient intelligence.
• Build human-in-the-loop (HITL) frameworks and A/B testing methodologies that bring rigorous statistical validity to model feedback and automated guardrails.
• Partner with product and engineering teams to measure and optimize Gemini's trust, safety, and core alignment.
Led experimentation and causal inference for Walmart Cash, scaling the program to all customers and driving a 12x increase in weekly program turnover ($1M to $12M/week).
• Advised product and business leadership during critical reviews of the Walmart Plus subscription ecosystem to drive loyalty and activation.
• Developed statistical frameworks to measure and optimize customer engagement with financial products at scale.
Directed causal inference and product analytics for user onboarding, activation, and long-term retention loops.
• Built statistical models and experimentation frameworks to identify downstream user value and activation velocity.
• Mentored and scaled the data science team, hosting internal technical training on causal inference and product analytics.
• Led the Consumer Analytics division, focused on retention, engagement, and LTV.
• Built tools estimating the marginal contribution of individual user actions to renewal.
• Led multiple data science projects defining metrics that capture the dynamics of a two-sided marketplace (home buyers and agents).
• Designed a search experience that lets users search for homes by commute address and time.