Post by Sidhant Kabra
Co-Founder at Cekura - Conversational AI QA | Y Combinator | IITB
Anthropic's Claude Fable 5 is out, and my feed already has the verdict: agents are reliable enough now to run unsupervised. For voice, that's backwards.

The whole frontier is moving one direction. Fable 5 is built for long-horizon agent work, the kind that runs for hours without a human checking in. The GPT-5.6 traces developers found in OpenAI's Codex logs point to more context and more autonomy. Google DeepMind's Gemini 3.5 Flash brings frontier reasoning down to real-time latency and cost. Smarter, more independent, less human in the loop.

More reasoning genuinely helps voice agents, which spend most of a call reacting to whatever the caller does instead of following a script. And because Flash delivers it at the latency and price a live call can actually afford, it isn't a someday upgrade. It's landing in production now, through the orchestration layer that Vapi, LiveKit, Pipecat, ElevenLabs and Retell already ship.

But the same upgrade has a second effect. A smarter model is a more autonomous one, trusted to act on its own. When a coding agent gets something wrong, you catch it at review before anything ships. When a voice agent gets something wrong, it has already said it to a customer. There is no review step on a phone call.

So a better model doesn't only make your agent smarter. It makes the mistakes it still makes arrive faster, sound more convincing, and reach the customer with no one in between. The only way to stay ahead of that is to know how the agent behaves on the calls you never scripted, before it takes a real one. That is the gap we built Cekura to close.

A new model doesn't make a voice agent safe. It makes a good one better and a broken one worse, faster. Worth knowing which one you have before it runs a thousand live calls.