Post by Ritikesh Choube

Senior AI Engineer | LangGraph · CrewAI · Google ADK · MCP | Multi-Agent Systems & RAG in Production | Tech Lead

In my last 3.6 years building AI systems, the most common question I've heard is, "How do we get more data?" But lately, I've been focused on a different question: "What kind of data are we scaling?"

Yann LeCun recently highlighted a startling comparison:

The training data for a massive LLM is roughly 2 × 10^13 bytes.
A toddler processes roughly 1 × 10^15 bytes of sensory information before they even start school.

That is about 50 times more data than the LLM sees.

The takeaway? Text is a "thin" medium. We are training models on the description of the world, while humans learn from the experience of the world.

As we design the next generation of AI systems, the challenge isn't just increasing the token count - it's increasing the SENSORY BANDWIDTH. Until a model can "see" a ball roll and understand physics without being told, we are still just predicting the next word, not understanding the next moment.

#Ai #Agents #SuperIntelligence #Data