Post by Ritikesh Choube

Senior AI Engineer | LangGraph · CrewAI · Google ADK · MCP | Multi-Agent Systems & RAG in Production | Tech Lead

The RAG Interview Question That 70% of Candidates Are Failing (and I don't know why)

I've been conducting a lot of GenAI interviews recently. The pattern is fascinating.

Most candidates are well-versed in advanced architectures. They can talk for hours about Multi-RAG, Agentic Workflows, and GraphRAG. They have "built big systems."

But then I ask one fundamental question: "When a user asks a query in your RAG pipeline, it gets converted into an embedding. Which model creates that embedding?"

7 out of 10 candidates confidently answer: "The LLM."

Wait... what?

If you think the LLM (the generation model) is creating the query vector, let's look at the math.

The Database: Your vector store was indexed using an Embedding Model (e.g., text-embedding-3-small, dim=1536).

The Query: If you use the LLM to "embed" the query, you are extracting hidden states, which typically have a different size (e.g., dim = 4096 in Llama 2 7B, and more in larger models).

The Result? You cannot take a dot product between a 1x1536 vector and a 1x4096 vector. The code doesn't just give you a bad answer; it throws a dimension mismatch error. The math literally breaks. And even if the sizes happened to match, the two models map text into different vector spaces, so the similarity scores would be meaningless.

You cannot zip a jacket if one side has 1,000 teeth and the other has 4,000.

I created this slide deck using NotebookLM to explain exactly why the Encoder (Embedding Model) and Decoder (LLM) are not interchangeable, mathematically or architecturally.

#RAG #GenerativeAI #LLM #SystemDesign #MachineLearning #LangChain #AIInterviews
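
To make the mismatch concrete, here is a minimal pure-Python sketch. The two dimensions are the ones quoted in the post. Everything else is made up for illustration: the `dot` helper, the random stand-in vectors, and the error message are not any vector store's or embedding library's actual API.

```python
import random

EMBED_DIM = 1536   # index dimension quoted in the post
HIDDEN_DIM = 4096  # LLM hidden-state size quoted in the post


def dot(a, b):
    """Dot product; undefined unless both vectors have the same length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return sum(x * y for x, y in zip(a, b))


random.seed(0)
# Random stand-ins (assumption): no real model is called here.
doc_vector = [random.gauss(0, 1) for _ in range(EMBED_DIM)]   # an indexed chunk
good_query = [random.gauss(0, 1) for _ in range(EMBED_DIM)]   # same embedding model
bad_query = [random.gauss(0, 1) for _ in range(HIDDEN_DIM)]   # LLM hidden state

score = dot(doc_vector, good_query)  # fine: 1536 vs 1536

try:
    dot(doc_vector, bad_query)       # 1536 vs 4096
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # prints "dimension mismatch: 1536 vs 4096"
```

The explicit length check matters in this sketch: plain `zip` would silently truncate the longer vector and return a meaningless number. Numerical libraries such as NumPy reject mismatched shapes on their own.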