Post by MongoDB

Building a RAG app? The embedding model you choose can make or break your retrieval quality. Most developers pick the most popular model; few ask whether it's actually the right one for their data.

A great place to start is the Retrieval Embedding Benchmark (RTEB) on Hugging Face. RTEB goes beyond traditional benchmarks by evaluating models on both open and private datasets, creating a fair, transparent standard for how embedding models perform on data they haven't seen.

When choosing candidate models for evaluation, we recommend focusing on these columns:

1️⃣ Mean (task): higher = better retrieval quality
2️⃣ Embedding Dimensions: fewer dims = cheaper storage + faster search
3️⃣ Model Size: bigger models = more compute + more $
4️⃣ Max Tokens: higher limits let you embed more text per vector, but packing too much text into one vector can hurt retrieval accuracy

If you want to check it for yourself, visit the Retrieval Embedding Benchmark (RTEB) Leaderboard on Hugging Face: https://lnkd.in/gQw6K_jZ
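The four-column checklist can be turned into a quick shortlisting step. Below is a minimal Python sketch under stated assumptions: you copy the leaderboard values into rows yourself, and the `Candidate`, `shortlist`, and `index_size_gb` names, the budget parameters, and the model rows are all hypothetical placeholders, not real RTEB data or any Hugging Face API. The storage estimate assumes raw float32 vectors (4 bytes per dimension) with no compression or index overhead, which is why fewer dimensions translate directly into cheaper storage.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str          # model name as listed on the leaderboard
    mean_task: float   # "Mean (task)" score; higher = better retrieval
    dims: int          # embedding dimensions
    params_m: float    # model size in millions of parameters
    max_tokens: int    # maximum input tokens per embedded text


def index_size_gb(num_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw vector storage in GB, assuming float32 values and no compression or index overhead."""
    return num_vectors * dims * bytes_per_value / 1e9


def shortlist(candidates, max_dims, max_params_m, chunk_tokens):
    """Keep models within the dimension and size budgets whose Max Tokens
    covers your chunk length, sorted by Mean (task), best first."""
    fits = [
        c for c in candidates
        if c.dims <= max_dims
        and c.params_m <= max_params_m
        and c.max_tokens >= chunk_tokens
    ]
    return sorted(fits, key=lambda c: c.mean_task, reverse=True)


# Hypothetical placeholder rows: NOT real leaderboard values.
# Replace them with the actual numbers from the RTEB leaderboard.
rows = [
    Candidate("model-a", 0.62, 1024, 560, 8192),
    Candidate("model-b", 0.58, 384, 33, 512),
    Candidate("model-c", 0.65, 4096, 7000, 32768),
]

picks = shortlist(rows, max_dims=1024, max_params_m=1000, chunk_tokens=512)
for c in picks:
    print(c.name, c.mean_task, f"{index_size_gb(10_000_000, c.dims):.1f} GB for 10M vectors")
```

With these placeholder budgets, "model-c" is filtered out for exceeding the 1024-dimension limit even though it has the highest score, which mirrors the trade-off above: a shortlist like this only narrows the field, and the remaining candidates should still be evaluated on your own data.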