Post by OpenAI
Let’s talk about evals. We’re always looking for better ways to measure and forecast model progress, especially as benchmarks get saturated or gamed. Tejal P., who leads our frontier evals team, spoke to Andrew Mayne about why evals matter and what models need to be judged on next. https://lnkd.in/dr9D6-8i