Post by OpenAI

Let’s talk about evals. We’re always looking for better ways to measure and forecast model progress, especially as benchmarks get saturated or gamed. Tejal P., who leads our frontier evals team, spoke to Andrew Mayne about why evals matter and what models need to be judged on next. https://lnkd.in/dr9D6-8i
