Post by Braintrust


Run a full chess eval without writing a single line of code using the Braintrust CLI.

- Take a CSV of chess puzzles and make a dataset.
- Write a prompt to solve mate in 2 puzzles, and upload it to the project.
- Then write a scorer that compares the output to the expected answer.

The eval found that GPT‑5 with no reasoning scored about 25% on the chess puzzles, and with low reasoning it scored about 15%.

Learn more in the Braintrust docs → https://lnkd.in/eDXrETtR
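For a rough idea of what the scorer step might look like, here is a minimal sketch in plain Python. It is not the Braintrust API: the function names `normalize_moves`, `exact_match_scorer`, and `accuracy` are hypothetical, and it assumes both the model output and the expected answer are space-separated moves in standard algebraic notation.

```python
import re


def normalize_moves(text: str) -> list[str]:
    """Split a move string into bare moves (assumed format: SAN, space-separated)."""
    moves = []
    for token in text.strip().split():
        token = re.sub(r"^\d+\.+", "", token)   # drop move numbers: "1.Qh7+" -> "Qh7+"
        token = re.sub(r"[+#!?]+$", "", token)  # drop check/mate/annotation marks
        if token:
            moves.append(token)
    return moves


def exact_match_scorer(output: str, expected: str) -> float:
    """Return 1.0 when the normalized move sequences are identical, else 0.0."""
    return 1.0 if normalize_moves(output) == normalize_moves(expected) else 0.0


def accuracy(scores: list[float]) -> float:
    """Mean score across puzzles; 0.0 for an empty list."""
    return sum(scores) / len(scores) if scores else 0.0
```

Case is kept as-is during normalization because it carries meaning in algebraic notation (`Bc4` is a bishop move, `bxc4` is a pawn capture). Averaging the per-puzzle scores gives the kind of percentage reported above.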
