Post by Braintrust
13,735 followers
Stateful agents that do real work are worth investing in, but they're also more difficult to eval. The hard part isn't scoring, it's getting the state right. The best approach is to be deliberate about what needs to be real and what can be mocked in the eval, so you balance cost and time while still capturing state accurately. Read more → https://lnkd.in/eCpEUDFT