Post by Kaggle

Since launching Kaggle Benchmarks, the community has created over 10,000 tasks to measure AI model capabilities: clear benchmarks that pinpoint where models fall short and give labs the signal they need to train better ones. But until now, creating a benchmark meant working exclusively in Kaggle's notebook editor, not the stack you actually build with.

Today, we're launching local development for Kaggle Benchmarks. You can now create, validate, push, run and download tasks directly from your local dev environment (VSCode, Cursor, Antigravity, Claude Code and more) using the write-kaggle-benchmarks skill.

Here's what that looks like in practice:

• Describe a benchmark in natural language: your agent writes the code, validates it locally, and pushes it to Kaggle.
• Run it against every SOTA model in one go: pass/fail, latency, cost, and token counts come back in your agent panel.
• Tasks you push join the public Kaggle Benchmarks ecosystem, where labs use them as a signal to improve their models.

If you can measure a capability, labs will work to improve it. The more people building benchmarks that reflect the real world, the better models get at the things that actually matter.

Install the skill and get started 👉 https://lnkd.in/g7A4N6DE

Built something with it? Share your task and workflow by July 1 and tag @kaggle for a chance to win Kaggle swag and a social shoutout.
