Post by OpenAI
We’re introducing EVMbench, a new benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities.

EVMbench measures three core capabilities:
- Detect vulnerabilities in real-world contract code
- Exploit them in realistic attack scenarios
- Patch them safely, with fixes that hold up under testing

EVMbench is intended both as a measurement tool and as a call to action. As agents improve, it becomes increasingly important for developers and security researchers to incorporate AI-assisted auditing into their workflows.

We’re releasing EVMbench’s tasks, tooling, and evaluation framework to support continued research on measuring and managing emerging AI cyber capabilities.

https://lnkd.in/gySryDDb