Post by Nethermind

We open-sourced the evaluation algorithm behind AuditAgent's EVMBench results: 80 of 120 high-severity vulnerabilities detected across all 40 repositories, or 67% recall, post-validation. The "post-validation" part matters: AuditAgent filters findings through a confidence threshold before they surface, so detections that don't pass aren't counted. Most published benchmark numbers skip that step. The blog breaks down per-repo results and how the scoring works. The algorithm is on GitHub.
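
To make the pre- versus post-validation distinction concrete, here is a minimal Python sketch. It is not the open-sourced algorithm. The `Finding` type, the `recall` helper, the 0.5 threshold, and the toy data are all assumptions made up for illustration. Only the 80-of-120 figure comes from the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    vuln_id: str       # ground-truth vulnerability this finding matches (hypothetical ID)
    confidence: float  # hypothetical validation score in [0, 1]


def recall(findings, ground_truth, threshold=None):
    """Fraction of ground-truth vulnerabilities matched by at least one finding.

    threshold=None counts every finding (pre-validation).
    Otherwise only findings with confidence >= threshold count (post-validation).
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    if threshold is None:
        kept = findings
    else:
        kept = [f for f in findings if f.confidence >= threshold]
    # Deduplicate: several findings may point at the same vulnerability.
    detected = {f.vuln_id for f in kept} & set(ground_truth)
    return len(detected) / len(ground_truth)


# Toy data, invented for illustration only (not benchmark results).
ground_truth = ["V1", "V2", "V3"]
findings = [Finding("V1", 0.9), Finding("V2", 0.4), Finding("V1", 0.7)]

pre_validation = recall(findings, ground_truth)        # V1 and V2 count
post_validation = recall(findings, ground_truth, 0.5)  # only V1 passes the threshold

# The headline figure from the post: 80 of 120, as a rounded percentage.
headline_percent = round(80 / 120 * 100)
```

On the toy data, the same findings score lower once the threshold is applied, because the low-confidence match on `V2` no longer counts. That is why a post-validation number and a number reported without filtering are not directly comparable.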