Post by regenold

š—›š—¼š˜„ š——š—¼š—²š˜€ š—§š—µš—² š—˜š—Ø š—”š—œ š—”š—°š˜ š—¤&š—” š—•š—²š—»š—°š—µš—ŗš—®š—æš—ø š—–š—µš—®š—¹š—¹š—²š—»š—“š—² š—”š—°š˜š˜‚š—®š—¹š—¹š˜† š—Ŗš—¼š—æš—ø? Following the launch of the EU AI Act Q&A Benchmark Challenge 2026, several participants have asked us about the evaluation methodology. The short answer: we do not measure whether a system sounds convincing. š—Ŗš—² š—ŗš—²š—®š˜€š˜‚š—æš—² š˜„š—µš—²š˜š—µš—²š—æ š—¶š˜ š—¶š˜€ š—°š—¼š—æš—æš—²š—°š˜. Each participating AI system receives the same set of expert-developed questions covering Regulation (EU) 2024/1689, the EU AI Act. For every response, we evaluate five dimensions: š—”š—»š˜€š˜„š—²š—æ š—–š—¼š—æš—æš—²š—°š˜š—»š—²š˜€š˜€ Does the answer accurately address the question? We score both:  • Strict correctness (fully correct)  • Loose correctness (substantially correct) š—„š—²š—³š—²š—æš—²š—»š—°š—² š—”š—°š—°š˜‚š—æš—®š—°š˜† Can the system identify the correct legal basis? A correct answer supported by an incorrect article reference still represents a regulatory risk.  • Strict correctness (fully precise)  • Loose correctness (mostly precise) š—–š—¼š—»š—°š—¶š˜€š—²š—»š—²š˜€š˜€ Can the system provide the necessary information without excessive verbosity? In regulatory environments, clarity often matters more than word count. š—§š—¼š—»š—² We assess whether responses are professional, precise, and unambiguous. Evasive answers are penalised. š—Ÿš—®š˜š—²š—»š—°š˜† How quickly can the system provide a response? Speed is not everything, but it remains an important usability factor. The result is a multi-dimensional performance profile rather than a single score. A system may be highly accurate but slow. Another may be fast but struggle with references. The benchmark is designed to reveal these trade-offs transparently. 
Every participant receives an individual evaluation report showing their performance across all dimensions. Participation remains free of charge.

Interested in benchmarking your system?
https://lnkd.in/eMVmMRC4

We are currently accepting registrations for the first evaluation cycle.

#AIAct #RegulatoryAI #AIBenchmark #ArtificialIntelligence #AIGovernance
