Post by regenold

𝗛𝗼𝘄 𝗗𝗼𝗲𝘀 𝗧𝗵𝗲 𝗘𝗨 𝗔𝗜 𝗔𝗰𝘁 𝗤&𝗔 𝗕𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲 𝗔𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝗪𝗼𝗿𝗸?

Following the launch of the EU AI Act Q&A Benchmark Challenge 2026, several participants have asked us about the evaluation methodology.

The short answer: we do not measure whether a system sounds convincing. 𝗪𝗲 𝗺𝗲𝗮𝘀𝘂𝗿𝗲 𝘄𝗵𝗲𝘁𝗵𝗲𝗿 𝗶𝘁 𝗶𝘀 𝗰𝗼𝗿𝗿𝗲𝗰𝘁.

Each participating AI system receives the same set of expert-developed questions covering Regulation (EU) 2024/1689, the EU AI Act. For every response, we evaluate five dimensions:

𝗔𝗻𝘀𝘄𝗲𝗿 𝗖𝗼𝗿𝗿𝗲𝗰𝘁𝗻𝗲𝘀𝘀
Does the answer accurately address the question? We score both:
• Strict correctness (fully correct)
• Loose correctness (substantially correct)

𝗥𝗲𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗔𝗰𝗰𝘂𝗿𝗮𝗰𝘆
Can the system identify the correct legal basis? A correct answer supported by an incorrect article reference still represents a regulatory risk.
• Strict correctness (fully precise)
• Loose correctness (mostly precise)

𝗖𝗼𝗻𝗰𝗶𝘀𝗲𝗻𝗲𝘀𝘀
Can the system provide the necessary information without excessive verbosity? In regulatory environments, clarity often matters more than word count.

𝗧𝗼𝗻𝗲
We assess whether responses are professional, precise, and unambiguous. Evasive answers are penalised.

𝗟𝗮𝘁𝗲𝗻𝗰𝘆
How quickly can the system provide a response? Speed is not everything, but it remains an important usability factor.

The result is a multi-dimensional performance profile rather than a single score. A system may be highly accurate but slow. Another may be fast but struggle with references. The benchmark is designed to reveal these trade-offs transparently.

Every participant receives an individual evaluation report showing their performance across all dimensions. Participation remains free of charge.

𝗜𝗻𝘁𝗲𝗿𝗲𝘀𝘁𝗲𝗱 𝗶𝗻 𝗯𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸𝗶𝗻𝗴 𝘆𝗼𝘂𝗿 𝘀𝘆𝘀𝘁𝗲𝗺?
https://lnkd.in/eMVmMRC4
We are currently accepting registrations for the first evaluation cycle.

#AIAct #RegulatoryAI #AIBenchmark #ArtificialIntelligence #AIGovernance
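
For readers who want a concrete picture of the multi-dimensional profile described above, here is a minimal Python sketch. It is an illustration only and not the benchmark's actual implementation: the names `ResponseScore` and `build_profile`, the 0.0 to 1.0 scale for conciseness and tone, and the use of the median for latency are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass(frozen=True)
class ResponseScore:
    """Hypothetical per-question judgement for one system's response."""
    answer_strict: bool      # answer fully correct
    answer_loose: bool       # answer substantially correct
    reference_strict: bool   # cited legal basis fully precise
    reference_loose: bool    # cited legal basis mostly precise
    conciseness: float       # assumed rating on a 0.0-1.0 scale
    tone: float              # assumed rating on a 0.0-1.0 scale
    latency_s: float         # seconds until the response arrived


def _rate(flags: list[bool]) -> float:
    """Share of responses for which the flag is True."""
    return sum(flags) / len(flags)


def build_profile(scores: list[ResponseScore]) -> dict[str, float]:
    """Aggregate per-question scores into one value per dimension.

    The dimensions are reported side by side and are never merged into
    a single number, so trade-offs between them stay visible.
    """
    if not scores:
        raise ValueError("at least one scored response is required")
    return {
        "answer_strict": _rate([s.answer_strict for s in scores]),
        "answer_loose": _rate([s.answer_loose for s in scores]),
        "reference_strict": _rate([s.reference_strict for s in scores]),
        "reference_loose": _rate([s.reference_loose for s in scores]),
        "conciseness": mean([s.conciseness for s in scores]),
        "tone": mean([s.tone for s in scores]),
        "median_latency_s": median([s.latency_s for s in scores]),
    }
```

A system that answers well but cites the wrong article would show a high `answer_loose` value next to a low `reference_strict` value, which is the kind of trade-off a single combined score would hide.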