Post by NEC Laboratories Europe

Assessing AI reviewers is becoming increasingly important as they enter scientific peer review. While they can generate detailed feedback at scale, how their quality compares with that of human reviewers remains unclear. To address this, we conducted a large-scale expert annotation study in which nearly 3,000 individual criticisms, drawn from both AI-generated and human-written reviews, were rated on correctness, significance, and evidential support. Our results show that leading AI reviewers can match or exceed humans at identifying significant, well-supported issues, while also uncovering additional points not raised by human reviewers. However, they also show limitations, including greater overlap, weaker performance in specialized domains, and a tendency to overemphasize minor concerns. These findings suggest that AI reviewers are best used to complement human expertise, supporting a hybrid approach that combines the strengths of both. To learn more, read our paper, “On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists,” by Seungone Kim et al.: https://lnkd.in/eyuR9kG8. #NECLabs #AI