Post by Kaggle

In the first episode of Kaggle Conversations, Megan Risdal, Product Lead at Kaggle, and Walter Reade, Technical Lead for Kaggle Competitions, speak with Alex Shaw, Member of Technical Staff at The Laude Institute, co-creator of Terminal-Bench, and creator of the Harbor Agentic Evaluation Framework (https://lnkd.in/gzzqRSjj), about how the AI Agent industry is moving towards autonomy, why software engineering benchmarks are prone to contamination, and what it took to build a robust evaluation framework for the $1M Konwinski Prize.

Episode highlights:
• Why autonomous software engineering is shifting into a machine learning problem
• What the $1M Konwinski Prize revealed about coding
• How the team applied competition experience to prevent contamination
• How Harbor gives the open-source community a unified environment for creating and running agent benchmarks

Watch a clip from the conversation 👇
