Post by Josh Reini

Evals, OSS and AI @ ❄️

Cortex Code has a built-in skill that generates a full observability report for any Cortex Agent. It pulls 110 metrics from Snowflake's agent observability event logs: usage, latency percentiles, token economics, tool execution stats, user feedback breakdowns, and conversation depth. The output includes specific findings with context, such as "search tool error rate is 2.1% vs 0.4% for analyst" or "cache hit rate is 41%, investigate prompt structure." Ask your agent how it's doing. It'll actually tell you.
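
For a feel of what a few of these metrics mean, here is a rough Python sketch. It is not the skill's implementation and it does not use the real event log schema: the `events` records and the field names `tool`, `error`, `latency_ms`, and `cache_hit` are made up for illustration. It shows how per-tool error rate, cache hit rate, and a latency percentile could be derived from raw events.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical event records; the real observability schema will differ.
events = [
    {"tool": "search", "error": False, "latency_ms": 100, "cache_hit": True},
    {"tool": "search", "error": True, "latency_ms": 900, "cache_hit": False},
    {"tool": "search", "error": False, "latency_ms": 200, "cache_hit": False},
    {"tool": "search", "error": False, "latency_ms": 300, "cache_hit": True},
    {"tool": "analyst", "error": False, "latency_ms": 400, "cache_hit": False},
    {"tool": "analyst", "error": False, "latency_ms": 500, "cache_hit": True},
    {"tool": "analyst", "error": False, "latency_ms": 600, "cache_hit": False},
    {"tool": "analyst", "error": False, "latency_ms": 700, "cache_hit": False},
]


def tool_error_rates(events):
    """Fraction of calls that errored, grouped by tool name."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for event in events:
        totals[event["tool"]] += 1
        errors[event["tool"]] += int(event["error"])
    return {tool: errors[tool] / totals[tool] for tool in totals}


def cache_hit_rate(events):
    """Fraction of all events that were served from cache."""
    return sum(event["cache_hit"] for event in events) / len(events)


def latency_percentile(events, pct):
    """Latency at the given percentile (1-99), using linear interpolation."""
    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th one.
    cuts = quantiles([e["latency_ms"] for e in events], n=100, method="inclusive")
    return cuts[pct - 1]


if __name__ == "__main__":
    print(tool_error_rates(events))
    print(cache_hit_rate(events))
    print(latency_percentile(events, 50))
```

Comparing `tool_error_rates(events)["search"]` against the `analyst` entry gives the kind of side-by-side finding quoted above.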