Post by Beefree


While building our MCP Server, we found in testing that the outputs were great. Our CFO did not like the token bills we generated. So we spent time optimizing token consumption without sacrificing output quality, and reduced it by up to 94% across every model we tested. The biggest culprit was that the full schema for all 33 of our available tools was re-sent on every single turn, so the majority of tokens on a typical task were protocol overhead, not generation work. Replacing those 33 definitions with a single scripting interface cut consumption by 68–96% depending on the model, and prompt caching took another ~30% off before we touched anything email-specific. We wrote it all up, including every fix, every benchmark, and every model we tested: https://lnkd.in/ejgS2hre
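To make the schema-overhead arithmetic concrete, here is a rough back-of-the-envelope sketch. Only the tool count of 33 comes from the post. The per-schema token size, the size of the single scripting interface, and the number of turns are made-up placeholders, so the resulting percentage is purely illustrative and is not one of our benchmark results.

```python
def schema_overhead_tokens(num_tools: int, tokens_per_schema: int, turns: int) -> int:
    """Tokens spent on tool definitions when every turn carries every schema."""
    return num_tools * tokens_per_schema * turns


def reduction(before: int, after: int) -> float:
    """Fractional saving when going from `before` tokens to `after` tokens."""
    return 1 - after / before


NUM_TOOLS = 33                # from the post
TOKENS_PER_SCHEMA = 300       # hypothetical average size of one tool schema
SCRIPT_INTERFACE_TOKENS = 1500  # hypothetical size of one scripting-interface definition
TURNS = 10                    # hypothetical length of a task, in turns

# Every turn re-sends all 33 tool definitions.
many_tools = schema_overhead_tokens(NUM_TOOLS, TOKENS_PER_SCHEMA, TURNS)

# Every turn re-sends one larger scripting-interface definition instead.
single_interface = schema_overhead_tokens(1, SCRIPT_INTERFACE_TOKENS, TURNS)

saving = reduction(many_tools, single_interface)
print(f"schema tokens, 33 tools:         {many_tools}")
print(f"schema tokens, single interface: {single_interface}")
print(f"illustrative saving:             {saving:.0%}")
```

The point of the sketch is the shape of the cost, not the numbers: definition overhead scales with tools × turns, so collapsing many definitions into one shrinks the part of the bill that grows with every turn.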