Post by CoreWeave


No provider serves Kimi K2.7 Code faster than we do 🏃‍♀️

289 tokens per second, the fastest of any provider Artificial Analysis benchmarked. We also sit in the most attractive quadrant of their Speed vs. Price chart: highest output speed, low blended price (check out the comments).

Here's how we got there:

✔️ Kimi K2.7 Code ships pre-quantized in INT4. Blackwell runs FP4 natively, so our Applied Training team requantized it to NVFP4 for full GB300 and GB200 NVL72 throughput. Accuracy held across every benchmark we ran, from LiveCodeBench to SWE-Bench Pro and Multilingual.

✔️ Then we added a DFlash speculative decoder. Instead of generating one token at a time, the model drafts several ahead and verifies them in a single pass, clearing about 3.6 tokens per step. Same output, far fewer round trips. That's where 289 comes from.

✔️ In an agent loop that calls the model again and again, speed and price stop being specs and start being your bill. That combination is the line between a demo and something you can afford to run in production.

Already on K2.6? Swap the model string and you inherit all of it, with no infrastructure to run. 🫳 🎤

https://utm.io/uqHRm
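For readers who want to see the draft-and-verify idea in miniature, here is a rough Python sketch of generic greedy speculative decoding. It is not the DFlash decoder, whose internals the post does not describe. The two toy "models" (`target_next`, `draft_next`), the draft length `k`, and every other name here are hypothetical stand-ins chosen for illustration. The point it tries to show is that the output matches plain one-token-at-a-time decoding while taking fewer verification steps.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]


def target_next(context: List[Token]) -> Token:
    """Toy stand-in for the large model: a deterministic function of the context."""
    return (sum(context) * 31 + len(context)) % 50


def draft_next(context: List[Token]) -> Token:
    """Toy stand-in for the drafter: agrees with the target except at every 5th position."""
    guess = target_next(context)
    return (guess + 1) % 50 if len(context) % 5 == 0 else guess


def greedy_decode(model: Model, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out[len(prompt):]


def speculative_decode(
    target: Model, draft: Model, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify. Returns (new tokens, number of verification steps)."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n_new:
        # 1) The drafter proposes k tokens, each conditioned on its own guesses.
        drafted: List[Token] = []
        for _ in range(k):
            drafted.append(draft(out + drafted))

        # 2) The target checks every drafted position. A real system would get
        #    these predictions from one batched forward pass; here we simply
        #    call the toy target once per position and count it as one step.
        steps += 1
        accepted: List[Token] = []
        for tok in drafted:
            expected = target(out + accepted)
            if tok != expected:
                # First wrong guess: keep the target's token, drop the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the same pass yields one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)

    return out[len(prompt):len(prompt) + n_new], steps


if __name__ == "__main__":
    prompt = [1, 2, 3]
    tokens, steps = speculative_decode(target_next, draft_next, prompt, n_new=20)
    assert tokens == greedy_decode(target_next, prompt, 20)  # same output
    print(f"{len(tokens)} tokens in {steps} verification steps")
```

In this sketch every accepted token is one the target would have produced anyway, which is why the text is unchanged and only the number of passes drops. As back-of-envelope arithmetic, if the 289 tokens per second and the roughly 3.6 tokens per step describe the same run, that works out to about 289 / 3.6 ≈ 80 verification passes per second.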
