Post by Appen

Appen independently evaluated Subquadratic's SubQ 1.1 Small Preview models on two benchmarks: Needle-in-a-Haystack (NIAH) for long-context retrieval and LiveCodeBench for code generation.

Key results:
- 100% retrieval accuracy at 1M and 2M token contexts (NIAH, niah_single_1, RULER suite)
- 98% exact-match at 6M and 12M tokens
- 78.0% pass@1 and 89.7% pass@4 on LiveCodeBench across 1,055 problems

The evaluation was conducted with full independence: no model weights or training data were provided in advance.

The public brief is available to download here: https://lnkd.in/eru2ttnB

#AI #ModelEvaluation #LLM #LongContext #LiveCodeBench
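
For readers unfamiliar with the pass@1 and pass@4 figures above, here is a minimal sketch of how a pass@k score is commonly computed. It assumes the standard unbiased estimator (n samples per problem, c of them correct), which may differ from the procedure used in this evaluation; the function names and the sample data are hypothetical and for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed.

    Assumes the commonly used unbiased estimator 1 - C(n-c, k) / C(n, k);
    this is not confirmed to be the method behind the reported numbers.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failing samples: every size-k draw contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical data: three problems, four samples each.
example = [(4, 4), (4, 1), (4, 0)]
score_k1 = mean_pass_at_k(example, k=1)
score_k4 = mean_pass_at_k(example, k=4)
```

With k=1 the estimator reduces to the fraction of samples that passed, so pass@4 is always at least as high as pass@1 on the same samples.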