Post by Alexander Whedon
Co-Founder, CTO @ Subquadratic
Here is the technical report on SubQ 1.1 Small: https://lnkd.in/e2wzreZg

This is the second iteration of our Subquadratic Sparse Attention (SSA) model, and the first to be deployed with design partners in the coming weeks. The results are compelling and verified by Appen.

- Near-perfect long-context retrieval up to 12M tokens on the needle-in-a-haystack test, with up to nearly 1,000x attention compute reduction.
- A balance of long-context optimization and general reasoning ability, with strong performance retained across knowledge, coding, and non-coding enterprise agent benchmarks.
- At 1M tokens, SubQ 1.1 Small requires 64.5x less compute than dense attention and runs 56x faster than FlashAttention-2.

These results highlight a significant scaling advantage thanks to the efficiency gains from the SSA architecture. We included some details and learnings from the development process that may be helpful to the community.

Comment with questions and I’ll try to respond!