On May 5, 2026, Miami-based startup Subquadratic launched with $29 million in seed funding to develop SubQ, which it describes as the first fully subquadratic large language model. The company's innovation centers on Subquadratic Sparse Attention (SSA), an architecture that achieves linear computational scaling with context length, a fundamental departure from traditional transformers, which scale quadratically. The result is a model with a 12-million-token context window that the company claims runs 52× faster than FlashAttention and uses 63% less compute than traditional attention mechanisms.
Investors Back Fundamental Architecture Shift
The $29 million seed round attracted investors including Javier Villamizar, Justin Mateen (co-founder of Tinder and founder of JAM Fund), Grant Gittlin of Lasagna, and Jaclyn Rice Nelson of Coalition Operators, alongside early investors in Anthropic, OpenAI, Stripe, and Brex. The funding supports Subquadratic's team of PhDs and published researchers from Meta, Google, Oxford, BYU, ByteDance, Adobe, and Cambridge who engineered what they describe as a "ground-up redesign of how attention works."
Sparse Attention Delivers 1,000× Efficiency Gain
Subquadratic's core innovation addresses the computational bottleneck in standard transformer attention, which compares every token against every other token at O(n²) cost. SSA instead learns which token-to-token comparisons actually matter and computes attention only over those positions, achieving linear O(n) scaling (a conceptual sketch of this idea follows the list below). The company reports:
- 52× faster performance than FlashAttention in architecture-level comparisons
- 63% reduction in compute compared to traditional attention
- Nearly 1,000× reduction in attention compute versus other frontier models
- 12-million-token context window that maintains performance without degradation at extreme lengths
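Subquadratic has not published SSA's implementation details, so the sketch below is only a minimal, hypothetical illustration of the general idea in PyTorch: each query attends to a small selected subset of key positions (top_k of them), so the attention step costs O(n·top_k) rather than O(n²). The function name sparse_topk_attention and all shapes are assumptions for illustration, and the candidate-selection step shown here still builds a full score matrix for clarity; making that selection itself subquadratic is precisely the part SSA claims to solve.

```python
# Illustrative sketch only; this is NOT Subquadratic's published SSA mechanism.
# It shows generic top-k sparse attention: each query keeps its top_k highest-scoring
# key positions and runs softmax attention over just those, so the attention step
# scales as O(n * top_k * d) instead of O(n^2 * d).
import torch
import torch.nn.functional as F


def sparse_topk_attention(q, k, v, top_k=64):
    """q, k, v: (batch, n, d) tensors. Returns (batch, n, d)."""
    b, n, d = q.shape
    top_k = min(top_k, n)

    # Relevance scores used only to pick candidate keys per query.
    # NOTE: this selection step is still O(n^2) here for clarity; a truly
    # subquadratic design would have to make selection itself cheaper.
    select_scores = torch.einsum("bqd,bkd->bqk", q, k)          # (b, n, n)
    idx = select_scores.topk(top_k, dim=-1).indices             # (b, n, top_k)

    # Gather only the chosen keys/values for each query: (b, n, top_k, d).
    batch_idx = torch.arange(b, device=q.device)[:, None, None]
    k_sel = k[batch_idx, idx]
    v_sel = v[batch_idx, idx]

    # Scaled dot-product attention restricted to top_k positions per query.
    attn = torch.einsum("bnd,bnkd->bnk", q, k_sel) / d ** 0.5   # (b, n, top_k)
    weights = F.softmax(attn, dim=-1)
    return torch.einsum("bnk,bnkd->bnd", weights, v_sel)


if __name__ == "__main__":
    # Tiny usage example with random tensors.
    b, n, d = 2, 256, 32
    q, k, v = (torch.randn(b, n, d) for _ in range(3))
    out = sparse_topk_attention(q, k, v, top_k=16)
    print(out.shape)  # torch.Size([2, 256, 32])
```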
Practical Applications for Long-Context AI
The expanded context window enables processing entire codebases, large document collections, spreadsheets, database tables, or extended interaction histories in a single pass. This makes SubQ particularly relevant for agentic AI applications that must sustain context over long-horizon tasks. Standard transformer attention's quadratic scaling has been a limiting factor for such applications; the company argues that subquadratic attention provides the architectural foundation for genuine long-context agent capabilities.
Controversy Over Efficiency Claims
The startup's claim of a nearly 1,000× efficiency gain has prompted researchers to call for independent verification, though the 12-million-token research result has been confirmed. The company positions SSA as moving sparse attention beyond theoretical concepts to frontier-level performance without sacrificing accuracy, built on the premise that most token-to-token comparisons in standard attention are wasted compute.
Key Takeaways
- Subquadratic raised $29 million in seed funding to develop SubQ, the first fully subquadratic LLM with linear computational scaling
- The model achieves a 12-million-token context window, runs 52× faster than FlashAttention, and uses 63% less compute than traditional attention
- Subquadratic Sparse Attention (SSA) reduces attention compute by nearly 1,000× by learning which token comparisons matter rather than comparing all tokens
- The architecture enables processing entire codebases, large documents, or extended interaction histories in a single pass
- Researchers have called for independent verification of efficiency claims, though the 12-million-token result has been confirmed