On May 5, 2026, Miami-based startup Subquadratic launched with $29 million in seed funding to develop SubQ, which it describes as the first fully subquadratic large language model. The company's innovation centers on Subquadratic Sparse Attention (SSA), an architecture that achieves linear computational scaling with context length, a fundamental departure from traditional transformers, which scale quadratically. The result is a model with a 12-million-token context window that the company claims runs 52× faster than FlashAttention and uses 63% less compute than traditional attention mechanisms.
Investors Back Fundamental Architecture Shift
The $29 million seed round attracted investors including Javier Villamizar, Justin Mateen (co-founder of Tinder and founder of JAM Fund), Grant Gittlin of Lasagna, and Jaclyn Rice Nelson of Coalition Operators, alongside early investors in Anthropic, OpenAI, Stripe, and Brex. The funding supports Subquadratic's team of PhDs and published researchers from Meta, Google, Oxford, BYU, ByteDance, Adobe, and Cambridge who engineered what they describe as a "ground-up redesign of how attention works."
Sparse Attention Delivers 1,000× Efficiency Gain
Subquadratic's core innovation addresses the computational bottleneck in standard transformer attention, which compares every token against every other token at O(n²) cost. SSA instead learns which token-to-token comparisons actually matter and computes attention only over those positions, achieving linear O(n) scaling (a conceptual sketch of this idea follows the list below). The company reports:
- 52× faster performance than FlashAttention in architecture-level comparisons
- 63% reduction in compute compared to traditional attention
- Nearly 1,000× reduction in attention compute versus other frontier models
- 12-million-token context window that maintains performance without degradation at extreme lengths
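Subquadratic has not published SSA's implementation details, so the sketch below is only a minimal, hypothetical illustration of the general idea in PyTorch: each query attends to a small selected subset of key positions (top_k of them), so the attention step costs O(n·top_k) rather than O(n²). The function name sparse_topk_attention and all shapes are assumptions for illustration, and the candidate-selection step shown here still builds a full score matrix for clarity; making that selection itself subquadratic is precisely the part SSA claims to solve.

```python
# Illustrative sketch only; this is NOT Subquadratic's published SSA mechanism.
# It shows generic top-k sparse attention: each query keeps its top_k highest-scoring
# key positions and runs softmax attention over just those, so the attention step
# scales as O(n * top_k * d) instead of O(n^2 * d).
import torch
import torch.nn.functional as F


def sparse_topk_attention(q, k, v, top_k=64):
    """q, k, v: (batch, n, d) tensors. Returns (batch, n, d)."""
    b, n, d = q.shape
    top_k = min(top_k, n)

    # Relevance scores used only to pick candidate keys per query.
    # NOTE: this selection step is still O(n^2) here for clarity; a truly
    # subquadratic design would have to make selection itself cheaper.
    select_scores = torch.einsum("bqd,bkd->bqk", q, k)          # (b, n, n)
    idx = select_scores.topk(top_k, dim=-1).indices             # (b, n, top_k)

    # Gather only the chosen keys/values for each query: (b, n, top_k, d).
    batch_idx = torch.arange(b, device=q.device)[:, None, None]
    k_sel = k[batch_idx, idx]
    v_sel = v[batch_idx, idx]

    # Scaled dot-product attention restricted to top_k positions per query.
    attn = torch.einsum("bnd,bnkd->bnk", q, k_sel) / d ** 0.5   # (b, n, top_k)
    weights = F.softmax(attn, dim=-1)
    return torch.einsum("bnk,bnkd->bnd", weights, v_sel)


if __name__ == "__main__":
    # Tiny usage example with random tensors.
    b, n, d = 2, 256, 32
    q, k, v = (torch.randn(b, n, d) for _ in range(3))
    out = sparse_topk_attention(q, k, v, top_k=16)
    print(out.shape)  # torch.Size([2, 256, 32])
```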
Practical Applications for Long-Context AI
The expanded context window enables processing entire codebases, large document collections, spreadsheets, database tables, or extended interaction histories in a single pass. This makes SubQ particularly relevant for agentic AI applications that must sustain context over long-horizon tasks. Standard transformer attention's quadratic scaling has been a limiting factor for such applications; the company argues that subquadratic attention provides the architectural foundation for genuine long-context agent capabilities.
Controversy Over Efficiency Claims
The startup's claim of a nearly 1,000× efficiency gain has prompted researchers to call for independent verification, though the 12-million-token research result has been confirmed. The company positions SSA as moving sparse attention beyond theoretical concepts to frontier-level performance without sacrificing accuracy, built on the premise that most token-to-token comparisons in standard attention are wasted compute.
Key Takeaways
- Subquadratic raised $29 million in seed funding to develop SubQ, the first fully subquadratic LLM with linear computational scaling
- The model achieves a 12-million-token context window, runs 52× faster than FlashAttention, and uses 63% less compute than traditional attention
- Subquadratic Sparse Attention (SSA) reduces attention compute by nearly 1,000× by learning which token comparisons matter rather than comparing all tokens
- The architecture enables processing entire codebases, large documents, or extended interaction histories in a single pass
- Researchers have called for independent verification of efficiency claims, though the 12-million-token result has been confirmed