Introducing SubQ: The First Fully Subquadratic LLM
By Justin Dangel, co-founder and CEO, Subquadratic
May 5, 2026
Transformers defined the last decade of AI — the “T” in ChatGPT. They unlocked modern language understanding, reasoning, and much of what we now think of as AI. Yet one fundamental limitation has shaped everything built on top of them: compute requirements scale quadratically with context length.
In practice, longer inputs don’t reliably improve how well a model uses the information provided. As context grows, models become less consistent at identifying what actually matters. This is a direct consequence of how transformers work. Every token is compared against every other token, so as inputs grow, the number of interactions — and the compute required to process them — scales quadratically. That relationship has influenced what gets built, what systems cost, and where practical limits show up in real-world use.
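To make the scaling concrete, here is a minimal sketch of standard attention in NumPy. The function name and shapes are illustrative only and do not reflect any particular production implementation:

```python
import numpy as np

def naive_attention(q, k, v):
    """Single-head attention over n tokens; q, k, v have shape (n, d).

    The score matrix has shape (n, n): every token is compared against
    every other token, so compute and memory grow with n**2.
    """
    d = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d)                 # (n, n): the quadratic term
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # (n, d)

# Doubling the context quadruples the pairwise work:
#   n = 1_000  ->  1,000,000 scores
#   n = 2_000  ->  4,000,000 scores
```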
The industry adapted by building around transformer limitations. Developers and investors spend more of their time and money on workarounds than on the problem itself. RAG systems use a search engine to pull a small number of relevant results before sending them to the model, because sending the full corpus isn’t feasible. Retrieval pipelines, chunking strategies, prompt engineering. But the underlying scaling behavior never changed. Today’s systems that require millions of tokens are brittle, expensive, and hard to build.
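As a rough illustration of that workaround, a bare-bones retrieval step looks something like the sketch below. The `embed` function is a hypothetical stand-in for a real embedding model; nothing here is specific to any particular RAG stack:

```python
import numpy as np

def embed(texts):
    """Hypothetical stand-in for an embedding model: text -> vector."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(texts), 64))

def retrieve_top_k(query, corpus, k=5):
    """The classic RAG step: score every chunk against the query and
    keep only the top k, because sending the full corpus to the model
    is not feasible under quadratic attention."""
    q_vec = embed([query])[0]
    chunk_vecs = embed(corpus)
    scores = chunk_vecs @ q_vec            # dot-product similarity
    top = np.argsort(scores)[::-1][:k]     # indices of the k best chunks
    return [corpus[i] for i in top]

# The model only ever sees these k chunks, not the whole corpus:
# anything the retriever misses is invisible to the model.
```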
Until now.
Introducing SubQ
Subquadratic is an AI company building a new class of large language models. Our first model, SubQ 1M-Preview, is the first LLM built on a fully subquadratic architecture, one where compute grows linearly with context length.
This allows context window size, accuracy (state-of-the-art results on needle-in-a-haystack and exact-copy tests), inference speed, and cost to improve together. Historically, making models subquadratic meant sacrificing accuracy, and reducing cost meant sacrificing performance. SubQ improves all of these at once, not incrementally but by an order of magnitude that makes millions of tokens of context a practical reality.
In a research result at 12 million tokens of context, SubQ’s architecture reduces attention compute by almost 1,000x relative to other frontier models.
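The exact cost model behind that figure isn’t published, so treat the following as back-of-envelope arithmetic only: if full attention performs roughly n² pairwise interactions while linearly growing compute looks like n·w for some effective per-token budget w, then a 1,000x reduction at n = 12 million implies w on the order of 12,000:

```python
n = 12_000_000                      # the 12M-token research result
full_pairs = n ** 2                 # quadratic attention: 1.44e14 interactions
reduction = 1_000                   # "almost 1,000x" from the claim above
subq_budget = full_pairs / reduction         # 1.44e11 interactions
w = subq_budget / n                 # implied per-token budget under an n*w model
print(f"{w:,.0f} interactions per token")    # -> 12,000
```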
What we’re announcing today
Starting today, SubQ will be available for early access via:
- API — The full-context API for developers and enterprise teams.
- SubQ Code — A coding agent built on SubQ, available via command line interface (CLI). SubQ Code loads entire codebases into a single context window, enabling developers to plan, execute, and review across a full repository in a single pass — without the coordination overhead of multi-agent systems.
- SubQ Search — A long-context search tool providing Deep Research capabilities with chatbot speed.
All three are available in private beta starting today.
The benchmarks
To evaluate long-context performance, we ran SubQ 1M-Preview on the RULER 128K benchmark, a standard benchmark for reasoning over extended inputs, with results verified by a third party.
- SubQ 1M-Preview scores 95% accuracy, compared to 94.8% for Claude Opus 4.6.
- SubQ Sparse Attention is 52× faster than FlashAttention in our architecture-level comparison, while requiring 63% less compute.
Together, these results show frontier-level long-context accuracy with a substantially more efficient attention architecture.
We also ran SubQ on MRCR v2, which tests a model’s ability to retrieve and reason over multiple pieces of information spread across a long context (a closer proxy for real-world use).
- With a research result of 83 and a third-party-verified production score of 65.9, SubQ 1M-Preview compares favorably with other SOTA models such as Claude Opus 4.7 (32.2), GPT 5.5 (74), and Gemini 3.1 Pro (26.3).
- On SWE-Bench Verified, SubQ scores 81.8, compared to Opus 4.6 (80.8) and DeepSeek 4.0 Pro (80.0).
SubQ’s research model maintains performance at up to 12 million tokens of context, while other frontier models break down well before their stated 1M-token limits.
Why this is hard
The research community has understood for years that quadratic scaling was a ceiling. Subquadratic mechanisms such as linear attention, state space models like Mamba, and sparse attention variants have been an active area of investigation. The unsolved problem wasn’t the idea. It was building a subquadratic architecture that didn’t sacrifice frontier-level performance to get there.
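For a flavor of what the subquadratic family looks like, here is a minimal sketch of one published approach, linear attention in the style of Katharopoulos et al. (2020). It illustrates the general technique only; SubQ’s own architecture has not been disclosed and may look nothing like this:

```python
import numpy as np

def elu_plus_one(x):
    """Feature map from the linear-attention literature: elu(x) + 1."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v):
    """Non-causal linear attention over q, k, v of shape (n, d).

    Replacing softmax(Q K^T) with phi(Q) phi(K)^T lets the product be
    reassociated as phi(Q) @ (phi(K)^T @ V), costing O(n * d^2) instead
    of O(n^2 * d). The (n, n) score matrix is never materialized.
    """
    qf, kf = elu_plus_one(q), elu_plus_one(k)   # (n, d) each, strictly positive
    kv = kf.T @ v                               # (d, d): summed over all tokens
    normalizer = qf @ kf.sum(axis=0)            # (n,): per-row denominator
    return (qf @ kv) / normalizer[:, None]      # (n, d)
```

The key move is reassociating the matrix product so the (n, n) score matrix is never formed; the historical trade-off was exactly the accuracy loss described above.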
Closing that gap is what took time. Subquadratic’s research team, made up of PhDs and published researchers from Meta, Google, Oxford, BYU, ByteDance, Adobe, and Cambridge, spent that time on the math. The result is a ground-up redesign of how attention works, built to be subquadratic from first principles.
What this enables
The implications aren’t abstract. Expanding the frontier for context window size makes it possible to process entire codebases, large document collections, large spreadsheets, database tables, or long-running interaction histories in a single pass.
This reduces the need for retrieval pipelines and agentic workflows, and helps preserve information that would otherwise be lost at context boundaries. At 50 million tokens, the design space for AI applications changes fundamentally.
As context expands further, models can maintain continuity over longer time horizons, enabling new types of applications that depend on persistent state and deeper reasoning.
Economics matter
Cost is quickly becoming the primary constraint in deploying AI systems. When inference is expensive, teams limit usage, reduce context, or avoid certain applications altogether. Many ideas never reach production because the economics don’t hold.
SubQ makes workloads that were previously cost-prohibitive viable at scale. It becomes feasible to run high-volume workloads, include more context, and support applications that rely on sustained interaction with models.
What we’re building toward
Every major constraint in computing history eventually broke. When it did, entirely new categories of products emerged that nobody predicted. Quadratic scaling has been that constraint for AI.
The most valuable applications of AI remain unbuilt because the existing architecture can’t support them. SubQ changes that, and we’re at the beginning of understanding what becomes possible when the architecture stops getting in the way.
Backing and team
Subquadratic has raised $29M in seed funding from investors including Javier Villamizar; Justin Mateen, co-founder of Tinder and founder of JAM Fund; Grant Gittlin of Lasagna; and Jaclyn Rice Nelson of Coalition Operators, alongside early investors in Anthropic, OpenAI, Stripe, and Brex.
Justin Dangel, CEO, is a five-time founder with a track record across health tech, insurance tech, and consumer goods. His companies have scaled to hundreds of employees, attracted institutional backing, and reached liquidity.
Alex Whedon, CTO, previously worked as a software engineer at Meta and served as Head of Generative AI at TribeAI, where he led over 40 enterprise AI implementations.
Subquadratic’s team includes 11 PhD researchers and research engineers with backgrounds from Meta, Google, Oxford, Cambridge, ByteDance, Adobe and Microsoft.