DocTrace Achieves 8.85% F1 Improvement on Long-Document QA With 53% Cost Reduction

Researchers from multiple institutions have introduced DocTrace, an approach to long-document question answering that improves F1 by up to 8.85% over the strongest baseline, ComoRAG, while reducing computational cost by 53.32%. The system, detailed in an arXiv paper published June 9, 2026, targets questions whose evidence is scattered across a lengthy document, where the answer depends on event order, section-level context, and connections between distant parts.

DocTrace Uses Query-Triggered Hypergraph Memory for Adaptive Reasoning

The DocTrace architecture consists of three core components. First, a document structural tree index provides a lightweight representation that preserves document hierarchy without expensive preprocessing. Second, a query-triggered hypergraph memory serves as an agent-shared working memory constructed on-demand during reasoning. Third, a graph-structured experience memory stores successful reasoning plans for reuse across related questions, enabling adaptive exploration based on historical patterns.
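The three components can be pictured as simple data structures. The following is a minimal sketch only: every class, field, and method name is hypothetical and not taken from the paper, and the experience memory is simplified to a dictionary keyed by question type rather than a graph.

```python
from dataclasses import dataclass, field


@dataclass
class SectionNode:
    """One node of a document structural tree (hypothetical layout)."""
    title: str
    text: str = ""
    children: list["SectionNode"] = field(default_factory=list)

    def walk(self, path=()):
        """Yield (path, node) pairs in document order."""
        here = path + (self.title,)
        yield here, self
        for child in self.children:
            yield from child.walk(here)


@dataclass
class HypergraphMemory:
    """Working memory in which one hyperedge links any number of entities."""
    hyperedges: list[tuple[frozenset, str]] = field(default_factory=list)

    def add(self, entities, evidence):
        self.hyperedges.append((frozenset(entities), evidence))

    def about(self, entity):
        """Return the evidence of every hyperedge that contains the entity."""
        return [ev for ents, ev in self.hyperedges if entity in ents]


@dataclass
class ExperienceMemory:
    """Stores plans that worked (simplified: a dict, not a graph)."""
    plans: dict[str, list[str]] = field(default_factory=dict)

    def record(self, question_type, plan):
        self.plans[question_type] = list(plan)

    def suggest(self, question_type):
        return self.plans.get(question_type, [])


# Toy document and memories; all content here is invented for illustration.
doc = SectionNode("Report", children=[
    SectionNode("Background", "Acme was founded in 1990."),
    SectionNode("Timeline", "Acme merged with Bolt in 2001."),
])
paths = [" > ".join(p) for p, _ in doc.walk()]

memory = HypergraphMemory()
memory.add({"Acme", "Bolt", "2001"}, "Acme merged with Bolt in 2001.")

experience = ExperienceMemory()
experience.record("event_order", ["locate_section", "collect_events", "sort_by_date"])
```

A hyperedge differs from an ordinary graph edge in that it can connect more than two nodes at once, which is why the merger fact above links three entities in a single record.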

This design contrasts with existing approaches, which organize knowledge in a costly, query-agnostic way and make little use of the original document structure. By building its hypergraph memory only when a query calls for it and by reusing past reasoning experience, DocTrace avoids much of the up-front preprocessing cost those methods incur.
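The difference between query-agnostic and query-triggered organization can be illustrated with a toy example. This is a sketch under stated assumptions, not the paper's algorithm: section selection by title keyword overlap, the `build_on_demand` function, and sentence splitting as a stand-in for expensive extraction are all hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_on_demand(sections, query, cache):
    """Process only sections whose title shares a term with the query.

    `sections` maps title -> body text. `cache` keeps earlier results so
    a section is never processed twice across queries.
    """
    query_terms = tokens(query)
    touched = []
    for title, body in sections.items():
        if query_terms & tokens(title):
            if title not in cache:
                # Stand-in for costly extraction: split the body into sentences.
                cache[title] = [s.strip() for s in body.split(".") if s.strip()]
            touched.append(title)
    return touched


sections = {
    "Background": "Acme was founded in 1990. It sold tools.",
    "Merger Timeline": "Acme merged with Bolt in 2001. The deal closed in May.",
    "Financials": "Revenue doubled after the deal.",
}
cache = {}
touched = build_on_demand(sections, "When was the merger?", cache)
```

Here only one of the three sections is processed for the query; a query-agnostic pipeline would have processed all three before any question arrived.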

Performance Results Demonstrate Significant Gains Across Multiple Benchmarks

DocTrace achieved the best performance on three of the four long-document QA datasets tested. According to the paper, it:

Surpasses ComoRAG (the strongest baseline) by up to 8.85% in F1 score
Improves Exact Match (EM) scores by up to 4.40% over ComoRAG
Reduces overall computational cost by 53.32% compared to baseline methods
Demonstrates document-structure-aware reasoning that adapts based on successful historical patterns
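For readers unfamiliar with the two metrics, the sketch below shows how Exact Match and token-level F1 are commonly computed for extractive QA, in the style popularized by the SQuAD evaluation script. Whether DocTrace's evaluation uses this exact normalization is an assumption, and the function names here are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction, gold):
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall against the gold answer."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("The Merger", "merger")
f1 = token_f1("the merger in 2001", "2001")
```

In the second call the prediction contains the gold token plus two extra tokens, so precision is 1/3, recall is 1, and F1 is 0.5. F1 therefore rewards partially correct answers that Exact Match scores as zero.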

The research identifies three key limitations in existing long-document QA systems: costly query-agnostic knowledge organization, insufficient use of original document structure, and no reuse of historical reasoning experience. DocTrace addresses all three through its query-triggered approach and experience-guided reasoning.

Research Fits Into Broader Trend of Hypergraph-Based Memory Systems

The paper is part of a broader 2026 trend toward hypergraph-based memory architectures for AI systems. Related work includes HyperMem (arXiv 2604.08256) for long-term conversations and HGMem (arXiv 2512.23959) for multi-step RAG with hypergraph working memory. The DocTrace approach demonstrates how query-triggered knowledge organization can deliver superior performance while maintaining computational efficiency.

Key Takeaways

DocTrace improves F1 scores by up to 8.85% over the strongest baseline (ComoRAG) on long-document question answering tasks
The system reduces computational costs by 53.32% through query-triggered knowledge organization instead of expensive preprocessing
DocTrace uses three core components: a document structural tree index, query-triggered hypergraph memory, and graph-structured experience memory
The approach achieved best performance on three of four tested long-document QA datasets
DocTrace represents part of a broader 2026 trend toward hypergraph-based memory architectures in AI systems
