Researchers from multiple institutions have introduced DocTrace, a new approach to long-document question answering that improves F1 score by up to 8.85% over existing methods while reducing computational cost by 53.32%. The system, detailed in an arXiv paper published June 9, 2026, targets questions whose evidence is scattered across a lengthy document, where the answer depends on event order, section-level context, and connections between distant parts of the text.
DocTrace Uses Query-Triggered Hypergraph Memory for Adaptive Reasoning
The DocTrace architecture consists of three core components. First, a document structural tree index provides a lightweight representation that preserves document hierarchy without expensive preprocessing. Second, a query-triggered hypergraph memory serves as an agent-shared working memory constructed on-demand during reasoning. Third, a graph-structured experience memory stores successful reasoning plans for reuse across related questions, enabling adaptive exploration based on historical patterns.
This design contrasts with existing approaches, which organize knowledge in a costly, query-agnostic way and make little use of the original document structure. By building its hypergraph memory only when a query requires it and reusing past reasoning experience, DocTrace avoids much of the preprocessing overhead of those methods.
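The three components described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: every class and function name here (`SectionNode`, `HypergraphMemory`, `build_memory_for_query`, `ExperienceMemory`) is hypothetical, keyword matching stands in for whatever retrieval DocTrace actually uses, and the experience memory is reduced to a plain dictionary rather than a graph. The sketch only shows the general idea of keeping a cheap tree index and building hyperedges when a query arrives.

```python
from dataclasses import dataclass, field


@dataclass
class SectionNode:
    """One node of a hypothetical document structural tree index."""
    title: str
    text: str = ""
    children: list["SectionNode"] = field(default_factory=list)

    def walk(self):
        """Yield this node and all descendants in document order."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class HypergraphMemory:
    """Working memory; each hyperedge links any number of section titles."""
    hyperedges: list[tuple[str, frozenset[str]]] = field(default_factory=list)

    def add(self, label: str, sections: set[str]) -> None:
        self.hyperedges.append((label, frozenset(sections)))


def build_memory_for_query(root: SectionNode, query: str) -> HypergraphMemory:
    """Build memory on demand: one hyperedge per query term, linking every
    section whose text mentions that term (assumed stand-in for retrieval)."""
    memory = HypergraphMemory()
    for term in sorted(set(query.lower().split())):
        hits = {n.title for n in root.walk() if term in n.text.lower()}
        if hits:
            memory.add(term, hits)
    return memory


@dataclass
class ExperienceMemory:
    """Simplified store of reasoning plans that worked, keyed by question type."""
    plans: dict[str, list[str]] = field(default_factory=dict)

    def record(self, question_type: str, plan: list[str]) -> None:
        self.plans[question_type] = plan

    def suggest(self, question_type: str) -> list[str]:
        return self.plans.get(question_type, [])


# Usage: nothing is built until a query arrives.
doc = SectionNode("Report", children=[
    SectionNode("Timeline", "The merger closed in March."),
    SectionNode("Finance", "Merger costs rose after March."),
    SectionNode("Appendix", "Glossary of terms."),
])
memory = build_memory_for_query(doc, "merger march")

experience = ExperienceMemory()
experience.record("event-order", ["locate events", "sort by section", "answer"])
```

In this toy setup a single hyperedge can connect more than two sections at once, which is the property that distinguishes a hypergraph from an ordinary graph and is why it suits evidence spread across several parts of a document.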
Performance Results Demonstrate Significant Gains Across Multiple Benchmarks
DocTrace achieved the best performance on three of the four long-document QA datasets tested. According to the paper, it:
- Surpasses ComoRAG (the strongest baseline) by up to 8.85% in F1 score
- Improves Exact Match (EM) scores by up to 4.40% over ComoRAG
- Reduces overall computational cost by 53.32% compared to baseline methods
- Demonstrates document-structure-aware reasoning that adapts based on successful historical patterns
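For readers unfamiliar with the two metrics cited above, the following sketch shows how Exact Match and token-level F1 are commonly computed in question answering evaluation. It follows the widely used SQuAD-style convention (lowercasing, stripping punctuation and English articles); whether the DocTrace paper applies exactly this normalization is an assumption, and the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized answers are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against the reference."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


em = exact_match("The Eiffel Tower", "eiffel tower.")
f1 = f1_score("in March 2021", "March 2021")
```

Exact Match rewards only a fully correct answer string, while F1 gives partial credit for overlapping tokens, which is why the two numbers reported for a system usually differ.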
The research identifies three key limitations in existing long-document QA systems: costly query-agnostic knowledge organization, insufficient use of original document structure, and no reuse of historical reasoning experience. DocTrace addresses all three through its query-triggered approach and experience-guided reasoning.
Research Fits Into Broader Trend of Hypergraph-Based Memory Systems
The paper is part of a broader recent trend toward hypergraph-based memory architectures for AI systems. Related work includes HyperMem (arXiv 2604.08256) for long-term conversations and HGMem (arXiv 2512.23959) for multi-step RAG with hypergraph working memory. DocTrace's reported results suggest that query-triggered knowledge organization can improve answer quality while lowering computational cost.
Key Takeaways
- DocTrace improves F1 scores by up to 8.85% over the strongest baseline (ComoRAG) on long-document question answering tasks
- The system reduces computational costs by 53.32% through query-triggered knowledge organization instead of expensive preprocessing
- DocTrace uses three core components: a document structural tree index, query-triggered hypergraph memory, and graph-structured experience memory
- The approach achieved best performance on three of four tested long-document QA datasets
- DocTrace is part of a broader recent trend toward hypergraph-based memory architectures in AI systems