A new research paper published on arXiv on May 14, 2026 challenges the widespread assumption that vector-based retrieval is superior for AI agent systems. The study, titled "Is Grep All You Need? How Agent Harnesses Reshape Agentic Search," found that simple grep search generally yields higher accuracy than vector retrieval across multiple agent architectures, with overall performance depending heavily on how tools are called and how results are presented.
Two-Part Experiment Compares Retrieval Strategies Across Agent Architectures
Researchers Sahil Sen, Akhil Kasturi, Elias Lumer, Anmol Gulati, and Vamse Kumar Subbiah conducted two experiments to systematically compare retrieval strategies. The first experiment compared grep and vector retrieval on a 116-question sample from LongMemEval using both a custom agent harness called Chronos and provider-native CLI harnesses including Claude Code, Codex, and Gemini CLI. Both inline tool results and file-based tool results were tested.
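The inline versus file-based distinction can be illustrated with a minimal sketch. The `run_tool` wrapper below is hypothetical (the paper's actual harness interfaces are not described here): inline mode returns the raw tool output for direct insertion into the model's context, while file-based mode writes the output to disk and hands the model only a path it can later read or grep.

```python
import os
import tempfile

def run_tool(tool_fn, *args, file_based=False):
    """Hypothetical wrapper showing two ways to present a tool result.

    Inline: the raw result string goes straight into the model's context.
    File-based: the result is written to a file and only the path is
    returned, so the agent can grep or read it in later turns instead of
    carrying the full output in context.
    """
    result = tool_fn(*args)
    if not file_based:
        return {"type": "inline", "content": result}
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(result)
    return {"type": "file", "path": path}

# Example: the same search tool presented both ways.
inline_msg = run_tool(lambda: "3 matches found in chat_history.txt")
file_msg = run_tool(lambda: "3 matches found in chat_history.txt",
                    file_based=True)
```

The trade-off this sketch hints at: inline results are immediately visible to the model but consume context, while file-based results keep context small at the cost of extra read/search turns.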
The second experiment progressively mixed unrelated conversation history into queries to measure how each retrieval method performed when embedded in distracting material. This design tested real-world scenarios where relevant information must be extracted from noisy contexts.
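The two retrieval styles being compared can be sketched on a toy version of this setup. The snippet below is illustrative only: grep-style retrieval is a literal regex match over conversation turns, and vector retrieval is stood in for by bag-of-words cosine similarity rather than the learned embeddings an actual RAG system would use. The distractor turns mimic the unrelated history the second experiment mixes in.

```python
import math
import re
from collections import Counter

# Toy conversation history: two relevant turns plus unrelated "distractor"
# turns, mimicking the experiment's mixing of unrelated history.
relevant_turns = [
    "user: my flight to Tokyo departs on June 3rd",
    "assistant: noted, your Tokyo flight is on June 3rd",
]
distractor_turns = [
    "user: what's a good pasta recipe?",
    "assistant: try carbonara with guanciale",
    "user: how do I reset my router?",
]

def grep_search(query_terms, corpus):
    """Grep-style retrieval: return every turn literally matching a term."""
    pattern = re.compile("|".join(map(re.escape, query_terms)), re.IGNORECASE)
    return [turn for turn in corpus if pattern.search(turn)]

def bow_vector(text):
    """Bag-of-words term counts as a crude stand-in for an embedding."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query, corpus, k=2):
    """Vector-style retrieval: rank all turns by similarity, keep top-k."""
    qv = bow_vector(query)
    return sorted(corpus, key=lambda t: cosine(qv, bow_vector(t)),
                  reverse=True)[:k]

corpus = relevant_turns + distractor_turns
hits_grep = grep_search(["Tokyo", "flight"], corpus)
hits_vec = vector_search("when does my Tokyo flight leave?", corpus)
```

On this toy corpus both methods recover the relevant turns; the paper's point is that their behavior diverges once the surrounding harness, tool-calling style, and volume of distracting history change.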
Agent Harness Design Matters as Much as Retrieval Strategy
The research revealed that "across Chronos and the provider CLIs, grep generally yields higher accuracy than vector retrieval in our comparisons in experiment 1; at the same time, overall scores still depend strongly on which harness and tool-calling style is used, even when the underlying conversation data are the same." This finding suggests that how agent systems structure tool interactions may be as important as which retrieval technology they employ.
Implications for RAG System Design
The study addresses a gap in the existing literature on retrieval-augmented generation (RAG) systems, which has lacked a systematic comparison of how the choice of retrieval strategy interacts with agent architecture. The researchers specifically examined under-explored dimensions, including how tool outputs are presented to models and how performance changes when searches must cope with irrelevant surrounding text in agent loops.
Key Takeaways
- Simple grep search generally outperformed vector retrieval across multiple AI agent architectures in experiments on the LongMemEval dataset
- Agent harness design and tool-calling paradigm affect performance as much as the choice between grep and vector retrieval
- The research tested both inline and file-based tool result presentation methods across custom and provider-native agent systems
- When additional unrelated conversation history was added, the relative performance of retrieval strategies changed, highlighting context sensitivity
- The findings challenge assumptions about vector-based RAG always being superior for agentic search systems