A new research paper published on arXiv on May 14, 2026 challenges the widespread assumption that vector-based retrieval is superior for AI agent systems. The study, titled "Is Grep All You Need? How Agent Harnesses Reshape Agentic Search," found that simple grep search generally yields higher accuracy than vector retrieval across multiple agent architectures, with overall performance depending heavily on how tools are called and how results are presented.
Two-Part Experiment Compares Retrieval Strategies Across Agent Architectures
Researchers Sahil Sen, Akhil Kasturi, Elias Lumer, Anmol Gulati, and Vamse Kumar Subbiah conducted two experiments to systematically compare retrieval strategies. The first experiment compared grep and vector retrieval on a 116-question sample from LongMemEval using both a custom agent harness called Chronos and provider-native CLI harnesses including Claude Code, Codex, and Gemini CLI. Both inline tool results and file-based tool results were tested.
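The inline versus file-based distinction can be illustrated with a minimal sketch. The `run_tool` wrapper below is hypothetical (the paper's actual harness interfaces are not described here): inline mode returns the raw tool output for direct insertion into the model's context, while file-based mode writes the output to disk and hands the model only a path it can later read or grep.

```python
import os
import tempfile

def run_tool(tool_fn, *args, file_based=False):
    """Hypothetical wrapper showing two ways to present a tool result.

    Inline: the raw result string goes straight into the model's context.
    File-based: the result is written to a file and only the path is
    returned, so the agent can grep or read it in later turns instead of
    carrying the full output in context.
    """
    result = tool_fn(*args)
    if not file_based:
        return {"type": "inline", "content": result}
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(result)
    return {"type": "file", "path": path}

# Example: the same search tool presented both ways.
inline_msg = run_tool(lambda: "3 matches found in chat_history.txt")
file_msg = run_tool(lambda: "3 matches found in chat_history.txt",
                    file_based=True)
```

The trade-off this sketch hints at: inline results are immediately visible to the model but consume context, while file-based results keep context small at the cost of extra read/search turns.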
The second experiment progressively mixed unrelated conversation history into queries to measure how each retrieval method performed when embedded in distracting material. This design tested real-world scenarios where relevant information must be extracted from noisy contexts.
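The two retrieval styles being compared can be sketched on a toy version of this setup. The snippet below is illustrative only: grep-style retrieval is a literal regex match over conversation turns, and vector retrieval is stood in for by bag-of-words cosine similarity rather than the learned embeddings an actual RAG system would use. The distractor turns mimic the unrelated history the second experiment mixes in.

```python
import math
import re
from collections import Counter

# Toy conversation history: two relevant turns plus unrelated "distractor"
# turns, mimicking the experiment's mixing of unrelated history.
relevant_turns = [
    "user: my flight to Tokyo departs on June 3rd",
    "assistant: noted, your Tokyo flight is on June 3rd",
]
distractor_turns = [
    "user: what's a good pasta recipe?",
    "assistant: try carbonara with guanciale",
    "user: how do I reset my router?",
]

def grep_search(query_terms, corpus):
    """Grep-style retrieval: return every turn literally matching a term."""
    pattern = re.compile("|".join(map(re.escape, query_terms)), re.IGNORECASE)
    return [turn for turn in corpus if pattern.search(turn)]

def bow_vector(text):
    """Bag-of-words term counts as a crude stand-in for an embedding."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query, corpus, k=2):
    """Vector-style retrieval: rank all turns by similarity, keep top-k."""
    qv = bow_vector(query)
    return sorted(corpus, key=lambda t: cosine(qv, bow_vector(t)),
                  reverse=True)[:k]

corpus = relevant_turns + distractor_turns
hits_grep = grep_search(["Tokyo", "flight"], corpus)
hits_vec = vector_search("when does my Tokyo flight leave?", corpus)
```

On this toy corpus both methods recover the relevant turns; the paper's point is that their behavior diverges once the surrounding harness, tool-calling style, and volume of distracting history change.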
Agent Harness Design Matters as Much as Retrieval Strategy
The research revealed that "across Chronos and the provider CLIs, grep generally yields higher accuracy than vector retrieval in our comparisons in experiment 1; at the same time, overall scores still depend strongly on which harness and tool-calling style is used, even when the underlying conversation data are the same." This finding suggests that how agent systems structure tool interactions may be as important as which retrieval technology they employ.
Implications for RAG System Design
The study addresses a gap in the existing literature on retrieval-augmented generation (RAG) systems, which has lacked a systematic comparison of how the choice of retrieval strategy interacts with agent architecture. The researchers specifically examined under-explored dimensions, including how tool outputs are presented to models and how performance changes when searches must cope with irrelevant surrounding text in agent loops.
Key Takeaways
- Simple grep search generally outperformed vector retrieval across multiple AI agent architectures in experiments on the LongMemEval dataset
- Agent harness design and tool-calling paradigm affect performance as much as the choice between grep and vector retrieval
- The research tested both inline and file-based tool result presentation methods across custom and provider-native agent systems
- When additional unrelated conversation history was added, the relative performance of retrieval strategies changed, highlighting context sensitivity
- The findings challenge assumptions about vector-based RAG always being superior for agentic search systems