Semble Code Search Uses 98% Fewer Tokens Than Grep for AI Agents

Semble is an MIT-licensed code search library built for AI agents that cuts the number of tokens an agent spends exploring an unfamiliar codebase. Created by Stephan and Thomas of MinishLab, it targets a recurring problem: when agents such as Claude Code cannot locate code directly, they fall back to grep and then read entire files, which consumes large numbers of tokens and still often misses relevant code.

Technical Implementation Combines Static Embeddings With Lexical Search

Semble uses a hybrid approach that combines static Model2Vec embeddings with BM25 lexical search. The system splits code files into code-aware chunks using Chonkie, then scores queries using two retrievers: Model2Vec embeddings for semantic similarity and BM25 for lexical matches on identifiers and API names. Results are fused via Reciprocal Rank Fusion (RRF) and reranked with code-aware signals. Notably, everything runs on CPU with no transformers involved.
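The fusion step can be illustrated with a minimal sketch of Reciprocal Rank Fusion in Python. This is not Semble's own code: the function name, the chunk identifiers, and the constant k=60 (a common default for RRF) are assumptions chosen for illustration, and the code-aware reranking step is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into a single ranking.

    Each chunk receives 1 / (k + rank) from every list it appears in,
    so chunks ranked highly by both retrievers rise to the top.
    k=60 is a conventional default, not a value confirmed for Semble.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical results from the two retrievers for one query.
semantic = ["auth.py:login", "session.py:create", "utils.py:hash_pw"]
lexical = ["utils.py:hash_pw", "auth.py:login", "db.py:connect"]

fused = reciprocal_rank_fusion([semantic, lexical])
print(fused)
```

Because RRF uses only rank positions, the embedding similarities and BM25 scores never need to be put on a common scale, which is one reason it is a popular choice for combining semantic and lexical retrievers.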

The tool uses MinishLab's custom "potion-code-16M" model, a static code embedding model distilled from nomic-ai/CodeRankEmbed and trained on the CornStack code corpus using Tokenlearn and contrastive fine-tuning.

Benchmark Results Show Dramatic Efficiency Gains

Testing on approximately 1,250 query/document pairs across 63 repositories and 19 programming languages produced the following results:

98% fewer tokens compared to grep+read operations
0.854 NDCG@10 retrieval score
94% recall at just 2,000 tokens (versus 100,000 tokens needed for grep+read at 85% recall)
99% of the retrieval quality of CodeRankEmbed Hybrid, a 137M-parameter code-trained transformer
~200x faster indexing than transformer-based approaches
~10x faster queries
~250ms to index a typical repository
~1.5ms per query on CPU
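For readers unfamiliar with the metrics above, the following Python sketch shows how NDCG@10 and a relative token saving are typically computed. It is a generic illustration, not the benchmark harness: the function names are hypothetical, the relevance lists are made up, and the NDCG helper assumes the ranked list contains every relevant document for the query.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the ideal (sorted) ordering.

    Assumes `relevances` covers all relevant documents for the query.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def token_reduction(baseline_tokens, used_tokens):
    """Fraction of tokens saved relative to a baseline."""
    return 1.0 - used_tokens / baseline_tokens


# Hypothetical query whose single relevant chunk is returned at rank 1 vs rank 3.
top_hit = ndcg_at_k([1, 0, 0])
third_hit = ndcg_at_k([0, 0, 1])

# The token budgets quoted in the list: 2,000 versus 100,000.
saving = token_reduction(100_000, 2_000)
```

A relevant chunk at rank 1 scores 1.0, while the same chunk at rank 3 scores 0.5, which shows how strongly NDCG rewards putting the right code near the top. The 2,000 versus 100,000 token budgets work out to a 98% reduction, which is consistent with the headline figure, although the article does not state that the headline number was derived this way.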

Zero-Configuration MCP Integration for Claude Code

Semble includes Model Context Protocol (MCP) server integration, making it immediately usable with Claude Code. Installation requires a single command:

    claude mcp add semble -s user -- uvx --from "semble[mcp]" semble

The tool requires no API keys, GPU, or external services.

The project is available via PyPI and has 1.1k stars on GitHub. The Hacker News announcement drew 38 comments on token efficiency, comparisons with existing solutions, and practical implementation details, with particular interest in the MCP integration and the token savings.

Key Takeaways

Semble reduces token consumption by 98% compared to grep+read operations while maintaining 94% recall at 2,000 tokens
The tool combines static Model2Vec embeddings with BM25 lexical search, running entirely on CPU without requiring transformers or GPUs
Benchmarks show 0.854 NDCG@10 score across 63 repositories and 19 programming languages, with ~250ms indexing time and ~1.5ms query time
MIT-licensed with zero-configuration MCP integration for Claude Code, requiring no API keys or external services
The custom potion-code-16M embedding model achieves 99% of the retrieval quality of a 137M-parameter transformer while indexing roughly 200x faster
