Your agent doesn’t need to read the whole file to answer your question
The companion post on proxies covered the shell layer — RTK, snip, and lean-ctx sitting between your Bash calls and the model, scrubbing git status and docker ps output before it burns tokens. That’s one half of the token-waste problem. The other half is bigger and dumber: the agent deciding it needs to cat your 4,000-line UserService.java, or grep your entire monorepo and read back every hit, just to answer “where’s the auth check for this endpoint.” lean-ctx actually straddles both worlds a little — it does some read-side filtering too — but I covered it in depth over there, so I won’t re-litigate it here.
This post is about the read side specifically: MCP servers that give your coding agent smart ways to find and retrieve only the relevant slice of a codebase, instead of defaulting to “just read the whole file, it’s fine.” Four tools, four different philosophies for the same underlying complaint — you’re paying context-window rent on bytes nobody asked for.
Full example: Grab the working MCP config snippets for each tool at github.com/KingPin/sumguy-examples/…/productivity/claude-code-codebase-context-mcp
context-mode — my current daily driver
I’ve been running context-mode as my default MCP server for a while now, and the philosophy behind it is different enough from the other three that it’s worth explaining before the feature list. The pitch is “Think-in-Code”: instead of reading raw data into the conversation and reasoning about it there, you do the analysis in a sandbox and only the derived answer crosses back into context. Raw bytes never make the trip.
That sounds abstract until you see the tools.
ctx_batch_execute runs shell commands in parallel, and here’s the trick — it auto-indexes every command’s output into a searchable store as it goes, and if you pass along search queries, it hands you back the matching sections in the same round trip. No “run command, read output, now search it” dance.
```json
{
  "mcpServers": {
    "context-mode": {
      "command": "npx",
      "args": ["-y", "@context-mode/mcp-server"]
    }
  }
}
```
In practice that’s something like handing it three labeled commands ({"label": "compose config", "command": "docker compose config"}, {"label": "nginx access log", "command": "tail -500 /var/log/nginx/access.log"}, and {"label": "recent commits", "command": "git log --oneline -50"}) plus a query like “which service is failing health checks.” You get back the relevant lines from all three, not three walls of raw output you have to skim yourself. The label becomes the title of the search chunk, so descriptive labels pay off later.
ctx_search is the workhorse. It’s a unified FTS5 knowledge base under the hood — SQLite, local, no cloud round trip — that merges Porter-stemming search with trigram-substring matching via Reciprocal Rank Fusion, then does a proximity re-rank and typo correction on top. Translation: it finds things whether you spelled them right, whether they’re a partial match, or whether the important words are three tokens apart in the source. And it doesn’t just search your indexed command output — it also searches SESSION MEMORY, which is auto-captured: decisions you made, errors you hit, plans you wrote down, prior prompts. Sort by timeline and you can literally ask “what did we decide about the retry logic two hours ago” and get a straight answer instead of scrolling back through the transcript yourself.
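If Reciprocal Rank Fusion is new to you, here’s a minimal sketch of the idea in JS. To be clear, this is the textbook formula, not context-mode’s actual code: every result earns 1 / (k + rank) from each ranked list it appears in, and the summed scores decide the final order. The k = 60 default is the commonly used constant, and the chunk ids and both input lists are made up for illustration.

```javascript
// Reciprocal Rank Fusion: merge several ranked lists into one.
// Each ranking is an array of ids, best match first.
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical chunk ids from two retrievers over the same index.
const stemmedHits = ["retry-policy", "http-client", "backoff-config"];
const trigramHits = ["backoff-config", "retry-policy", "changelog"];

const fused = reciprocalRankFusion([stemmedHits, trigramHits]);
console.log(fused);
// → [ 'retry-policy', 'backoff-config', 'http-client', 'changelog' ]
```

retry-policy comes out on top because it sits near the top of both lists, even though it isn’t first in the trigram one. That’s the appeal of RRF: it only needs ranks, so you can fuse retrievers whose raw scores aren’t comparable.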
ctx_execute and ctx_execute_file run actual code — JS or shell — in a subprocess, and only what you console.log() makes it back to you. Say you’ve got a 40MB JSON export and you need to know how many records have a null email field. You don’t read the file. You write six lines of JS that load it, filter it, and log the count. The 40MB stays in the sandbox; you get back the number 312.
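Here’s roughly what those few lines look like. It’s a sketch under assumptions: the export is a single top-level JSON array, the field is literally called email, and the sample file written to the temp directory is a stand-in so the snippet runs on its own. How you hand the script to ctx_execute depends on the tool’s own schema, which I’m not reproducing here.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Count records whose `email` field is explicitly null.
// Assumes the file holds one top-level JSON array of objects.
function countNullEmails(filePath) {
  const records = JSON.parse(fs.readFileSync(filePath, "utf8"));
  return records.filter((record) => record.email === null).length;
}

// Hypothetical stand-in for the real 40MB export.
const samplePath = path.join(os.tmpdir(), "export-sample.json");
fs.writeFileSync(
  samplePath,
  JSON.stringify([
    { id: 1, email: "a@example.com" },
    { id: 2, email: null },
    { id: 3, email: null },
    { id: 4 }, // a missing key is undefined, not null, so it is not counted
  ])
);

// Only this logged number would cross back into the conversation.
console.log(countNullEmails(samplePath)); // → 2
```

Whether a missing email key should count as null is a decision about your data, so make it on purpose. Swap the strict `=== null` for `== null` if you want missing keys counted too.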
ctx_fetch_and_index does the same trick for the web — fetch a URL, convert to markdown, index it, and you pull sections back out with ctx_search instead of dumping the whole page into your conversation.
The honest caveat: context-mode isn’t code-structure-aware the way a dependency graph is. It doesn’t know that OrderService.calculateTax() calls TaxRateProvider.lookup() three files over. It’s a general-purpose “keep raw bytes out of context” layer — for shell output, files, and web content — plus a genuinely useful persistent memory. If most of your token waste is logs, build output, and repeated file reads rather than “I need to understand how this function is used across the codebase,” this is the tool doing the actual work day to day.
code-review-graph — the credible challenger
If context-mode is the general-purpose sandbox, code-review-graph is the specialist: a local-first code intelligence graph, exposed over both MCP and CLI, built specifically so an agent reads only the functions and files that matter for the question it’s actually answering.
Install is refreshingly boring:
```shell
pipx install code-review-graph
code-review-graph install
code-review-graph build
```
install auto-detects and wires up whichever AI coding tool you’re running (Claude Code, Codex, Cursor, Windsurf, Zed, Continue, OpenCode, Gemini CLI, GitHub Copilot, and a few more), so you’re not hand-editing five different config formats for five different editors. build does the actual graph construction: tree-sitter parses your source across a broad set of languages (and you can add your own via a languages.toml if your stack is weird), then it runs incremental updates after that, re-parsing only the files that changed, typically in under 2 seconds even on a decent-sized repo.
```toml
[[language]]
name = "my-dsl"
extensions = [".mydsl"]
grammar = "tree-sitter-my-dsl"
```
What makes it more than “grep with extra steps” is Leiden community detection clustering related code into logical groups, execution-flow tracing so you can ask “what actually runs when this endpoint gets hit” and get a path instead of a guess, an architecture overview that flags coupling problems before they become a 2 AM incident, and risk-scored reviews: detect_changes takes a diff and maps it to the functions, flows, and test gaps it actually touches, instead of you eyeballing a 600-line PR wondering what you might have missed.
Now, the benchmarks, because this is where I want to be careful. code-review-graph claims roughly an 82x median per-question token reduction across six real repos, with a range of 38x to 528x. That 528x number is the one that’ll end up in a tweet somewhere, and it’s real, but it’s the max: the fastapi repo, best case. The median across all six repos is 82x. Quote the honest number, not the flashy one.
What actually earns my trust here isn’t the number, it’s that they publish a reproduction recipe (docs/REPRODUCING.md) and run a weekly eval in CI. That’s the difference between “trust our README” and “here’s how to check for yourself.” I didn’t rerun their full eval suite myself for this post; I looked at the methodology and the CI history rather than reproducing all six repos. But the fact that the recipe exists and runs on a schedule is a meaningfully higher bar than any other tool in this comparison clears.
token-savior — loud claims, verify yourself
token-savior is an MCP server combining structural code navigation with persistent memory, and it says it works with any MCP client, not just Claude Code. Around a thousand GitHub stars last I looked, which tells you people are at least curious.
The claims on the README, though — and I want to be explicit that these are their marketing copy, not something I verified — are things like “-77% active tokens, -76% wall time, 0 losses across 96 tasks on Claude Opus 4.7,” and “the only coding agent hitting 100% on a real benchmark.” That’s a big swing. No reproduction recipe, no published methodology I could find, no CI eval history like code-review-graph has. Self-reported numbers on your own README are, structurally, the weakest form of evidence a tool can offer — not because the tool is bad, but because there’s no way for a stranger to check the work.
```json
{
  "mcpServers": {
    "token-savior": {
      "command": "npx",
      "args": ["-y", "token-savior-mcp"]
    }
  }
}
```
The underlying idea (structural navigation plus persistent memory in one server) is a legitimate one, and honestly overlaps conceptually with pieces of both context-mode and code-review-graph. But I’d trial this one on your own repo with your own before/after token counts before you let the headline number do any deciding for you. Loud claims aren’t automatically false. They’re just unverified until somebody other than the vendor runs the numbers.
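If you want to run that trial, the arithmetic is small enough to script. This is a minimal sketch, assuming you’ve logged total tokens per task with and without the server enabled. The task names and counts below are placeholders that show the shape of the data, not measurements from any tool.

```javascript
// Percentage of tokens saved on one task, given before/after counts.
function reductionPercent(before, after) {
  if (before <= 0) throw new RangeError("before must be positive");
  return ((before - after) / before) * 100;
}

// Median of a non-empty list of numbers.
function median(values) {
  if (values.length === 0) throw new RangeError("values must be non-empty");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Placeholder inputs, NOT real measurements: replace with your own logs.
const runs = [
  { task: "find auth check", before: 40000, after: 10000 },
  { task: "trace retry logic", before: 20000, after: 15000 },
  { task: "summarize module", before: 10000, after: 5000 },
];

const reductions = runs.map((run) => reductionPercent(run.before, run.after));
console.log("median %:", median(reductions)); // → median %: 50
console.log("best %:", Math.max(...reductions)); // → best %: 75
```

Report the median first and the best case second. Even with three made-up rows the two numbers tell different stories, which is why a single headline figure deserves a second look.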
claude-context — the vector-search one
claude-context takes a completely different approach: semantic vector search over your codebase. It’s built by Zilliz, the company behind the Milvus vector database, and the monorepo ships a core indexing package, a VSCode “Semantic Code Search” extension, and an MCP server.
The catch, and it’s a real one for how this blog is run: it requires a vector database — Zilliz Cloud, which has a free tier — and an OpenAI embedding API key, specifically for text-embedding-3-small.
```json
{
  "mcpServers": {
    "claude-context": {
      "command": "npx",
      "args": ["-y", "@zilliz/claude-context-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "ZILLIZ_CLOUD_URI": "https://your-cluster.zillizcloud.com",
        "ZILLIZ_CLOUD_TOKEN": "..."
      }
    }
  }
}
```
Technically this is a capable tool, and semantic search genuinely finds things keyword search misses: “the thing that handles retry backoff” turns up code that doesn’t contain the word “retry.” But look at that config again: you’re shipping every chunk of your codebase to a cloud vector database, and paying OpenAI per index run for embeddings. For a homelab, self-hosted, local-first setup (the entire ethos this blog is built around), that’s not a minor inconvenience, it’s a philosophical mismatch. It’s like buying a diesel truck and then hauling your firewood to a service that chops it for you, three states away, for a per-log fee. If you’re already all-in on cloud tooling and an OpenAI key is just sitting in your .env anyway, this is a solid, well-backed option. If you’re the person this blog is written for, it’s a pass on principle before you even get to evaluating the search quality.
The honest verdict
context-mode is staying as my daily driver. It’s not code-structure-aware, and that’s fine — it was never trying to be. It’s a general context-hygiene layer for shell output, files, and web content, plus a searchable session memory that’s saved me from re-explaining decisions I made three hours earlier. If your token waste is logs and build output and repeated cats, this is the tool doing the work.
code-review-graph is the one I’d add alongside context-mode, not instead of it — they’re solving different halves of the same problem. It’s local-first, it actually understands code structure through the graph and Leiden clustering, and it backs its benchmark claims with a reproduction recipe and a weekly CI eval instead of just a README table. For code review and big monorepos specifically, this is the strongest tool in the lineup.
token-savior has a legitimately interesting idea buried under a lot of unverified marketing. Trial it yourself, measure your own before-and-after, and don’t let “0 losses across 96 tasks” do your thinking for you.
claude-context is only worth it if you’re fine shipping your codebase’s embeddings to a cloud vector database and paying OpenAI per index. For a local-first homelab setup, that’s a pass — not because it’s a bad tool, but because it’s solving the problem with the exact dependency this whole blog exists to avoid.
Zoom out and the pattern across both posts in this series is the same: local-first wins, and honest benchmarks beat loud ones. And nobody’s making you pick exactly one tool forever — a general hygiene layer plus a code graph compose just fine, and that’s the setup I’m actually running.