RTK vs snip vs lean-ctx: Token Killers

By SumGuy 12 min read

Your git log is not free, and neither is docker ps

Here’s the thing nobody tells you when you start living inside Claude Code all day: every single command your agent runs gets read back into the model’s context. git status on a repo with forty modified files. docker ps -a on a box that’s been up for six months. npm install spraying three hundred lines of deprecation warnings you’ll never read. None of that is free. It’s tokens, and tokens are money and, worse, they’re attention — the model has to actually process that noise before it can get back to the thing you asked it to do.

So a small cottage industry of “token-killer” proxies has sprung up: tools that sit between your CLI agent and the model and strip the fat out of command output before it ever becomes a token. I’ve been running one of these daily for months, and I spent the last couple weeks putting the other serious contenders through their paces on real repos. This is article one of a two-parter — today’s about intercepting command output at the shell layer. The follow-up covers the other half of the problem: MCP tools that manage codebase context on the read side (searching, indexing, memory) instead of filtering stdout. Different layer, different post.

Full example: Grab the working snip filters and proxy config at github.com/KingPin/sumguy-examples/…/productivity/rtk-vs-snip-vs-lean-ctx

Three tools, one honorable mention. Let’s go.

RTK (Rust Token Killer) — the one I run today, warts and all

RTK is a Rust CLI proxy that hooks into Claude Code’s Bash tool. A PreToolUse hook silently rewrites your commands before they execute — git status becomes rtk git status, docker ps -a becomes rtk docker ps -a — and what comes back to the model is filtered, condensed output instead of the raw firehose. You don’t type any of this. It just happens, transparently, on every Bash call.

The savings are real. I’ve watched RTK cut 60–90% off the token cost of routine dev operations: git status on a messy repo, docker ps on a host with two dozen containers, long build logs. That’s not a marketing number for me; it’s just what happens when you stop shipping the model a wall of text it was going to skim anyway.

Here’s the part the marketing pages don’t put on the front page, and the part I think matters more than the savings number: RTK has some genuinely nasty silent-failure modes, and I’ve hit all of them.

The worst one: RTK will silently swallow sed -i edits.

sed -i 's/DEBUG = False/DEBUG = True/' config.py

Run that through the hook and the command “succeeds” — no error, no complaint — but the file doesn’t change and you get no output telling you anything went sideways. You just come back twenty minutes later wondering why your debug flag never flipped. That’s a special kind of infuriating because there’s no stack trace to Google. It just quietly didn’t happen.
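One cheap guard against an edit that "succeeds" without touching the file is to hash the file before and after the command. What follows is a minimal sketch, not anything RTK ships: `digest`, `changed_by`, and `sed_in_place_checked` are hypothetical helper names, and the `sed -i` call assumes GNU sed syntax.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import Callable


def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's current contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_by(path: Path, action: Callable[[], None]) -> bool:
    """Run `action` and report whether the file's contents changed."""
    before = digest(path)
    action()
    return digest(path) != before


def sed_in_place_checked(expression: str, path: Path) -> None:
    """Run `sed -i` (GNU sed syntax assumed) and fail loudly on a no-op."""

    def run() -> None:
        subprocess.run(["sed", "-i", expression, str(path)], check=True)

    if not changed_by(path, run):
        raise RuntimeError(f"sed reported success but {path} is unchanged")
```

Calling `sed_in_place_checked("s/DEBUG = False/DEBUG = True/", Path("config.py"))` would raise if the file comes back byte-for-byte identical. It also raises when the pattern simply never matched, which is usually something you want to hear about anyway.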

Multi-line commands and shell variable expansion get mangled too. Something like:

D='/app/config'; sed -i "s/foo/bar/" "$D/settings.py"

The hook’s command-rewriting logic doesn’t reliably expand $D — it can pass the literal string through instead of the resolved path, and now you’re editing a file that doesn’t exist, again with no useful error.
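To picture where this class of bug comes from, here is a rough sketch of a conservative command rewriter: it prefixes simple, allow-listed commands and passes anything involving variables, chaining, or redirects through untouched. This is an illustration under my own assumptions, not RTK's actual rewriting logic; the `WRAPPED` allow-list, the `SHELL_META` set, and the `rewrite` function are all hypothetical.

```python
import shlex

# Hypothetical allow-list; RTK's real command set is not shown in this article.
WRAPPED = {"git", "docker", "npm"}

# Characters that mean the shell will do more than run one simple command.
SHELL_META = set(";&|$`<>(){}\n\\")


def rewrite(command: str, prefix: str = "rtk") -> str:
    """Prefix simple, allow-listed commands; pass everything else through."""
    if any(ch in SHELL_META for ch in command):
        return command  # variables, chaining, redirects: leave untouched
    try:
        tokens = shlex.split(command)
    except ValueError:
        return command  # unbalanced quotes: leave untouched
    if not tokens or tokens[0] not in WRAPPED:
        return command
    return f"{prefix} {command}"
```

With this sketch, `git status` becomes `rtk git status`, while the `D='/app/config'; sed -i ...` line above comes back exactly as written. The trade-off is deliberate: a rewriter that refuses to touch anything it cannot parse trivially saves fewer tokens, but it never hands the shell a command that differs from the one you meant.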

And I’ve personally seen RTK return a canned “all formatted, no changes needed” response for prettier without actually running it. Convenient when it’s true, catastrophic when it isn’t, because you now trust a report that never happened.

The workarounds I actually use, daily:

Verdict: RTK works, and the savings are not imaginary. But the silent-failure footguns are a real tax — not a “read the docs and you’ll be fine” tax, a “you will eventually lose twenty minutes to a sed that quietly did nothing” tax. I still run it. I just keep rtk proxy within arm’s reach at all times, the way you keep a tire iron in the trunk even though you hope you never need it.

snip — the one built like actual software

snip bills itself explicitly as “an rtk alternative, in Go” — same pitch, same 60–90% savings claim (their number, not independently measured by me), same core idea of intercepting agent commands before the output hits the model. What’s different is the architecture, and the architecture is the whole story.

RTK’s filtering behavior is baked into the Rust binary. snip’s filtering behavior lives in YAML files that the binary reads at runtime. The pitch, in their own framing: the binary is the engine, filters are data, and the two evolve independently. You don’t need a new release to fix a bad filter — you edit a text file.

Here’s a real one, condensing git log down to just hash and message:

git-log.yml
match:
  command: git
  subcommand: log
  exclude_flags: ["--stat", "--patch", "-p"]
inject:
  args: ["--pretty=format:%h %s"]
  defaults:
    n: 20
pipeline:
  - action: keep_lines
    pattern: "^[a-f0-9]{7,} "
  - action: truncate_lines
    max_length: 100
  - action: format_template
    template: "{hash} — {message}"
on_error: passthrough

Read that last line twice: on_error: passthrough. If the filter chokes on unexpected output — a git log format nobody anticipated, a locale difference, whatever — snip doesn’t eat the command and hand back nothing. It falls back to giving you the real output. That one line is, in my opinion, the single biggest design decision separating “the mature contender” from “the thing that once faked a prettier run.”
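The passthrough idea is easy to sketch outside of snip. The code below is a minimal illustration, not snip's Go implementation: `keep_lines` and `truncate_lines` borrow their names from the filter above, `run_filter` is a hypothetical driver, and the template step is left out for brevity.

```python
import re
from typing import Callable, Iterable

Step = Callable[[list[str]], list[str]]


def keep_lines(pattern: str) -> Step:
    """Keep only lines matching `pattern` (compiled when the step runs)."""

    def step(lines: list[str]) -> list[str]:
        rx = re.compile(pattern)
        return [ln for ln in lines if rx.search(ln)]

    return step


def truncate_lines(max_length: int) -> Step:
    """Cut every line down to at most `max_length` characters."""
    return lambda lines: [ln[:max_length] for ln in lines]


def run_filter(raw: str, steps: Iterable[Step]) -> str:
    """Apply steps in order; on any error, return the raw output unchanged."""
    try:
        lines = raw.splitlines()
        for step in steps:
            lines = step(lines)
        return "\n".join(lines)
    except Exception:
        return raw  # passthrough: a broken filter must never hide output
```

The design choice worth noticing is where the `try` sits: it wraps the whole pipeline, so a bad regex or a step that blows up on unexpected input degrades to the unfiltered output rather than to an empty or invented result.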

snip supports a genuinely wide client list, and it uses whatever integration mechanism fits each one instead of forcing one pattern everywhere:

The other thing I like: per-project filter directories. Your config’s filters.dir takes an array, and later entries override earlier ones:

snip-config.toml
[filters]
dir = [
  "~/.config/snip/filters",
  "${env.PWD}/.snip"
]

That means you keep a global baseline (your git log, docker ps, npm install filters that work everywhere) and then a per-repo .snip/ directory that overrides just the rules that repo needs — a monorepo with a chatty build tool, a client project with its own logging format. You’re not maintaining one giant global config that slowly turns into spaghetti, and you’re not forking the whole tool per project either.
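The override behavior can be pictured as a simple last-one-wins merge. This sketch assumes that filters are matched up by file name (so a project `git-log.yml` replaces the global `git-log.yml`); that keying rule is my assumption for illustration, not something confirmed from snip's source, and `resolve_filters` is a hypothetical function.

```python
from pathlib import Path


def resolve_filters(dirs: list[Path]) -> dict[str, Path]:
    """Map filter file name to the winning path; later directories override."""
    resolved: dict[str, Path] = {}
    for directory in dirs:
        if not directory.is_dir():
            continue  # a repo without a .snip/ directory just uses the baseline
        for path in sorted(directory.glob("*.yml")):
            resolved[path.name] = path  # same name seen later replaces earlier
    return resolved
```

Given a global directory holding `git-log.yml` and `docker-ps.yml` and a project directory holding only `git-log.yml`, the result points `git-log.yml` at the project copy and leaves `docker-ps.yml` on the global one.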

Verdict: this is the one I’d actually point a fresh setup at. Same core idea as RTK, same order-of-magnitude savings claim, but the YAML-filter design is auditable — you can read exactly what a filter does to your git log output in thirty seconds — and maintainable in a way “recompile the binary” isn’t. Passthrough-on-error is the feature that matters most: it means a bad filter degrades to “you get the full output” instead of “you get nothing and don’t know it.” That’s the difference between an annoyance and a trust problem.

lean-ctx — the tool that wants to run the whole table

lean-ctx (LeanCTX) is not really competing on the same axis as RTK and snip — it’s a single local Rust binary trying to collapse two separate layers of the context problem into one tool.

The read path is an MCP server exposing ctx_* tools — read modes, caching, deltas, search, session memory, multi-agent coordination — plus a shell hook that does RTK/snip-style command output compression. This is the part that competes head-on with today’s two tools.

The wire path is the genuinely novel part, and it’s opt-in via lean-ctx proxy enable. Once it’s on, lean-ctx sits between your agent and the model as a local proxy and compresses everything — system prompt, full conversation history, tool results — not just command output. A few specifics worth calling out because they’re not things I’ve seen elsewhere:

On top of both layers there’s a property graph for impact analysis (which files touch which), session memory, and a browser dashboard. The project claims 76 MCP tools and 30+ agents, running fully local. It auto-selects between a “Hybrid” mode (cached MCP reads plus shell hooks, for agents with real shell access) and an “MCP-only” mode for protocol-only clients that never touch a shell.

Verdict: lean-ctx is the tool for you if you want the whole platform — read-side compression, wire-side compression, the dependency graph, the memory layer — and you’re willing to invest the setup and mental-model time that comes with a 76-tool surface area. If you just want your git log and docker ps output filtered, this is a forklift where a hand truck would do — technically it works, but you didn’t need to learn a property graph to move a couch. The wire-proxy prompt-cache-safe compression is genuinely the standout idea here, and it’s the reason I’m calling this one the bridge to the follow-up article rather than a straight loss to snip. It’s doing real work in the layer that article two is actually about.

Honorable mention: caveman mode, or teaching Claude to grunt

This one’s a novelty, and I mean that with affection. caveman is a Claude Code skill — not a proxy, no hooks, no binary — that makes Claude reply in caveman speak. “Fix bug. Bug in loop. Loop wrong index.” That kind of thing.

It’s funnier than it sounds, and per their own benchmark (their numbers, ten prompts, not something I independently verified) it cuts roughly 65% of output tokens on average, ranging 22–87% depending on the prompt.

Here’s the honest part, straight from their own README: it only touches output tokens. Your input tokens and the model’s reasoning tokens are completely untouched — caveman mode doesn’t make Claude think in fewer words, just answer in fewer words. And the skill itself adds roughly 1,000–1,500 input tokens per turn to set up the caveman persona. On a workload that’s already terse — quick yes/no questions, short diffs — that overhead can make the whole thing net-negative on total tokens. You’re paying more up front than you save on the back end.
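The break-even point falls out of simple arithmetic. This is a back-of-the-envelope sketch using the figures quoted above (about 65% output reduction, 1,000 to 1,500 tokens of per-turn overhead); it counts raw tokens only and ignores the price difference between input and output tokens and any prompt caching. Both function names are hypothetical.

```python
def net_tokens_saved(output_tokens: float, reduction: float, overhead: float) -> float:
    """Tokens saved per turn: output trimmed minus the skill's input overhead."""
    return output_tokens * reduction - overhead


def break_even_output(reduction: float, overhead: float) -> float:
    """Uncompressed reply length at which savings exactly cancel the overhead."""
    return overhead / reduction
```

At a 65% reduction, `break_even_output(0.65, 1000)` is roughly 1,538 tokens and `break_even_output(0.65, 1500)` is roughly 2,308. Any reply that would have been shorter than that uncompressed is a net loss: a 300-token answer with 1,000 tokens of overhead comes out about 805 tokens behind.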

Treat this as a fun sidebar, not a context-management strategy. It’s genuinely useful if you want faster-to-scan responses, and it’s a good laugh in a stand-up demo. It is not going to move your monthly bill.

So which one should you actually run?

Straight answer, no hedging:

If you want the general shape of the takeaway: for most people, a lightweight proxy beats a heavyweight platform. There’s no prize for running the biggest tool in the shed if a YAML filter file does the exact same job with a fraction of the surface area to learn and break. Save the platform play for when you actually need the platform.

Next up: the other half of the token problem — MCP tools that manage what your agent reads from the codebase in the first place, instead of filtering what comes back out. That’s where lean-ctx’s read-side tooling and its competitors actually live, and it deserves its own post instead of a rushed paragraph at the bottom of this one.

