Anubis: Anti-AI-Crawler Proof-of-Work

Your Content Is Being Stolen. Here’s How To Fight Back.

It’s 2026. Your blog posts, your code snippets, your carefully-written tutorials, they’re being vacuumed into training datasets by the dozen. Some polite, with a User-Agent that says ClaudeBot or GPTBot. Most not even trying to hide. You can block them at robots.txt, but that’s like putting a polite sign on a chain-link fence. A smarter thief just walks around it.

What if instead of a sign, you put up a math problem? One that humans solve instantly (with their browser doing the work in the background) but that makes AI scrapers’ cost curve go vertical?

That’s the idea behind Anubis, a proof-of-work gating system that sits in front of your site, challenges bots with computational puzzles, and lets real humans through without breaking a sweat. No CAPTCHA farms, no email gates, just: “Solve this hash, or get out.”

This is not theoretical. It’s a real tool, it works, and for a self-hosted blog in 2026, it’s worth understanding.

Why PoW Gating Matters Now

Three reasons you should care:

1. Your data pays someone else’s bills. Training data is valuable. Every AI scraper is betting that extracting your content costs less than what they’ll make selling the model. Right now, they’re winning that bet, because scraping costs them almost nothing. PoW changes that arithmetic.

2. robots.txt is theater. GPTBot respects it. Most others don’t. And even the polite ones can be spoofed with a proper User-Agent header. A robots.txt file is like a velvet rope in a store with no security guard. Anubis is the security guard.

3. It actually gets better with adoption. As more sites deploy PoW gating, AI companies will need to either pay (through Proof-of-Work CDN providers) or crawl less aggressively. That’s regulation through code, and it scales.

The downside? False positives. We’ll get to that.

How Anubis PoW Works (The 30-Second Version)

When a bot (or human) requests your site:

1. Your reverse proxy (Caddy/Nginx) passes the request to Anubis
2. Anubis returns a lightweight PoW challenge: “hash this data until you get a result starting with five zeros”
3. The client solves it (humans’ browsers do this in the background; bots’ CPUs work overtime)
4. Anubis verifies the solution and the real request goes through
5. The challenge repeats at an interval, so a stolen pass cookie stops working once it expires
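The challenge-and-verify loop above can be sketched in a few lines of Python. This is a minimal illustration, not Anubis’s actual implementation: it assumes SHA-256, a challenge string concatenated with an integer nonce, and a difficulty counted in leading zero bits (five hex zeros is 20 bits). The function names are made up for this example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # 8 - bit_length() is the number of zero bits before the first set bit
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash is enough to check a claimed solution."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: try nonces in order until the hash clears the difficulty."""
    for nonce in count():
        if verify(challenge, nonce, difficulty_bits):
            return nonce
```

Calling `solve("example-challenge", 12)` returns a nonce that `verify` accepts. The asymmetry is the point: the client grinds through thousands of hashes, the server checks the answer with one.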

The work is tunable. Set it light and you barely notice. Crank it up and a bot’s cost-per-page explodes from ~$0.001 to ~$0.50. At scale, that destroys the ROI on scraping.
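To see why the knob is so sensitive, here is a rough back-of-the-envelope sketch. It assumes difficulty is measured in leading zero bits, so every extra bit doubles the expected work; the hash rates you plug in are your own measurements, not numbers from Anubis.

```python
import hashlib
import time


def expected_attempts(difficulty_bits: int) -> int:
    """Each hash succeeds with probability 2**-bits, so the mean is 2**bits tries."""
    return 2 ** difficulty_bits


def expected_seconds(difficulty_bits: int, hashes_per_second: float) -> float:
    """Average solve time for a client running at the given hash rate."""
    return expected_attempts(difficulty_bits) / hashes_per_second


def measure_hash_rate(sample: int = 200_000) -> float:
    """Rough single-threaded SHA-256 rate of the machine running this script."""
    start = time.perf_counter()
    for i in range(sample):
        hashlib.sha256(f"bench{i}".encode()).digest()
    return sample / (time.perf_counter() - start)
```

At a hypothetical 1,000,000 hashes per second, `expected_seconds(20, 1_000_000)` comes out to about one second, and going from 16 to 18 quadruples the work. Run `measure_hash_rate()` on the slowest device you care about before you crank anything up.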

The beauty: humans barely notice. At low difficulty their browser finishes the work in a fraction of a second, then the page loads.

Deploying Anubis with Caddy

Here’s the practical setup. You’re running a self-hosted blog (or any site) behind Caddy, and you want to gate AI scrapers.

Step 1: Set Up the Anubis Reverse Proxy

Anubis runs as a sidecar service. It sits between your CDN/firewall and your actual web server.

version: '3.9'

services:
  anubis:
    image: ghcr.io/techarohq/anubis:latest
    container_name: anubis-proxy
    ports:
      - "8080:8080"  # HTTP listener
      - "8443:8443"  # HTTPS listener (optional, use Caddy's TLS)
    environment:
      # Upstream target (your actual blog)
      UPSTREAM_URL: "http://blog:3000"

      # PoW difficulty (0-30, default: 16)
      # 16 = ~100ms for decent laptop, ~5-10s for bot
      POW_DIFFICULTY: "18"

      # Whitelist bypass for clients that already solved a challenge
      # Real browsers get free passes via JWT or session cookie
      WHITELIST_BYPASS_ENABLED: "true"
      WHITELIST_COOKIE_NAME: "anubis-pass"
      WHITELIST_TTL_SECONDS: "3600"  # 1 hour free pass after solving once

      # Bot detection rules (exact match on User-Agent)
      BOT_RULES: |
        {
          "always_challenge": ["GPTBot", "ClaudeBot", "PerplexityBot"],
          "never_challenge": ["Googlebot", "Bingbot", "Slurp"],
          "always_block": ["MJ12Bot", "DotBot"]
        }

      # Logging
      LOG_LEVEL: "info"
      LOG_FILE: "/var/log/anubis/access.log"

    volumes:
      - anubis_logs:/var/log/anubis

    restart: unless-stopped
    networks:
      - sumguy

  blog:
    image: sumguy-astro:latest
    container_name: sumguy-blog
    expose:
      - "3000"  # reachable only on the compose network, so nothing bypasses Anubis
    environment:
      NODE_ENV: "production"
    restart: unless-stopped
    networks:
      - sumguy

volumes:
  anubis_logs:

networks:
  sumguy:
    driver: bridge

Spin it up:

docker compose up -d

Anubis is now running on localhost:8080. It forwards requests to your blog at blog:3000 once the client has passed the PoW challenge.

Step 2: Route Traffic Through Anubis with Caddy

Your Caddy config points to the Anubis proxy instead of the blog directly:

sumguy.com {
  # Point to Anubis, not directly to the blog
  reverse_proxy localhost:8080 {
    # Caddy already forwards User-Agent and sets X-Forwarded-For itself;
    # also pass the client IP explicitly for per-IP rules
    header_up X-Real-IP {remote_host}

    # Allow time for slow upstream responses
    transport http {
      response_header_timeout 30s
    }
  }

  # Optionally, log raw User-Agents for tuning
  log {
    output file /var/log/caddy/access.log {
      roll_size 100MiB
      roll_keep 5
    }
    format json
  }

  # Security headers (unchanged)
  header X-Content-Type-Options nosniff
  header X-Frame-Options DENY
  header Referrer-Policy no-referrer
  header Permissions-Policy "geolocation=(), microphone=(), camera=()"
}

Reload Caddy:

caddy reload

Now traffic flows: Browser/Bot → Caddy → Anubis → Blog
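A quick way to confirm the chain is wired up is to request the same page with different User-Agents and compare what comes back. The helper below is a small sketch using only the standard library; `probe` is a name invented for this example, and what a challenge or block response looks like (status code, body) depends on your Anubis version, so inspect the output rather than assuming specific values.

```python
import urllib.error
import urllib.request


def probe(url: str, user_agent: str, timeout: float = 10.0) -> tuple[int, str]:
    """Fetch url with the given User-Agent; return (status, first 200 bytes of body)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read(200).decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a body worth inspecting (e.g. a block page)
        return err.code, err.read(200).decode("utf-8", "replace")
```

For example, compare `probe("https://sumguy.com/", "Mozilla/5.0")` with `probe("https://sumguy.com/", "DotBot")`. If both return your blog’s HTML, traffic is not going through Anubis.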

Tuning Bot Rules & False Positives

The GPTBot Problem

GPTBot (OpenAI’s crawler) is the polite one. It respects robots.txt and uses an honest User-Agent. But crawling is opt-out: OpenAI will scrape your site unless you explicitly say no (via robots.txt or x-robots-tag).

You have three choices:

Let GPTBot through (no PoW): they’ll train on your content, you get attribution in the LLM’s training data. Some people call that marketing.
Challenge GPTBot (medium PoW): make it expensive but not impossible. They’ll sample less aggressively but still crawl.
Block GPTBot entirely (highest PoW): they give up. No training, no attribution.

Here’s a moderate config:

BOT_RULES: |
  {
    "always_challenge": {
      "GPTBot": 20,
      "ClaudeBot": 20,
      "PerplexityBot": 18
    },
    "never_challenge": ["Googlebot", "Bingbot", "Slurp", "Yandex"],
    "always_block": ["MJ12Bot", "DotBot", "SemrushBot"]
  }

The numbers are difficulty levels. "GPTBot": 20 means GPTBot gets a PoW challenge with difficulty 20 (harder than the default). "never_challenge" lets search engines index normally (they’re indexing, not scraping for training).
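To make the rule semantics concrete, here is one way the decision could be evaluated. This is a sketch under assumptions, not Anubis’s code: it matches rule names as case-insensitive substrings of the User-Agent (real User-Agent strings are longer than the bare bot name), checks blocks first, and falls back to a default difficulty of 16 for everything unlisted. `decide` and `DEFAULT_DIFFICULTY` are hypothetical names.

```python
import json

RULES = json.loads("""
{
  "always_challenge": {"GPTBot": 20, "ClaudeBot": 20, "PerplexityBot": 18},
  "never_challenge": ["Googlebot", "Bingbot", "Slurp", "Yandex"],
  "always_block": ["MJ12Bot", "DotBot", "SemrushBot"]
}
""")
DEFAULT_DIFFICULTY = 16  # assumed fallback for clients no rule mentions


def decide(user_agent: str, rules: dict = RULES) -> tuple[str, int]:
    """Return (action, difficulty) for a User-Agent string."""
    ua = user_agent.lower()
    # Blocks win over everything else
    if any(name.lower() in ua for name in rules["always_block"]):
        return "block", 0
    # Search engines pass without a challenge
    if any(name.lower() in ua for name in rules["never_challenge"]):
        return "allow", 0
    # Named AI crawlers get their own difficulty
    for name, difficulty in rules["always_challenge"].items():
        if name.lower() in ua:
            return "challenge", difficulty
    return "challenge", DEFAULT_DIFFICULTY
```

With these rules, a User-Agent containing `GPTBot/1.1` gets `("challenge", 20)`, one containing `bingbot/2.0` gets `("allow", 0)`, and an ordinary browser falls through to the default. Keep in mind that User-Agent matching only sorts the honest bots; anything that lies about its identity lands in the default bucket.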

False Positives: When Real Browsers Get Challenged

Here’s the annoying part: legitimate tools that aren’t browsers will hit PoW walls.

Common false positives:

Feed readers (Feedly, Inoreader): they’re checking for RSS updates, not training data, but they look like bots
Monitoring tools (Uptime Kuma, Pingdom): they ping your site but can’t solve the PoW, so their checks will report failures
Slack link previews: Slack’s crawler extracts the OG image and title before your user sees it
Email clients: some email apps pre-fetch links to show rich previews

Solutions:

1. Whitelist by User-Agent (surgical):

BOT_RULES: |
  {
    "never_challenge": [
      "Googlebot",
      "Slurp",
      "bingbot",
      "Feedly",
      "Inoreader",
      "Slack",
      "facebookexternalhit",
      "Twitterbot"
    ]
  }

2. Whitelist by IP (for your own tools):

WHITELIST_IPS: "10.0.0.5, 192.168.1.100"  # Uptime Kuma, your status page

3. Use session cookies (best UX):

Once a human solves the PoW, they get a cookie valid for 1 hour. On refresh, no challenge. Bots don’t preserve cookies across sessions, so they re-solve every time (expensive).

WHITELIST_COOKIE_NAME: "anubis-pass"
WHITELIST_TTL_SECONDS: "3600"
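If you are wondering how a “free pass” cookie can be trusted, the usual trick is a signed, expiring token. The sketch below uses an HMAC over a client identifier and an expiry time. It is an illustration of the idea only: the article mentions JWT or session cookies, and the token layout, `SECRET`, and function names here are assumptions.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-with-a-random-secret"  # hypothetical server-side key
TTL_SECONDS = 3600  # matches the 1-hour pass above


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_pass(client_id: str, now: Optional[float] = None) -> str:
    """Create a cookie value that proves this client solved a challenge."""
    expires = int((time.time() if now is None else now) + TTL_SECONDS)
    payload = f"{client_id}|{expires}"
    return base64.urlsafe_b64encode(f"{payload}|{_sign(payload)}".encode()).decode()


def check_pass(token: str, client_id: str, now: Optional[float] = None) -> bool:
    """Accept the cookie only if it is untampered, unexpired, and for this client."""
    try:
        decoded = base64.urlsafe_b64decode(token.encode()).decode()
        token_client, expires, signature = decoded.rsplit("|", 2)
    except (ValueError, UnicodeDecodeError):
        return False
    expected = _sign(f"{token_client}|{expires}")
    # Constant-time comparison avoids leaking how much of the signature matched
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return token_client == client_id and current < int(expires)
```

Binding the token to a client identifier (an IP address in this sketch) means a cookie copied to another machine is rejected, and the expiry forces a fresh challenge once the hour is up.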

The tradeoff: Tighter rules = fewer bots, but more false positives. Looser rules = happier readers and tools, but more bots slipping through.

Monitoring & Observability

Keep logs. You’ll want to know which bots are hitting you hardest and whether your PoW is actually slowing them down.

Check Anubis logs:

docker logs anubis-proxy 2>&1 | grep -iE "bot|challenge|solved|failed"

Sample output (hypothetical):

2026-11-27T10:15:22Z INFO request=GET:/blog/anubis-post user_agent=GPTBot difficulty=20 solved=true latency_ms=4203
2026-11-27T10:15:45Z INFO request=GET:/blog/anubis-post user_agent=Mozilla/5.0 difficulty=0 solved=true latency_ms=12
2026-11-27T10:16:03Z WARN request=GET:/ user_agent=DotBot challenge_failed=true ip=203.0.113.45 attempts=3

Read the story: GPTBot took 4+ seconds (the PoW), the real browser took 12ms (it already held a valid pass cookie, so no challenge), DotBot failed 3 times and gave up. Working as designed.
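Eyeballing three lines is fine; eyeballing three thousand is not. Here is a small sketch that tallies requests, failures, and total latency per User-Agent. It assumes the hypothetical key=value format of the sample above, where values contain no spaces. Real User-Agent strings do contain spaces, so a production parser would need quoting or a structured log format.

```python
from collections import defaultdict

SAMPLE = """\
2026-11-27T10:15:22Z INFO request=GET:/blog/anubis-post user_agent=GPTBot difficulty=20 solved=true latency_ms=4203
2026-11-27T10:15:45Z INFO request=GET:/blog/anubis-post user_agent=Mozilla/5.0 difficulty=0 solved=true latency_ms=12
2026-11-27T10:16:03Z WARN request=GET:/ user_agent=DotBot challenge_failed=true ip=203.0.113.45 attempts=3
"""


def parse_line(line: str) -> dict[str, str]:
    """Split 'timestamp LEVEL key=value ...' into a flat dict of strings."""
    timestamp, level, *pairs = line.split()
    fields = dict(pair.split("=", 1) for pair in pairs if "=" in pair)
    fields.update(timestamp=timestamp, level=level)
    return fields


def summarize(log_text: str) -> dict[str, dict[str, int]]:
    """Tally requests, failed challenges, and total latency per User-Agent."""
    stats = defaultdict(lambda: {"requests": 0, "failed": 0, "latency_ms": 0})
    for line in log_text.splitlines():
        if not line.strip():
            continue
        fields = parse_line(line)
        entry = stats[fields.get("user_agent", "unknown")]
        entry["requests"] += 1
        entry["failed"] += int(fields.get("challenge_failed") == "true")
        entry["latency_ms"] += int(fields.get("latency_ms", 0))
    return dict(stats)
```

Feed it the output of the `docker logs` command and sort by `requests` to see who is hitting you hardest.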

Set up metrics export (Prometheus optional, but useful):

PROMETHEUS_ENABLED: "true"
PROMETHEUS_PORT: "9090"

Then scrape anubis:9090/metrics from a Prometheus instance on the same Docker network, or publish port 9090 in the compose file and scrape localhost:9090/metrics.

Edge Cases & Gotchas

1. CDN Caching Breaks PoW Challenges

If you’re using Cloudflare or another CDN, it might cache PoW responses and serve one visitor’s challenge to everyone. Don’t let it. Disable caching on Anubis endpoints:

# Caddyfile
sumguy.com {
  reverse_proxy localhost:8080 {
    # Disable caching for PoW responses
    header_down Cache-Control "no-store, no-cache, must-revalidate"
  }
}

2. Mobile Users on Slow Networks

PoW is CPU-bound, so it takes longest on older phones, and a slow network like 3G adds to the wait. Set difficulty conservatively (16-18). Test with both the network and the CPU throttled.

# Chrome DevTools → Network tab → Slow 3G; Performance tab → CPU: 4x slowdown
# Verify page load still feels responsive (<2s before content visible)

3. Legitimate Scrapers (Wayback Machine, Archive.org)

Wayback Machine’s crawler is well-intentioned. But it’s a scraper. You have to pick: let them preserve your site for posterity, or block them.

never_challenge: ["archive.org_bot", "ia_archiver"]
# OR
always_challenge: ["archive.org_bot"]  # Medium PoW, they'll sample less

4. China & Great Firewall

If you have readers in mainland China, PoW adds latency. High difficulty (>22) might make the site unusable over GFW. Keep it at 16-18 if you expect international traffic.

The Decision: Is Anubis Right for You?

Use Anubis if:

Your content is evergreen and valuable: tutorials, code, research, opinions that AI companies want to scrape
You’re okay with slight latency: PoW adds 100-500ms on first visit per session
You run your own infrastructure: Anubis is self-hosted (no third-party dependency)
You can tune bot rules: false positives need ongoing tweaking

Skip Anubis if:

You want a 100% open site: PoW is a speedbump, not a wall
Your audience is mostly mobile: PoW hits mobile harder
You have zero DevOps bandwidth: it’s one more service to monitor
You’re on a shared host: you can’t install custom reverse proxies

The Honest Take

Anubis doesn’t stop scraping. It doesn’t kill the problem. What it does is raise the cost high enough that bots become pickier about which sites to scrape. If a bot can get your content for $0.50/page or someone else’s for next to nothing, they’ll pick someone else. That’s the goal.

In 2026, as more sites deploy PoW gating, this becomes an arms race. Smarter bots will optimize their PoW solvers. You’ll crank up difficulty. It’ll get weird. But right now, today, Anubis gives you leverage where you had none before.

Deploy it. Monitor it. Tune it. Your 2 AM self will appreciate knowing your content stays yours a little bit longer.

Links & Further Reading

Anubis GitHub: https://github.com/TecharoHQ/anubis (by Xe Iaso / Techaro, actively maintained)
Anubis docs: https://anubis.techaro.lol
Proof of Work explainer: https://en.wikipedia.org/wiki/Proof_of_work (Bitcoin uses the same math)
robots.txt & AI: Add User-agent: GPTBot / Disallow: / to robots.txt to opt out of OpenAI’s training crawler (GPTBot respects this)
Caddy reverse proxy docs: https://caddyserver.com/docs/caddyfile/directives/reverse_proxy
