Stirling-PDF: Stop Uploading Your Tax Returns to Sketchy Sites

You’ve Absolutely Used a Sketchy PDF Site

Admit it. You needed to merge two PDFs at 11 PM. You Googled “merge PDF free,” clicked the first result that wasn’t an ad (the second one), uploaded your documents, waited, downloaded the result, and closed the tab hoping nobody noticed.

Those documents were your lease agreement. Or a bank statement. Or, yes, tax forms with your full name, address, and SSN buried in page three.

ilovepdf.com and smallpdf.com aren’t evil. They’re just not yours. Their privacy policies exist. You have not read them. Neither have I. And that’s the problem.

Stirling-PDF fixes this. It’s a single Docker container that gives you a full PDF manipulation suite: merge, split, compress, rotate, OCR, redact, watermark, password protect, repair, convert to/from images, and more. Everything runs locally. Nothing leaves your server.

What’s In the Box

Stirling-PDF (developed under the Stirling-Tools organization on GitHub) is a Spring Boot web app with a clean UI and an API. The feature list is embarrassingly long for a free, self-hosted tool:

Merge / split / extract pages: combine multiple PDFs or pull out specific pages
Compress: reduce file size using Ghostscript under the hood
OCR: make scanned PDFs searchable with Tesseract
Rotate / reorder / remove pages: basic manipulation without losing your mind
Convert: PDF to images (PNG/JPEG/TIFF), images to PDF, HTML to PDF, Office docs to PDF
Redact: black out text or regions before sharing
eSign: add signature fields, sign documents
Password protect / unlock: standard PDF encryption
Watermark: text or image overlays
Repair: fix corrupted PDFs (works surprisingly often)
Metadata editor: strip or edit PDF metadata
Compare PDFs: visual diff between two versions

That’s not a feature list; that’s a replacement for a $30/month Adobe subscription.

Spinning It Up

One Compose file. Done.

services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest
    container_name: stirling-pdf
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - ./stirling-config:/configs
      - ./stirling-logs:/logs
      - ./stirling-extras:/usr/share/tessdata  # OCR language packs
      - ./stirling-pipeline:/pipeline           # optional automation pipeline
    environment:
      - DOCKER_ENABLE_SECURITY=false
      - INSTALL_BOOK_AND_ADVANCED_HTML_OPS=false
      - LANGS=en_GB

mkdir -p stirling-config stirling-logs stirling-extras stirling-pipeline
docker compose up -d

Hit http://your-server:8080 and you’re in. The UI is clean enough that you won’t feel bad handing it to a non-technical family member who keeps asking you to “fix a PDF.”
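The API mentioned earlier is reachable as soon as the container is up. Here is a minimal sketch of merging two files from the shell. The endpoint path (/api/v1/general/merge-pdfs) and the multipart field name (fileInput) are assumptions based on the project's Swagger docs as I remember them, so check /swagger-ui/index.html on your own instance before trusting either. STIRLING_URL and DRY_RUN are names made up for this example.

```shell
# ASSUMPTION: endpoint path and "fileInput" field name; verify in Swagger.
STIRLING_URL="${STIRLING_URL:-http://localhost:8080}"

# Usage: stirling_merge OUTPUT.pdf INPUT1.pdf INPUT2.pdf [...]
# Set DRY_RUN=1 to print the curl command instead of running it.
stirling_merge() {
  merge_out="$1"; shift
  if [ "$#" -lt 2 ]; then
    echo "need at least two input PDFs" >&2
    return 1
  fi
  # Rewrite the argument list in place: each input file becomes
  # a pair of curl arguments, -F "fileInput=@FILE".
  for merge_in in "$@"; do
    set -- "$@" -F "fileInput=@$merge_in"
    shift
  done
  # -f: fail on HTTP errors, -sS: quiet but still report failures.
  set -- -fsS "$STIRLING_URL/api/v1/general/merge-pdfs" "$@" -o "$merge_out"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "curl $*"
  else
    curl "$@"
  fi
}
```

Run it with DRY_RUN=1 first to see the exact request, then drop the flag once the endpoint checks out.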

Persistent Volume Considerations

The ./stirling-config mount is where your settings live. If you nuke the container, your config survives. The ./stirling-extras mount is where Tesseract language packs go; more on that below.

/pipeline is optional and used for automation workflows (chained operations on watched folders). Useful for bulk processing; skip it for casual use.

OCR Language Packs

Out of the box you get English. If you need German, French, Japanese, or anything else, you need to drop Tesseract .traineddata files into ./stirling-extras.

# Download a language pack — example: German
wget -O ./stirling-extras/deu.traineddata \
  https://github.com/tesseract-ocr/tessdata/raw/main/deu.traineddata

# Multiple languages at once
for lang in fra spa por; do
  wget -O ./stirling-extras/${lang}.traineddata \
    https://github.com/tesseract-ocr/tessdata/raw/main/${lang}.traineddata
done

After dropping in the files, restart the container and the new languages show up in the OCR dropdown. No rebuild, no config change.

OCR quality is “good enough for scanned documents that weren’t printed by a dot-matrix printer in 1994.” It’s Tesseract, so managed expectations apply.
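One gotcha with the wget commands above: wget -O creates the output file before the download starts, so a failed download leaves an empty .traineddata file behind. A small sketch for checking what actually landed in the folder. The function name is made up, and the default path simply matches the volume used in this article.

```shell
# Print the language code for every non-empty .traineddata file in a
# directory, and warn on stderr about empty ones.
list_ocr_langs() {
  dir="${1:-./stirling-extras}"
  for f in "$dir"/*.traineddata; do
    [ -e "$f" ] || continue          # no matches: the glob stays literal
    code="$(basename "$f" .traineddata)"
    if [ -s "$f" ]; then
      echo "$code"
    else
      echo "$code (empty file, re-download)" >&2
    fi
  done
}
```

Call list_ocr_langs ./stirling-extras before restarting the container. Anything flagged as empty will not show up as a usable language.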

File Size Limits

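Uploads are capped by default: 50MB on the setup described here, though the exact default depends on the Stirling-PDF release you pull. For most PDF work that’s fine. If you’re trying to OCR a 400-page architectural drawing package, you’ll hit the wall.

Bump it in ./stirling-config/settings.yml (key names have shifted between releases, so match whatever your generated file uses):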

system:
  maxUploadSize: 500  # MB

Restart the container. Done.
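If you script uploads, it's cheaper to check the file size locally than to wait for the server to reject it. A minimal sketch follows. The function name is made up, and the 500 default only mirrors the value shown above. It is not read from settings.yml.

```shell
# Succeed (exit 0) if FILE is larger than LIMIT_MB megabytes.
# Usage: exceeds_upload_limit FILE [LIMIT_MB]
exceeds_upload_limit() {
  file="$1"
  limit_mb="${2:-500}"
  # $(( ... )) also strips the padding some wc implementations add.
  size_bytes=$(( $(wc -c < "$file") ))
  [ "$size_bytes" -gt $(( limit_mb * 1024 * 1024 )) ]
}
```

For example: exceeds_upload_limit drawings.pdf 500 && echo "split this one first".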

Compression is handled by Ghostscript internally: you pick a quality preset (screen, ebook, printer, prepress) and Stirling calls out to gs. The “ebook” preset is the sweet spot for most documents: good quality, meaningfully smaller files.
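For reference, the four presets map onto Ghostscript's -dPDFSETTINGS switch. The sketch below is what a direct gs call looks like if you ever want to reproduce a compression job by hand. The flags Stirling passes internally may differ, and the function name and DRY_RUN switch are made up for this example.

```shell
# Usage: gs_compress PRESET INPUT.pdf OUTPUT.pdf
# Set DRY_RUN=1 to print the command instead of running Ghostscript.
gs_compress() {
  preset="$1"; in_pdf="$2"; out_pdf="$3"
  case "$preset" in
    screen|ebook|printer|prepress) ;;
    *) echo "unknown preset: $preset" >&2; return 1 ;;
  esac
  set -- gs -sDEVICE=pdfwrite "-dPDFSETTINGS=/$preset" \
         -dNOPAUSE -dBATCH -dQUIET "-sOutputFile=$out_pdf" "$in_pdf"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Running the same scan through ebook and printer and comparing sizes is a quick way to see whether the smaller file is worth the quality drop.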

Reverse Proxy + Auth

If you’re exposing this to your LAN or beyond, you’ll want a reverse proxy in front. Stirling-PDF doesn’t have built-in user accounts in the default config. It’s “everyone on the network can use it.”

For home use behind Tailscale or a VPN, that’s probably fine. For anything internet-facing, either:

Option A: Basic auth via Nginx/Caddy

# Caddyfile snippet
stirling.yourdomain.com {
    basicauth {
        your_user JDJhJDE0JGhIZ2l...  # bcrypt hash
    }
    reverse_proxy stirling-pdf:8080
}

Option B: Enable Stirling’s built-in security

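Set DOCKER_ENABLE_SECURITY=true in your Compose file and turn login on with SECURITY_ENABLELOGIN=true. This enables Spring Security with an initial admin account. Set its credentials via environment variables rather than relying on the defaults, and change the password after first login. Newer releases have renamed some of these variables, so check the docs for the version you pull:

environment:
  - DOCKER_ENABLE_SECURITY=true
  - SECURITY_ENABLELOGIN=true
  - SECURITY_INITIALLOGIN_USERNAME=admin
  - SECURITY_INITIALLOGIN_PASSWORD=changeme_please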

With security enabled you get user management, per-user settings, and the ability to restrict which tools are accessible. More overhead, but appropriate if you’re sharing it with people who aren’t you.
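The changeme_please value is a placeholder, and it's easy to forget it's there. Here's a sketch for generating something better and keeping it out of the Compose file. STIRLING_ADMIN_PASSWORD is a variable name invented for this example. It relies on Compose's standard .env substitution.

```shell
# Print a 24-character alphanumeric password from the kernel's random source.
gen_password() {
  head -c 64 /dev/urandom | base64 | tr -dc 'A-Za-z0-9' | head -c 24
  echo
}

# Example use (STIRLING_ADMIN_PASSWORD is a made-up variable name):
#   printf 'STIRLING_ADMIN_PASSWORD=%s\n' "$(gen_password)" > .env
# then reference it in the Compose file:
#   - SECURITY_INITIALLOGIN_PASSWORD=${STIRLING_ADMIN_PASSWORD}
```

Keep the .env file out of version control if the Compose file lives in a repo.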

Paperless-ngx Integration

You’re probably already running Paperless-ngx if you’re the kind of person reading this article. The two tools complement each other without needing explicit integration:

Stirling-PDF handles pre-processing: compress that 8MB scan, OCR it, clean up orientation
Paperless-ngx handles storage, tagging, and search

Drop the processed PDF into your Paperless consume folder and let it do its thing. The workflow is: scan → Stirling (compress + OCR) → Paperless consume folder → indexed and searchable forever.

If you want to get fancy, Stirling’s pipeline feature can watch a folder and auto-apply a processing chain. Point it at a “raw scans” directory, configure compress + OCR as the pipeline steps, output to your Paperless consume folder. You’ve just automated document ingestion without writing a single line of code.
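If you'd rather do the hand-off step yourself (from cron, say) instead of pointing the pipeline output straight at Paperless, it's a few lines of shell. This is a sketch with a made-up function name and folder paths that you supply yourself. It moves finished PDFs across and refuses to overwrite anything already waiting in the consume folder.

```shell
# Usage: handoff_to_paperless STIRLING_OUTPUT_DIR PAPERLESS_CONSUME_DIR
handoff_to_paperless() {
  src="$1"; dst="$2"
  for f in "$src"/*.pdf; do
    [ -e "$f" ] || continue              # empty folder: nothing to do
    name="$(basename "$f")"
    if [ -e "$dst/$name" ]; then
      echo "skip (already queued): $name" >&2
      continue
    fi
    mv "$f" "$dst/$name" && echo "queued: $name"
  done
}
```

Keeping both folders on the same filesystem makes the mv a rename, so Paperless never sees a half-copied file.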

Resource Use Under Load

Stirling-PDF is a Java app. It uses memory like a Java app. At idle, expect 300-500MB RAM. Under load, especially OCR on multi-page documents, you’ll see spikes to 1-2GB and meaningful CPU usage (Tesseract is single-threaded per page).

For a home server or a modest VPS, this is fine. For a Raspberry Pi 3: maybe not. A Pi 4 with 4GB handles it without complaint.

Compression jobs are fast. OCR on a 20-page scanned PDF takes 30-60 seconds on modern hardware. Conversion tasks (PDF to images) are nearly instant.

If you’re processing large batches, don’t queue 50 OCR jobs simultaneously. It’ll work, it’ll just be slow and your server will breathe heavy for a while.
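A simple way to avoid the 50-at-once pile-up is to let xargs cap the parallelism. The sketch below runs a command of your choosing over every PDF in a folder, a couple at a time. The function name is made up, and whatever you pass as the command (for example your own script that submits one file for OCR) is up to you.

```shell
# Run COMMAND once per PDF in DIR, with at most MAX_JOBS running at a time.
# Usage: run_batch DIR MAX_JOBS COMMAND [ARGS...]
run_batch() {
  dir="$1"; max_jobs="$2"; shift 2
  find "$dir" -maxdepth 1 -type f -name '*.pdf' -print0 |
    xargs -0 -n 1 -P "$max_jobs" "$@"
}
```

Try run_batch ./raw-scans 2 echo first to see which files would be picked up, then swap echo for the real command.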

When You’d Still Pay Adobe

Stirling-PDF is excellent. It’s not perfect. There are cases where the paid tools still win:

PDF/A archival compliance: If you need legally archivable PDF/A-1b or PDF/A-2a output for regulatory reasons, Stirling’s compliance with specific archival standards is not guaranteed. LibreOffice can produce PDF/A output; for strict compliance workflows, test carefully.

Advanced fillable forms: Creating complex PDF forms with conditional logic, JavaScript validation, and digital signature fields is outside Stirling’s scope. It can fill and flatten existing forms, not author them.

Accessibility (WCAG/PDF/UA tagging): Producing fully tagged, accessible PDFs for institutional publishing requires Adobe Acrobat or specialized tools. Stirling doesn’t touch the tag tree.

Enterprise audit trails: If you need cryptographic signing, timestamping, and compliance logs for legal documents, you’re in enterprise PDF tooling territory.

For the other 95% of things a normal person needs to do with a PDF? Stirling has you covered.

The Privacy Argument Is Simple

You wouldn’t upload your medical records to a random website to print them. A PDF with your SSN or financial statements is the same thing. The “free online tool” model is built on processing your data, at minimum for product improvement, potentially for more.

Self-hosting isn’t paranoia. It’s recognizing that free tools have costs that aren’t listed on the pricing page.

Stirling-PDF costs you one Docker container, maybe 500MB of RAM, and the 10 minutes it took to read this article. In exchange, your documents stay on your hardware, behind your firewall, processed by code you can inspect.

Your 2 AM tax document panic mode just got a lot less sketchy.

Quick Reference

Task	Stirling-PDF?
Merge PDFs	Yes
Split / extract pages	Yes
OCR scanned documents	Yes (Tesseract)
Compress for email	Yes
Redact sensitive info	Yes
Password protect	Yes
Convert PDF ↔ images	Yes
Repair corrupted PDF	Yes (often)
Create fillable forms	No
PDF/A compliance	Limited
Enterprise signing	No

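# Get started in 60 seconds
mkdir stirling && cd stirling
# -f makes curl fail on a 404 instead of saving an error page as your Compose file;
# if the path has moved, use the Compose file from "Spinning It Up" instead
curl -fL -o docker-compose.yml https://raw.githubusercontent.com/Stirling-Tools/Stirling-PDF/main/docker/docker-compose.yml
docker compose up -d
# Open http://localhost:8080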

Done. No more uploading your lease to a website called “PDFsupertools.xyz.”
