The Email That Started This
You build a side project. It does something cute with addresses. Maybe a hiking app, maybe a delivery dashboard, maybe just a thing that turns a list of customer ZIPs into a map. You wire up Google Maps Geocoding because it’s “the obvious choice.” Three months later there’s an email about your usage tier, and your enthusiasm for the side project drops by 80%.
That’s the geocoding tax. Per-request pricing on commercial APIs makes total sense for a Fortune 500. It makes zero sense for the home labber and the indie dev. Honestly, even small businesses get hosed. Many of the cheaper commercial geocoders are built on OpenStreetMap data anyway, and you can run an OSM geocoder yourself, on your own hardware, for the cost of one SSD and a weekend.
The tool is called Nominatim. It’s the geocoder powering openstreetmap.org itself. Forward geocoding (address → coordinates), reverse geocoding (coordinates → address), structured search. All of it. And running it is honestly less painful than people make it sound — as long as you don’t try to import the entire planet on a Raspberry Pi.
Full example: Working Compose file and config at github.com/KingPin/sumguy-examples/tree/main/self-hosting/nominatim-self-hosted-geocoding-server
What Nominatim Actually Is
Nominatim is a geocoder built on top of PostgreSQL + PostGIS, fed with OpenStreetMap data. When you hand it 1600 Pennsylvania Avenue, Washington, it tokenizes that, hits a bunch of search indexes, and gives you back a lat/lon plus the matched OSM object. When you hand it 38.8977, -77.0365 it walks the spatial index and returns the closest meaningful address.
It’s the same engine the public site at nominatim.openstreetmap.org uses. The public service is rate-limited to roughly 1 request per second, which is fine for a status page and useless for anything real. That’s the whole reason self-hosting exists.
Worth knowing: Nominatim is OSM-only. If the data isn’t in OpenStreetMap, Nominatim can’t find it. For most use cases — addresses in populated areas — that’s plenty. For obscure POIs and addresses in countries with sparse OSM coverage, it gets thinner. We’ll come back to that.
The Cost Math That Pushed Me Here
I won’t quote prices because they shift, but the directional shape is consistent. Commercial geocoding APIs charge somewhere in the neighborhood of a few dollars per thousand requests. Sounds tiny. Now imagine a hobby app with 5,000 active users, each producing 20 location lookups a month. That’s 100,000 requests. Now imagine you wrote a script that backfills addresses on a million old records.
Rough rule of thumb I use: if you’re doing more than ~50,000 lookups a month, the math has already swung toward self-hosting. The break-even is faster than people think because the recurring cost of a small VPS or a corner of your home server is essentially fixed, while API costs scale linearly with usage.
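To make the break-even concrete, here is a rough sketch of the arithmetic. The price per thousand requests and the monthly server cost are placeholder assumptions, not quotes from any provider, so plug in your own numbers.

```python
def monthly_api_cost(requests_per_month: int, price_per_thousand: float) -> float:
    """Linear per-request pricing: the bill grows with every lookup."""
    return requests_per_month / 1000 * price_per_thousand


def break_even_requests(fixed_monthly_cost: float, price_per_thousand: float) -> float:
    """Monthly request volume at which a fixed-cost server matches the API bill."""
    return fixed_monthly_cost / price_per_thousand * 1000


# Hypothetical inputs, chosen only for illustration.
PRICE_PER_THOUSAND = 2.0  # assumed API price, dollars per 1,000 requests
SERVER_COST = 40.0        # assumed monthly cost of a small VPS or home-server share

hobby_app_requests = 5_000 * 20  # 5,000 active users, 20 lookups each
print(monthly_api_cost(hobby_app_requests, PRICE_PER_THOUSAND))
print(break_even_requests(SERVER_COST, PRICE_PER_THOUSAND))
```

With these made-up inputs the hobby app would cost 200 dollars a month on the API, and the break-even sits at 20,000 requests. Free tiers and your own ops time push the real crossover higher, which is why the rule of thumb above is more conservative.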
There’s a second cost too: latency. A round-trip to a commercial API typically takes 100–300 ms. A local Nominatim on the same LAN usually answers in 5–20 ms. If you’re doing batch work, that matters more than the dollars.
Pick Your PBF: Planet vs Region
The most common reason people bounce off Nominatim is that they try to import the entire planet on the wrong hardware. Don’t do that. The full planet PBF from OpenStreetMap is around 80 GB compressed and balloons to 700 GB+ once Nominatim builds its indexes. The import takes days even on serious hardware.
You almost certainly don’t need that. Geofabrik publishes regional extracts at every level you could want:
- Whole continents (North America, Europe) — ~10–25 GB compressed, 100–300 GB imported
- Single countries (Germany, USA) — a few GB compressed
- Subregions (California, Bavaria, Greater London) — under a gigabyte
If you’re in the US and you only care about US addresses, the north-america-latest.osm.pbf extract is the sweet spot. It fits comfortably on a 1 TB NVMe with room to spare, and the import wraps in 6–12 hours on reasonable hardware. If you only need one state or region, smaller extracts import in under an hour.
We’ll dig into the hardware/RAM/disk math more deeply in the hardware sizing post. For now: pick the smallest extract that covers what you actually need. You can always import a bigger one later.
The Docker Setup (mediagis/nominatim)
The mediagis/nominatim image is the de-facto community Docker image for Nominatim. It bundles a working Postgres + Nominatim + a startup script that handles the import pipeline. Maintained, well-documented, sane defaults.
Here’s a working Compose stack that imports a regional extract:

    services:
      nominatim:
        image: mediagis/nominatim:4.5
        container_name: nominatim
        ports:
          - "8080:8080"
        environment:
          PBF_URL: https://download.geofabrik.de/north-america-latest.osm.pbf
          REPLICATION_URL: https://download.geofabrik.de/north-america-updates/
          NOMINATIM_PASSWORD: ${NOMINATIM_PASSWORD}
          IMPORT_STYLE: full
          THREADS: 4
        volumes:
          - nominatim-data:/var/lib/postgresql/16/main
          - nominatim-flatnode:/nominatim/flatnode
        shm_size: 1gb
        restart: unless-stopped

    volumes:
      nominatim-data:
      nominatim-flatnode:

And a .env next to it:

    NOMINATIM_PASSWORD=please-change-me-for-real

Bring it up:

    docker compose up -d
    docker compose logs -f nominatim

The first time you start it, the container will download the PBF, set up Postgres, and run the import. This is the part that takes hours. Watch the logs: you’ll see phases like Downloading..., Importing..., Indexing..., Updating word counts.... When it finishes, you’ll see a healthy server listening on port 8080.
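If you would rather not babysit the logs, you can poll until the server answers. This is a minimal sketch that assumes the container is reachable at http://localhost:8080 and relies on Nominatim's /status endpoint returning the plain text OK once the database is usable. The function names, the attempt count, and the one-minute interval are illustrative choices.

```python
import time
import urllib.request


def fetch_status(base_url: str) -> str:
    """Return the body of GET /status, or '' if the server is not answering yet."""
    try:
        with urllib.request.urlopen(f"{base_url}/status", timeout=5) as resp:
            return resp.read().decode("utf-8").strip()
    except OSError:  # connection refused, timeouts, and HTTP error responses
        return ""


def wait_until_ready(base_url: str, fetch=fetch_status, attempts: int = 720,
                     delay: float = 60.0, sleep=time.sleep) -> bool:
    """Poll /status until it reports OK; give up after `attempts` tries."""
    for _ in range(attempts):
        if fetch(base_url) == "OK":
            return True
        sleep(delay)
    return False


# Usage (blocks for up to ~12 hours with the defaults):
#   wait_until_ready("http://localhost:8080")
```

The fetch and sleep parameters are injectable so the loop can be exercised without a running server.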
A few env vars worth knowing about:
- IMPORT_STYLE: full is the default and what you want. admin is smaller (only admin boundaries) and useless for most cases.
- THREADS: number of CPU threads to use during import. Higher = faster but more contended. 4 is a safe default for a 4–8 core box.
- REPLICATION_URL: Geofabrik’s diff feed for your region. Set this and Nominatim can pull incremental updates instead of re-importing.
- IMPORT_WIKIPEDIA: set to true if you want Wikipedia importance scoring (better ranking on famous places). Adds time and disk.
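If you switch regions often, PBF_URL and REPLICATION_URL can both be derived from the region path. The helper below is a sketch that assumes Geofabrik's naming pattern (`<region>-latest.osm.pbf` and `<region>-updates/`), which matches the two URLs in the Compose file. `geofabrik_urls` is a made-up name, and it is worth confirming the paths on download.geofabrik.de before kicking off an import.

```python
GEOFABRIK_BASE = "https://download.geofabrik.de"


def geofabrik_urls(region: str) -> dict:
    """Build the extract and diff-feed URLs for a region path like 'europe/germany'."""
    region = region.strip("/")
    return {
        "PBF_URL": f"{GEOFABRIK_BASE}/{region}-latest.osm.pbf",
        "REPLICATION_URL": f"{GEOFABRIK_BASE}/{region}-updates/",
    }


# Print the values in a form you can paste into the environment section.
for key, value in geofabrik_urls("europe/germany").items():
    print(f"{key}: {value}")
```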
What the Import Actually Does
Under the hood, the import is a multi-stage pipeline:
- Parse the PBF into Postgres staging tables
- Build the place table — the canonical “thing at a coordinate” table
- Build the search table — tokenized text indexes
- Build the word counts — frequency stats used for ranking
- Build the indexes — GiST spatial indexes, trigram indexes for fuzzy matching
This is CPU-bound during parsing, IO-bound during indexing, and RAM-hungry across the board. A few rules:
- NVMe matters more than CPU. A slow disk turns a 6-hour import into a 36-hour import. SATA SSD acceptable, NVMe ideal, spinning rust is suffering.
- shm_size: 1gb is not optional. Postgres uses shared memory for sorts. Lower it and the import OOMs at random.
- Don’t kill it. If the import dies halfway, you usually start over. There are checkpoint hooks but they’re fragile.
- No swap during import. Either disable swap or make sure you have enough RAM that swap never gets touched.
Hitting the API
Once the import is done, Nominatim exposes a REST API on port 8080. The endpoints match the public Nominatim site, so any client library written for nominatim.openstreetmap.org works against your local one; just point it at your URL.

    # Forward geocode
    curl "http://localhost:8080/search?q=1600+Pennsylvania+Ave+Washington&format=json"

    # Reverse geocode
    curl "http://localhost:8080/reverse?lat=38.8977&lon=-77.0365&format=json"

    # Structured search
    curl "http://localhost:8080/search?street=Pennsylvania+Avenue&city=Washington&format=json"

The JSON response shape includes lat, lon, display_name, importance (0–1 ranking score), address object with road/city/state/country/postcode, and boundingbox for the matched feature. Useful query parameters:
- addressdetails=1: return the structured address object
- limit=N: cap the number of results
- countrycodes=us,ca: restrict to specific countries
- zoom=18: for reverse geocoding, controls the granularity (18 = building, 10 = city, 3 = country)
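The same calls from Python need nothing beyond the standard library. The sketch below assumes the instance is at http://localhost:8080; the helper names are invented for this example. One detail worth knowing is that lat and lon come back as strings in the JSON, so they need converting.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

BASE_URL = "http://localhost:8080"  # assumed local instance from the Compose stack


def search_url(query: str, limit: int = 1, countrycodes: str = "",
               base_url: str = BASE_URL) -> str:
    """Build a forward-geocoding URL for the /search endpoint."""
    params = {"q": query, "format": "json", "addressdetails": 1, "limit": limit}
    if countrycodes:
        params["countrycodes"] = countrycodes
    return f"{base_url}/search?{urllib.parse.urlencode(params)}"


def reverse_url(lat: float, lon: float, zoom: int = 18,
                base_url: str = BASE_URL) -> str:
    """Build a reverse-geocoding URL for the /reverse endpoint."""
    params = {"lat": lat, "lon": lon, "zoom": zoom,
              "format": "json", "addressdetails": 1}
    return f"{base_url}/reverse?{urllib.parse.urlencode(params)}"


def fetch_json(url: str):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def first_coordinates(results: list) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) of the top search hit, or None when nothing matched."""
    if not results:
        return None
    top = results[0]
    return float(top["lat"]), float(top["lon"])


# Usage against a running instance:
#   hits = fetch_json(search_url("1600 Pennsylvania Ave Washington"))
#   print(first_coordinates(hits))
```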
Rate limit it yourself if you’re going to expose it. The defaults don’t include any throttling.
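If the throttle has to live in application code rather than at a proxy, a token bucket is one common approach. The sketch below is a generic in-process limiter, not something Nominatim or the mediagis image ships. The class name and parameters are illustrative, and it only protects a single process.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        """Spend one token if available; return False when the caller should back off."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage: one bucket per client, e.g. 5 requests/second with bursts of 10.
#   bucket = TokenBucket(rate=5, capacity=10)
#   if bucket.allow(): forward the request, otherwise answer HTTP 429.
```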
Updates: Keep It Fresh Without Re-Importing
OSM data changes daily. New buildings get mapped, addresses get corrected, roads get added. You don’t want to re-import the whole region every week.
Setting REPLICATION_URL in your Compose file tells Nominatim where to fetch diffs, and the mediagis image’s start.sh supports an update mode. The simple approach is a periodic update via cron on the host:

    # Run a replication update once
    docker exec nominatim sudo -u nominatim nominatim replication --once

    # Or run it as a background daemon inside the container
    docker exec -d nominatim sudo -u nominatim nominatim replication

Daily diffs are usually plenty. Hourly is overkill unless you have a real reason. The diffs are small (single megabytes) and apply in seconds.
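If you want cron to call something that reports failures, a thin wrapper around the one-shot command is enough. This sketch assumes the container is named nominatim as in the Compose file and treats any non-zero exit status as a failure, which is my assumption about how you would want to alert, not documented Nominatim behavior. The function names are hypothetical.

```python
import subprocess


def replication_command(container: str = "nominatim") -> list:
    """The one-shot replication command from this post, as an argv list."""
    return ["docker", "exec", container,
            "sudo", "-u", "nominatim", "nominatim", "replication", "--once"]


def run_update(container: str = "nominatim", runner=subprocess.run) -> bool:
    """Run a single replication pass; print stderr and return False on failure."""
    result = runner(replication_command(container), capture_output=True, text=True)
    if result.returncode != 0:
        # Assumption: non-zero means something went wrong and is worth logging.
        print(result.stderr.strip())
    return result.returncode == 0


# Usage from a daily cron job or systemd timer:
#   raise SystemExit(0 if run_update() else 1)
```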
Putting Caddy In Front
You probably want this on a nice hostname with TLS, even if it’s only on the LAN. Caddy makes this trivial:

    geocode.lan {
        reverse_proxy nominatim:8080
    }

Pop that into a Caddy container on the same Docker network and you’re done. If you’re going to expose it to the internet (which, honestly, you probably shouldn’t), at minimum add basic auth, an IP allowlist, or a real auth proxy in front. Public Nominatim instances get hammered by bots within hours of going live.
Things That Will Bite You
- The full planet on a 16 GB RAM box. Just don’t. Use a region. Even with 32 GB you’re going to suffer.
- The import is single-shot. If it dies halfway, you usually start over from scratch. Don’t reboot the host mid-import.
- shm_size matters. 1 GB is the practical minimum. Lower values cause silent OOMs during indexing.
- Time zone bugs. Nominatim assumes UTC for replication timestamps. If your host is on local time, you can hit weird “data is from the future” errors. Use UTC on the host or use the Docker default.
- CPU throttling on consumer hardware. Mini PCs with weak cooling will thermal-throttle during the import. Watch the temps.
- Don’t expose it raw to the internet. Bots will scrape. Rate-limit at the proxy.
When Nominatim Is the Wrong Tool
Nominatim is excellent at structured address lookup and reverse geocoding. It’s mediocre at fuzzy autocomplete (the “as you type” search experience). If your use case is a search box where the user types pizz and expects Pizza Hut to pop up in 50ms, you want Photon — same OSM data, different indexing strategy, optimized for typeahead.
If you need geocoding against multiple data sources (OSM + government address files + GeoNames + custom POI databases), you want Pelias. It’s heavier to operate, but it’s the right tool when “OSM only” is a hard limitation.
If you do <10,000 lookups a month and you genuinely don’t care about the cost or the privacy, just use a commercial API. Self-hosting has a real ops cost — disk, monitoring, replication updates, occasional debugging. Don’t do it for vanity.
Going deeper on the comparison? See Nominatim vs Photon vs Pelias — same OSM data, very different tradeoffs.
Wrapping Up
One Docker image, one regional PBF, a few hours of import time, and you have your own geocoder. No API keys, no per-request pricing, no third-party seeing every coordinate your app touches. The whole thing fits on a Mini PC.
If you’re going to dig in further: the hardware sizing post breaks down planet vs region requirements with real numbers. If you’re a Home Assistant user, reverse geocoding for HA without phoning home wires this into your smart home setup. And if you want the full self-hosted maps stack — geocoding plus tile serving plus PostGIS — the combo guide puts it all together.
Your 2 AM self will appreciate not getting paged about a billing alert.