Nominatim Diff Replication

By SumGuy 12 min read
Your Geocoder Is Living in the Past

Your friend just moved. New apartment, new street, whole new development that didn’t exist eighteen months ago. You drop the address into your self-hosted Nominatim instance — the one you spent a weekend setting up, very proud of yourself — and it returns nothing. Or worse, it confidently returns the wrong thing.

You check OSM. The address has been there for four months. The building was mapped, the road was added, the housenumbers were entered by a diligent local mapper who probably does this for fun. All of it is in OpenStreetMap. It’s just not in your OpenStreetMap, because you imported a PBF six months ago and never touched it again.

This is the stale-data problem, and it’s the most common operational miss when people first set up Nominatim. The import is a one-time snapshot. The world is not. OSM receives millions of edits every week — new buildings, address corrections, road renames, business closures, the whole living map keeps changing. If you don’t pull those changes in, your geocoder drifts further behind reality with each passing day.

The good news: you don’t need to re-import. OSM publishes incremental diffs, there are good tools for applying them, and Nominatim has a built-in pipeline for handling updates. Setting it up takes an afternoon, not a weekend.

Full example: Replication scripts and systemd units at github.com/KingPin/sumguy-examples/tree/main/self-hosting/nominatim-diff-replication

How OSM Publishes Changes

OpenStreetMap publishes its change log in the OsmChange format: .osc.gz files that describe what happened to the map over a time window. An entry in a diff looks roughly like this: a node was modified, a way was created, a relation was deleted. Every object has a version number, a timestamp, and a changeset ID. The diff is the delta between two consecutive snapshots of the planet.

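OSM publishes these diffs at three cadences: minutely, hourly, and daily. The planet feeds live under planet.openstreetmap.org/replication/, in the minute/, hour/, and day/ directories.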

Geofabrik, which provides the regional extracts most people use for their initial import, also publishes regional diffs that only contain changes within a geographic boundary. If you imported north-america-latest.osm.pbf, you can pull from download.geofabrik.de/north-america-updates/ instead of the planet diff feed — smaller files, less processing, same result for your region.
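As a small sketch of that naming pattern, here is how the updates URL can be derived from an extract name. It assumes Geofabrik keeps the `<region>-latest.osm.pbf` and `<region>-updates/` convention shown above; the function name is made up for this example.

```shell
# Derive the Geofabrik updates URL from an extract path.
# Assumption: extracts are named <region>-latest.osm.pbf and the matching
# diff feed lives at <region>-updates/ on the same host.
geofabrik_updates_url() {
  extract="$1"                          # e.g. europe/germany-latest.osm.pbf
  region="${extract%-latest.osm.pbf}"   # strip the file suffix
  printf 'https://download.geofabrik.de/%s-updates/\n' "$region"
}

geofabrik_updates_url north-america-latest.osm.pbf
geofabrik_updates_url europe/germany-latest.osm.pbf
```

The first call prints the same north-america-updates URL used throughout this post.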

The state file is what keeps everything in sync. Each replication server publishes a state.txt that records the current sequence number and timestamp. Your tooling reads this to know what it’s already applied and what comes next. Miss a sequence number, and you need to catch up. Drift far enough, and minutely diffs are no longer available — they expire after a few months — and you’re forced to switch to a coarser cadence or re-import.
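To make the state file less abstract, here is a minimal sketch that reads a state.txt and maps its sequence number to the path of the matching diff. The sample values are invented for illustration, and the two helper names are not part of any tool. It relies on two conventions of the replication servers: colons in the timestamp are backslash-escaped, and the sequence number is zero-padded to nine digits and split into three directory levels.

```shell
# Print the value of one key (sequenceNumber or timestamp) from state.txt
# content on stdin. Backslashes escaping the colons are stripped.
state_value() {
  sed -n "s/^$1=//p" | tr -d '\\' | head -n 1
}

# Map a sequence number to the relative diff path,
# e.g. 4872345 -> 004/872/345.osc.gz
seq_to_path() {
  n="$1"
  printf '%03d/%03d/%03d.osc.gz\n' \
    "$((n / 1000000))" "$((n / 1000 % 1000))" "$((n % 1000))"
}

# Invented sample; a real file comes from <replication-url>/state.txt
sample_state='#Sat Jan 01 00:00:05 UTC 2022
sequenceNumber=4872345
timestamp=2022-01-01T00\:00\:03Z'

seq=$(printf '%s\n' "$sample_state" | state_value sequenceNumber)
echo "sequence:  $seq"
echo "timestamp: $(printf '%s\n' "$sample_state" | state_value timestamp)"
echo "diff path: $(seq_to_path "$seq")"
```

You rarely need to do this by hand, but it helps when you are staring at a replication directory listing and trying to work out which file your instance wants next.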

pyosmium-replag and pyosmium-up-to-date

The standard tooling for applying OSM diffs is pyosmium, specifically the pyosmium-up-to-date and pyosmium-replag commands that ship with it. If you’re running the mediagis/nominatim Docker image, pyosmium is already installed.

pyosmium-replag measures how far behind you are:

Terminal window
pyosmium-replag -v \
--server https://download.geofabrik.de/north-america-updates

Run this from the directory containing your replication state file, or inside the container where Nominatim has already set up its state. It hits the replication server, compares sequence numbers, and reports how many minutes of lag you’re sitting on. Run this before you set anything up — it tells you whether your initial import state is even valid, and it’ll give you an immediate reality check on drift. If it comes back with “2,874 minutes of lag,” you’ve got some catching up to do.

pyosmium-up-to-date does the actual downloading and applying:

Terminal window
pyosmium-up-to-date -v \
--server https://download.geofabrik.de/north-america-updates \
/path/to/your/region.osm.pbf

This is the low-level tool. It’s fine to run directly, but you don’t need to call it manually for Nominatim — the nominatim replication subcommand wraps it in a pipeline that also handles the database update side. Worth knowing pyosmium exists anyway because the replag check is useful for monitoring and the up-to-date command is good for troubleshooting when Nominatim’s higher-level tooling behaves weirdly.

The Nominatim Replication Subcommand

Nominatim has a built-in subcommand that handles the full update pipeline: downloading the diff, applying it to the raw OSM data, then running the indexing pass that updates the search indexes. Using nominatim replication is almost always preferable to calling pyosmium directly, because the indexing step is the part that actually makes changes show up in query results. Apply a diff without indexing and nothing in the API changes.

The three flags you need to know:

Terminal window
# One-time setup — writes the initial state file so Nominatim knows where to start
nominatim replication --init
# Apply one batch of diffs and exit — good for cron/timer usage
nominatim replication --once
# Background daemon mode — loops forever, applying diffs on a schedule
nominatim replication

Inside the Docker container, you’d run this as the nominatim user:

Terminal window
docker exec nominatim sudo -u nominatim nominatim replication --init
docker exec nominatim sudo -u nominatim nominatim replication --once

The --init step is critical and easy to forget. It reads your Nominatim database to figure out what sequence number your import corresponds to, then writes the state file to match. Without it, the --once call doesn’t know where to start and either errors out or — worse — silently re-applies old diffs.
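One way to make the forgotten --init harder to hit is a small wrapper that runs it exactly once and then only ever calls --once. This is a sketch, not something Nominatim ships: the marker file path is hypothetical, the container name is the one used in this post, and the script defaults to a dry run that only prints the commands.

```shell
# Hypothetical marker file recording that --init has already been run.
MARKER="${MARKER:-/var/lib/nominatim-replication/.initialised}"
CONTAINER="${CONTAINER:-nominatim}"   # container name used in this post
DRY_RUN="${DRY_RUN:-1}"               # set DRY_RUN=0 to really execute

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

nominatim_repl() {
  run docker exec "$CONTAINER" sudo -u nominatim nominatim replication "$@"
}

# Run --init only if the marker is missing, then apply one batch.
update_once() {
  if [ ! -e "$MARKER" ]; then
    nominatim_repl --init || return 1
    if [ "$DRY_RUN" != "1" ]; then
      mkdir -p "$(dirname "$MARKER")" && : > "$MARKER"
    fi
  fi
  nominatim_repl --once
}

update_once
```

With the marker absent, the dry run prints the --init command followed by the --once command; once the marker exists, only --once is printed.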

The replication URL comes from your Compose environment or the nominatim.conf settings file. If you set REPLICATION_URL in your Compose env, it’s already configured. If not, you can set it in /nominatim/nominatim.conf:

Terminal window
# check what URL is configured
docker exec nominatim grep -i replication /nominatim/nominatim.conf

Catching Up After You’ve Fallen Behind

Here’s a scenario: your replication timer breaks, you don’t notice, three weeks pass, and now you’re behind by tens of thousands of minutely diffs. The minutely feed probably still has them — OSM keeps diffs available for a couple of months — but applying them one-by-one at minutely cadence will take days.

The fix is to switch to daily diffs temporarily, let them catch you up in large batches, then switch back to your usual cadence once you’re within a day or two of current. Edit the replication URL to point at the daily feed:

Terminal window
# Daily planet diffs
https://planet.openstreetmap.org/replication/day/
# Geofabrik regional daily (preferred if you're on a regional extract)
https://download.geofabrik.de/north-america-updates/

Geofabrik’s regional feed is actually published at daily cadence by default and doesn’t have a separate minutely endpoint — which makes the “switch to daily when behind” advice even simpler for most self-hosters: you’re already on daily.
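The catch-up itself is just "apply a batch, check the lag, repeat". Here is a sketch of that loop with the two moving parts left as placeholders: get_lag and apply_batch are hypothetical hooks you would wire to your own lag check and to `nominatim replication --once`. The demo underneath simulates them with a temp file so the loop can be run safely as-is.

```shell
# Apply batches until the lag (in seconds) is at or below the target,
# giving up after max_rounds batches.
# get_lag and apply_batch are placeholders supplied by the caller.
catch_up() {
  target="$1"
  max_rounds="$2"
  round=0
  while [ "$round" -lt "$max_rounds" ]; do
    lag=$(get_lag) || return 2
    if [ "$lag" -le "$target" ]; then
      echo "caught up after $round batch(es), lag ${lag}s"
      return 0
    fi
    apply_batch || return 1
    round=$((round + 1))
  done
  echo "still behind after $max_rounds batch(es)" >&2
  return 3
}

# Simulated demo: the lag lives in a temp file and each batch removes a day.
lag_file=$(mktemp)
echo 259200 > "$lag_file"             # pretend we are three days behind
get_lag() { cat "$lag_file"; }
apply_batch() {
  current=$(cat "$lag_file")
  echo $((current - 86400)) > "$lag_file"
}

catch_up 86400 10
```

The round limit is there so a broken feed or a stuck lag value cannot turn the loop into an endless hammering of the replication server.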

After the catch-up, check your lag again:

Terminal window
docker exec nominatim pyosmium-replag -v \
--server https://download.geofabrik.de/north-america-updates

Under a day of lag? You’re back to operational. The daily diff runs you to within 24 hours of current, and that’s fine for geocoding. No one expects their geocoder to reflect an address change that was committed to OSM forty minutes ago.

Do You Actually Need Minutely?

Almost certainly not. Here’s the honest breakdown:

Minutely makes sense when you’re running a service where freshness genuinely matters — emergency services routing, live disaster response mapping, a mapping app where contributors need to see their own edits reflected quickly. This is not a home lab use case.

Hourly is the upper bound of what most self-hosters should configure. It keeps you within an hour of current, which is more than good enough for addresses, businesses, and road data. The processing overhead is minimal — one batch per hour, usually finishes in seconds.

Daily is the practical default for solo self-hosters. Most of what changes in OSM on any given day is not visible in geocoding results anyway — tag changes, relation edits, wiki cleanup. The stuff that matters for address lookups — new buildings, new road names, new housenumbers — shows up in daily diffs within 24 hours, and that’s fast enough.

If you’re running a Geofabrik regional extract, you’re already locked to daily updates whether you want to be or not. Geofabrik doesn’t publish minutely or hourly regionals. For hourly or minutely you’d have to switch to the full OSM planet replication feed and filter it yourself, which is more complexity than you want.

Daily it is. Run it every morning at 3 AM, done.

Monitoring Lag with the Status Endpoint

Nominatim exposes a status endpoint that includes replication lag:

Terminal window
curl -s "http://localhost:8080/status.php?format=json" | python3 -m json.tool

The relevant fields:

"replication_acquire_age": 18345,
"replication_replication_age": 18345

The values are in seconds. Divide by 3600 to get hours. If replication_replication_age keeps growing across multiple checks, your updates have stopped running. If it’s consistently above your expected cadence — say, over 90,000 seconds on a daily-update setup — something is wrong.
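If you would rather not do the division in your head, a tiny helper covers it. This is only a convenience sketch; the function names are made up, and the default threshold is the 90,000-second figure mentioned above.

```shell
# Format an age in seconds as hours and minutes.
format_age() {
  secs="$1"
  printf '%dh %02dm\n' "$((secs / 3600))" "$((secs % 3600 / 60))"
}

# Succeed if the age exceeds the threshold (default 90000s, about 25 hours).
is_stale() {
  [ "$1" -gt "${2:-90000}" ]
}

format_age 18345                      # the sample value above
if is_stale 18345; then
  echo "stale"
else
  echo "fresh"
fi
```

For the sample value this prints `5h 05m` and `fresh`.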

A quick monitoring check you can drop into any alerting stack:

Terminal window
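# -f makes curl fail on HTTP errors; an unreadable status must not pass as OK
LAG=$(curl -sf "http://localhost:8080/status.php?format=json" \
  | python3 -c "import sys,json; d=json.load(sys.stdin); print(int(d.get('replication_replication_age', d.get('replication_delay', 0))))") || LAG=""
if [ -z "$LAG" ]; then
  echo "ALERT: could not read the Nominatim status endpoint"
  exit 2
fi
if [ "$LAG" -gt 172800 ]; then
  echo "ALERT: Nominatim replication lag is ${LAG}s (over 48h)"
  exit 1
fi
echo "OK: lag is ${LAG}s"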

172,800 seconds is 48 hours, a reasonable threshold for a daily-update setup. The script exits non-zero on trouble, so it drops straight into a cron job that emails on failure, a Nagios-style check runner, or whatever you’re running for alerting.

Cron vs systemd Timer: Use the Timer

The traditional cron approach works, but systemd timers are better for this job. Here’s why: systemd gives you OnFailure=, which means you can get notified when the update fails instead of just silently not happening. Cron doesn’t have this. You’ll find out your updates broke when your friend’s apartment still doesn’t geocode three weeks later.

Create two files — the service and the timer:

/etc/systemd/system/nominatim-replication.service

[Unit]
Description=Nominatim OSM replication update
After=docker.service
Requires=docker.service
OnFailure=nominatim-replication-failure@%n.service

[Service]
Type=oneshot
User=root
ExecStart=/usr/bin/docker exec nominatim \
    sudo -u nominatim nominatim replication --once
StandardOutput=journal
StandardError=journal

/etc/systemd/system/nominatim-replication.timer

[Unit]
Description=Run Nominatim replication daily

[Timer]
OnCalendar=*-*-* 03:00:00
RandomizedDelaySec=900
Persistent=true

[Install]
WantedBy=timers.target

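RandomizedDelaySec=900 spreads the start time by up to 15 minutes, which is good practice so you’re not hammering the Geofabrik servers at exactly 3:00:00 AM alongside every other instance. Note that OnCalendar= is evaluated in the host’s local timezone unless you append one explicitly, such as UTC.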

Persistent=true means if the system was off at 3 AM, the timer fires as soon as it boots. You won’t silently skip days just because you rebooted.

For the failure notification, create a simple failure handler unit:

/etc/systemd/system/nominatim-replication-failure@.service

[Unit]
Description=Notify on Nominatim replication failure

[Service]
Type=oneshot
ExecStart=/usr/bin/curl -s \
    "https://ntfy.sh/your-topic" \
    -H "Title: Nominatim replication failed" \
    -d "Replication failed on %i — check journalctl -u nominatim-replication.service"

Swap in ntfy.sh, Gotify, a webhook to Slack, or a curl to Healthchecks.io — whatever you already use for alerting. The point is you find out the same day, not three weeks later.

Enable and start it:

Terminal window
systemctl daemon-reload
systemctl enable --now nominatim-replication.timer
systemctl status nominatim-replication.timer

Backup Before You Update

This one sounds paranoid until it happens to you once. Bad OSM diffs are rare but not mythical. The OSM editing community is careful, but every few months someone runs a bulk import that goes sideways or a bot goes rogue and mangles a region’s worth of data before the moderators catch it and revert it. If you apply a bad diff, your data is corrupted until you either apply the revert diffs or roll back to a backup.

Since Nominatim’s data lives in a Postgres volume, the backup is straightforward:

Terminal window
# Quick logical backup before a catch-up run
docker exec nominatim pg_dump -U nominatim nominatim \
| gzip > /backups/nominatim-$(date +%Y%m%d).sql.gz

For daily operational use, a filesystem-level snapshot of the Docker volume (LVM, ZFS, Btrfs) is faster than a dump. A plain file copy of the volume taken while Postgres is running is not a consistent backup, so stop the container first if a file copy is all you have. If you’re doing a large catch-up after being behind for weeks, do a proper dump first. You don’t want to redo the whole catch-up because a bad diff showed up in hour three of a twelve-hour run.
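A backup you have never checked is a hope, not a backup. One catch with the pipeline above: if pg_dump dies halfway, gzip still exits cleanly and leaves a valid but incomplete file. The sketch below checks for that, assuming a plain-format dump, which ends with a "PostgreSQL database dump complete" comment. The function name and the demo files are made up for illustration.

```shell
# Succeed only if the file decompresses and ends with pg_dump's
# completion marker (plain-format dumps only).
verify_dump() {
  dump="$1"
  if [ ! -s "$dump" ]; then
    echo "missing or empty: $dump" >&2
    return 1
  fi
  if gzip -dc "$dump" 2>/dev/null | tail -n 10 \
      | grep -q 'PostgreSQL database dump complete'; then
    echo "ok: $dump"
  else
    echo "incomplete or corrupt: $dump" >&2
    return 1
  fi
}

# Self-contained demo with throwaway files.
demo_dir=$(mktemp -d)
printf -- '--\n-- PostgreSQL database dump complete\n--\n' \
  | gzip > "$demo_dir/good.sql.gz"
printf 'CREATE TABLE cut_off (\n' | gzip > "$demo_dir/truncated.sql.gz"

verify_dump "$demo_dir/good.sql.gz"
verify_dump "$demo_dir/truncated.sql.gz" 2>/dev/null || echo "rejected truncated dump"
rm -rf "$demo_dir"
```

Reading to the end of a large dump takes a while, so run it once right after the backup rather than on every update.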

Once you’re running steady daily updates, bad diffs that survive OSM moderation are essentially unheard of. Don’t let the backup step become a thing that blocks you from setting this up — just make sure you have something to roll back to.

Wrapping Up

The import was the hard part. Keeping it fresh is just a timer and a script. Set up nominatim replication --init once, wire up the systemd timer to run --once daily, add the status endpoint check to whatever you use for monitoring, and you’re done. Your geocoder stays within 24 hours of current OSM data indefinitely without touching the import again.

The stale data problem only bites you if you ignore it. Run the replag check right now if you’ve had an instance running for more than a month — you might be more behind than you think.

