The wget That Ate the Weekend
It goes like this. You find OpenStreetMap. You find out the data is free. You find the downloads page. You see a big friendly link that says planet-latest.osm.pbf. You run wget. You go make coffee.
Six hours later you come back. The download is sitting at 43% — thirty-something gigabytes of a file you are never going to fully use — and it dawns on you: you needed Texas. Maybe California. You work at a company that delivers tacos in Austin and nobody at this company has ever once cared what a road in Kyrgyzstan looks like.
This is the OSM beginner tax. Everyone pays it once. The planet file is seductive because it promises everything, and “everything” is 80 GB of compressed XML-turned-binary that will fully inflate to somewhere north of 700 GB once your geocoder builds its indexes. On a consumer NVMe that import takes days. On spinning rust it takes geological time.
Here’s the thing: almost nobody needs the planet file. You need a region. You probably need a pretty small region. And OpenStreetMap’s ecosystem has excellent tooling for this — you just have to know where to look.
Why planet.osm Is the Wrong Default
The planet PBF is a full snapshot of every node, way, and relation in OpenStreetMap, worldwide, as of the last weekly update. It’s the canonical truth. It is also spectacularly overkill for 99% of self-hosted deployments.
The numbers make the case:
| Extract | Compressed size | Approximate indexed size |
|---|---|---|
| planet-latest.osm.pbf | ~80 GB | 700 GB+ |
| north-america-latest.osm.pbf | ~13 GB | ~120 GB |
| us-latest.osm.pbf | ~10 GB | ~90 GB |
| texas-latest.osm.pbf | ~600 MB | ~6 GB |
| Austin, TX (BBBike) | ~30 MB | ~300 MB |
Take the planet: 80 GB download, days to import, terabyte-class storage requirement, and after all that you’re serving geocoding requests for Bhutan. Versus pulling the Texas extract from Geofabrik: 600 MB down in minutes, imported in under an hour on any box with a real SSD, fits comfortably on a 32 GB volume.
The other thing people underestimate is update bandwidth. The OSM replication infrastructure publishes minutely, hourly, and daily diffs against the planet. Daily diffs for the whole planet run 60–100 MB per day. For a US-only extract, the equivalent Geofabrik diff is maybe 5–10 MB. Over a 30-day month, that works out to roughly 1.8–3 GB of replication data for the planet versus 150–300 MB for the US. Fine if you’re running a continent-scale service; absurd if you’re mapping taco deliveries in Travis County.
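To make the bandwidth comparison concrete, here is a small back-of-the-envelope sketch. The daily diff sizes are the approximate figures quoted above, not measurements, and `monthly_mb` is a hypothetical helper name.

```shell
#!/bin/sh
# Rough monthly replication traffic from an approximate daily diff size.

# monthly_mb DAILY_MB [DAYS] -> prints total MB over DAYS (default: 30)
monthly_mb() {
  local daily_mb="$1" days="${2:-30}"
  echo $(( daily_mb * days ))
}

echo "planet, low estimate:  $(monthly_mb 60) MB/month"
echo "planet, high estimate: $(monthly_mb 100) MB/month"
echo "US, low estimate:      $(monthly_mb 5) MB/month"
echo "US, high estimate:     $(monthly_mb 10) MB/month"
```

Swap in your own region's observed diff size to see what a month of updates would cost you.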
Honest take: if you’re running Nominatim, a tile server, or any other OSM import pipeline on home lab hardware, planet-latest is the file you download when you have too much SSD and too much time. Pick a region.
The Three Sources Worth Knowing
Geofabrik: The Best Default
Geofabrik is a German GIS company that has been publishing OSM regional extracts since basically forever. Their download server at download.geofabrik.de is the go-to for nearly everyone. The extracts are current, updated daily, and the URL structure is predictable and stable — which matters when you’re scripting imports or wiring up replication.
The hierarchy is: continent → country → subregion. For example:
    https://download.geofabrik.de/north-america-latest.osm.pbf
    https://download.geofabrik.de/north-america/us-latest.osm.pbf
    https://download.geofabrik.de/north-america/us/texas-latest.osm.pbf

To download one:

    # Pull a country extract
    wget https://download.geofabrik.de/north-america/us-latest.osm.pbf

    # Pull a state extract: much smaller, much saner
    wget https://download.geofabrik.de/north-america/us/texas-latest.osm.pbf

Each extract page also has an .md5 checksum file and an -updates/ subdirectory for the replication endpoint. That updates URL is what you hand to Nominatim’s REPLICATION_URL so it can pull daily diffs without re-downloading the whole thing.
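Because the URL layout is predictable, the extract, checksum, and updates URLs can all be derived from one region path. The following is a minimal sketch under a few assumptions: the function names are hypothetical, `md5sum` comes from GNU coreutils, and the .md5 file sits next to the extract and names it by its bare filename.

```shell
#!/bin/sh
# Hypothetical helpers built on the Geofabrik URL pattern shown above.

# geofabrik_url REGION_PATH -> URL of the latest extract
geofabrik_url() {
  echo "https://download.geofabrik.de/$1-latest.osm.pbf"
}

# geofabrik_updates_url REGION_PATH -> replication (diff) endpoint
geofabrik_updates_url() {
  echo "https://download.geofabrik.de/$1-updates/"
}

# verify_md5 FILE -> checks FILE against FILE.md5 in the same directory
verify_md5() {
  local dir file
  dir="$(dirname "$1")"
  file="$(basename "$1")"
  ( cd "$dir" && md5sum -c "$file.md5" )
}

# fetch_verified REGION_PATH -> downloads the extract and its checksum
# into the current directory, then verifies the download
fetch_verified() {
  local url
  url="$(geofabrik_url "$1")"
  wget -q "$url" "$url.md5" || return 1
  verify_md5 "$(basename "$url")"
}

# Usage (needs network access):
#   fetch_verified north-america/us/texas
#   geofabrik_updates_url north-america/us/texas
```

Verifying the checksum before a long import is cheap insurance: a truncated download otherwise tends to fail partway through the import.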
Geofabrik’s coverage isn’t perfect — some countries have weird political boundaries that make clean region extracts hard, and very small countries often get bundled together (looking at you, Central America). But for North America, Europe, and most of Asia, it’s the right first stop.
BBBike: Custom City-Level Extracts
Geofabrik’s smallest unit is usually a state or province. If you only need one city — a metropolitan area for a local app, a single delivery zone, a city transit project — Geofabrik will still hand you a state-sized file with a lot of countryside you don’t need.
BBBike solves this. At extract.bbbike.org you can draw an arbitrary bounding box on a map and request a custom extract of just that area. The service runs against a recent planet snapshot and emails you when the extract is ready, usually within 10–15 minutes.
The outputs are tiny. A city like Austin, TX comes out around 30 MB. Denver is roughly 40 MB. Tokyo, with its density, is maybe 200 MB. These are numbers that fit in memory and import in minutes.
BBBike extracts don’t have the daily diff infrastructure that Geofabrik does, so they’re best for one-shot imports or cases where you’re re-importing periodically from scratch rather than running incremental updates. For a small city install that you refresh monthly, that’s completely fine.
osmium-extract: DIY Clipping From Existing Files
Sometimes neither Geofabrik nor BBBike draws the line exactly where you need it. You want a custom multi-county region. You want to combine parts of two states. You already downloaded a country extract and need to clip it down without downloading anything else.
osmium-extract is the tool for this. It’s part of the osmium-tool package, available on most Linux distros.
    # Install on Debian/Ubuntu
    sudo apt install osmium-tool

    # Check it works
    osmium --version

You define your clip boundary as a GeoJSON polygon. Create a file called texas-clip.geojson:

    {
      "type": "FeatureCollection",
      "features": [
        {
          "type": "Feature",
          "properties": { "extract": "my-region" },
          "geometry": {
            "type": "Polygon",
            "coordinates": [[
              [-97.9, 30.1],
              [-97.4, 30.1],
              [-97.4, 30.6],
              [-97.9, 30.6],
              [-97.9, 30.1]
            ]]
          }
        }
      ]
    }

Then clip:

    osmium extract \
      --polygon texas-clip.geojson \
      --output austin-area.osm.pbf \
      texas-latest.osm.pbf

That produces a fresh .osm.pbf clipped to exactly your polygon. Import that. The source file can stay as-is for future re-clipping.
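If your clip region is a plain rectangle, writing the GeoJSON by hand gets tedious. Here is a small sketch that generates the same structure from four coordinates; `bbox_geojson` is a hypothetical helper, and the coordinate order (west, south, east, north) is an assumption of this sketch.

```shell
#!/bin/sh
# bbox_geojson WEST SOUTH EAST NORTH [NAME] -> GeoJSON polygon on stdout.
# The ring is closed by repeating the first corner, as GeoJSON requires.
bbox_geojson() {
  local west="$1" south="$2" east="$3" north="$4" name="${5:-my-region}"
  cat <<EOF
{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "properties": { "extract": "$name" },
    "geometry": {
      "type": "Polygon",
      "coordinates": [[
        [$west, $south], [$east, $south], [$east, $north],
        [$west, $north], [$west, $south]
      ]]
    }
  }]
}
EOF
}

# Usage: regenerate the Austin-area clip file from the example above
#   bbox_geojson -97.9 30.1 -97.4 30.6 > texas-clip.geojson
```

For a pure rectangle, `osmium extract` also accepts the box directly via `--bbox -97.9,30.1,-97.4,30.6`, which skips the file entirely. The generated GeoJSON is more useful as a starting point that you then edit into a non-rectangular shape.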
The same osmium tool also merges extracts, through its merge subcommand rather than extract. Want all of New England in one file without downloading the full North America extract? Pull each of the six states from Geofabrik and merge them:

    osmium merge \
      maine-latest.osm.pbf \
      new-hampshire-latest.osm.pbf \
      vermont-latest.osm.pbf \
      massachusetts-latest.osm.pbf \
      rhode-island-latest.osm.pbf \
      connecticut-latest.osm.pbf \
      --output new-england.osm.pbf

It’s fast. A 500 MB merge or clip typically finishes in under 30 seconds on any modern CPU. The outputs are valid PBF that any OSM tool understands. No proprietary format, no gotchas.
A Quick Word on File Formats
You’ll encounter two compressed formats in the wild: .osm.pbf and .osm.bz2.
.osm.pbf is the binary protocol-buffer format. It’s what every modern OSM tool expects. Smaller on disk, faster to parse, random access friendly. Use this.
.osm.bz2 is bzip2-compressed OSM XML. It’s the old way. Much larger than PBF for equivalent data. Parsing it requires decompressing the whole thing sequentially because it’s just XML run through bzip2. Import tools support it but quietly weep. If you see a .bz2 in the wild, look for a .pbf equivalent, since one almost certainly exists.
The gist: always grab .osm.pbf. There’s no situation where .osm.bz2 is the better choice today unless you’re debugging something from 2011.
Update Cadence: Staying Fresh Without Re-Importing
One of the best things about Geofabrik is the replication infrastructure. Every extract has a corresponding updates endpoint. The Nominatim Docker image uses this to pull incremental diffs and apply them instead of re-importing from scratch.
Daily diffs cover roughly 24 hours of OSM edits for your region. For most self-hosted use cases, daily is plenty — OSM data in stable urban areas doesn’t change dramatically day to day. New buildings appear, addresses get corrected, roads occasionally get added. Daily diffs capture all of that.
Hourly (and minutely) diffs exist too, though they are published against the planet; Geofabrik’s public per-extract diffs are daily. If you run a production-grade geocoder where data freshness is genuinely a business requirement, hourly makes sense. For a home lab or internal tool: daily. Applying a regional diff usually takes a few seconds, so the interval matters less than you’d think.
BBBike extracts don’t have a diff endpoint. If you’re using BBBike, you re-download the full city extract when you want to refresh. Given the file sizes involved — tens of MB — that’s not painful. Just schedule a cron, download the new extract, re-import. For anything Geofabrik covers, prefer Geofabrik for the diff support.
osmium-clipped files have whatever freshness the source extract had. If you clip from a Geofabrik extract, your clip is current as of the last Geofabrik update. To refresh, pull a fresh source extract and re-clip. Automatable, honest, no magic.
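The refresh-and-re-clip loop can be sketched as a cron-friendly script. This is an illustration only: `is_stale` is a hypothetical helper, the 30-day threshold is arbitrary, and the filenames reuse the Texas and Austin examples from earlier.

```shell
#!/bin/sh
# is_stale FILE MAX_AGE_DAYS
# Succeeds (exit 0) if FILE is missing or was last modified more than
# MAX_AGE_DAYS full days ago, as judged by find's -mtime test.
is_stale() {
  [ ! -f "$1" ] && return 0
  [ -n "$(find "$1" -mtime +"$2" -print)" ]
}

# Usage (needs network access and osmium-tool installed):
#   if is_stale texas-latest.osm.pbf 30; then
#     wget -N https://download.geofabrik.de/north-america/us/texas-latest.osm.pbf
#     osmium extract --polygon texas-clip.geojson --overwrite \
#       --output austin-area.osm.pbf texas-latest.osm.pbf
#   fi
```

The `--overwrite` flag lets osmium replace the previous clip in place, and `wget -N` skips the download when the server copy is not newer than the local one. The re-import step is left out because it depends on your import tool.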
When Planet Actually Makes Sense
In the interest of fairness: there are legitimate reasons to use the planet file.
You need it if you’re building a worldwide geocoder — something like a global address search that has to work for users in any country. You need it for worldwide routing (think: a self-hosted OpenRouteService serving international trips). You need it for GIS analysis at global scale — research projects, population-level spatial analytics, anything that genuinely crosses arbitrary borders.
You might also want it if you’re running a download mirror or a service that redistributes extracts, because you need the canonical source to clip everything else from.
That’s it. For home labs, small businesses, single-country services, and city-scale apps, the planet file is the wrong tool. Pick your region, import it, move on.
The Short Version
If you take nothing else away from this:
- Start with Geofabrik — predictable URLs, daily updates, every level from continent to state.
- Use BBBike for city-level extracts — when you need just a metro area and Geofabrik’s smallest unit is too large.
- Use osmium-extract to clip — when you already have a source file and need a custom boundary, or want to merge multiple regions.
- Always use .osm.pbf, not .osm.bz2.
- Never wget planet-latest unless you actually need worldwide coverage and have the hardware to prove it.
The OSM ecosystem rewards people who use the right-sized tool. The regional extract approach isn’t a compromise — it’s just correct. Your SSD, your import time, and your future self will thank you.