Every Drive’s Marketing Spec Is a Lie
That 100,000 IOPS number on your SSD’s datasheet? That’s a lab fairy tale. Small random reads at a queue depth your application will probably never reach, on a fresh, empty drive, with no other workload and no network in the path. Your database doesn’t work that way. Your storage cluster doesn’t work that way. Real workloads are messier: mixed access patterns, queue depths all over the place, latency that spikes at 3 AM.
Here’s the thing: you need actual numbers from your storage under your workload. That’s where fio comes in. It’s a disk I/O benchmark tool that lets you define exactly how your application hammers the disk, then gives you honest performance data: throughput, IOPS, latency percentiles. No marketing. No lab conditions. Just brutal truth.
If you’re choosing storage for a homelab, running a self-hosted database, or just tired of guessing whether your NVMe is actually faster than that SATA drive, fio is your answer.
Why fio Matters More Than Vendor Specs
Spec sheets usually quote peak IOPS from 4K random reads at a very deep queue (QD32 or more, often spread across several threads), measured on a fresh drive with nothing else going on. That’s great if your workload keeps dozens of requests in flight all day. But real databases? Many keep only a handful of requests outstanding, wait on fsync, and mix reads with writes. A disk that claims 100K IOPS might only do 5K under your actual pattern.
fio lets you replicate your workload exactly:
- Sequential reads/writes, flat out or throttled to a fixed rate (say 1 MB/s vs 100 MB/s, via --rate)
- Random 4K operations (what databases do)
- Mixed read/write patterns (70% reads, 30% writes — sounds familiar?)
- Latency profiles at the 99th, 99.9th, 99.99th percentile (because your users care about the slowest requests, not the average)
You can test:
- Raw NVMe performance
- SATA SSDs in real conditions
- Spinning rust (HDDs)
- RAID arrays
- ZFS pools
- Network storage (NFS, iSCSI, Ceph)
And you’ll get reproducible numbers you can trust.
Installing fio
On most distros, it’s in the package manager:
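```bash
# Debian/Ubuntu
sudo apt install fio

# RHEL/CentOS/Fedora
sudo dnf install fio

# Alpine
apk add fio

# macOS (via Homebrew)
brew install fio
```
Verify the install:
```bash
fio --version
```
That’s it. No complicated dependencies. (On macOS, libaio and io_uring don’t exist; swap in --ioengine=posixaio for the examples below.)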
The Core Concepts
Before you run benchmarks, understand what you’re configuring:
Jobs: One benchmark test. A job defines what the disk does (read, write, random, sequential) and how (block size, queue depth, number of threads).
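ioengine: How fio talks to the disk. The most common are:
- libaio: Linux native async I/O. Good, stable, fast. Only truly asynchronous with direct=1; with buffered I/O each submission effectively waits for completion.
- io_uring: Newer Linux async I/O (kernel 5.1+). Often faster.
- sync: Plain read()/write() calls, one request at a time per job. Slow, but representative of simple single-threaded programs.
- psync: Positional pread()/pwrite() calls, also one request at a time per job. This is fio’s default engine on Linux.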
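iodepth: How many operations fio keeps in flight at once, per job. It only has an effect with async engines like libaio and io_uring; sync and psync always go one at a time. iodepth=1 = one request at a time, which is how a simple synchronous program behaves and the cleanest way to see per-request latency. iodepth=32 or 64 = queue up 32–64 ops: realistic for busy databases and VMs, and also the kind of queue depth spec-sheet peak numbers are measured at.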
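The link between iodepth, latency, and IOPS is Little’s law: sustained IOPS is roughly the number of requests in flight divided by the average time each one takes. Here is a minimal bash sketch of that arithmetic, using a hypothetical helper named expected_iops with latency in microseconds. Treat it as an upper bound that assumes latency stays flat as the queue deepens; real drives eventually saturate, which is exactly what a deeper-queue fio run reveals.

```bash
#!/usr/bin/env bash
# Little's law: IOPS ~= requests in flight / average latency per request.
# expected_iops is a hypothetical helper, not part of fio.
# Usage: expected_iops <requests in flight (iodepth x numjobs)> <avg latency in us>
expected_iops() {
  local in_flight=$1 lat_us=$2
  if (( lat_us <= 0 )); then
    echo "latency must be a positive number of microseconds" >&2
    return 1
  fi
  # 1,000,000 microseconds per second; bash integer division truncates.
  echo $(( in_flight * 1000000 / lat_us ))
}

expected_iops 1 100    # one request at a time at 100 us each: 10000
expected_iops 32 100   # 32 in flight at the same latency: 320000
```

It works backwards too: a drive that manages 20K IOPS at iodepth=1 is spending about 50 µs on each request.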
numjobs: Number of threads/processes. One job with numjobs=4 = four threads running the same test in parallel.
rw: Read/write pattern.
- read: Sequential read
- write: Sequential write
- randread: Random read
- randwrite: Random write
- rw: Sequential mix (50/50)
- randrw: Random mix (default 50/50, configurable with rwmixread)
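bs: Block size. How much data per operation. 4K is tiny (databases). 1M is huge (sequential throughput). You can also have fio pick a size for each I/O from a range: bsrange=4k-512k.
size: Total amount of data to test, per job: with numjobs=4 and size=10g, fio lays out four 10 GiB files (40 GiB in total) unless the jobs share one filename. fio creates files on your filesystem or uses raw device space. Bigger = more realistic (avoids cache artifacts).
runtime: How long the test runs. Pair it with time_based, which keeps fio looping over the test area until the clock runs out; without time_based, the job stops as soon as it has covered size, even if runtime hasn’t elapsed.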
direct: Use O_DIRECT (bypass filesystem cache). Almost always 1 for honest storage benchmarks. If you skip this, you’re measuring your RAM cache, not your disk.
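To see how these knobs combine, here is a minimal sketch that assembles a fio command from them. build_fio_cmd is a hypothetical helper: it only prints the command (nothing touches the disk), always sets direct=1 for the reason above, and adds group_reporting so numjobs clones get one combined summary.

```bash
#!/usr/bin/env bash
# build_fio_cmd is a hypothetical helper: it prints a fio command line built
# from the core knobs, without running anything.
# Usage: build_fio_cmd <name> <rw> <bs> <iodepth> <numjobs> <size> <runtime_s>
build_fio_cmd() {
  if (( $# != 7 )); then
    echo "usage: build_fio_cmd name rw bs iodepth numjobs size runtime" >&2
    return 2
  fi
  local -a args
  args=(
    fio
    --name="$1"
    --ioengine=libaio   # async engine, so iodepth actually takes effect
    --direct=1          # bypass the page cache: measure the disk, not RAM
    --rw="$2"
    --bs="$3"
    --iodepth="$4"
    --numjobs="$5"
    --size="$6"         # per job, so numjobs x size of test data in total
    --runtime="$7"
    --time_based        # loop until runtime expires instead of stopping at size
    --group_reporting   # one combined summary for all numjobs clones
  )
  echo "${args[*]}"
}

build_fio_cmd random4k randread 4k 32 4 10g 60
```

Printing instead of running keeps the sketch safe to play with; paste the line into a shell once it looks right.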
Four Core Workload Patterns
1. Sequential Throughput
What it measures: How fast can you read or write a huge file? This is your disk’s best-case scenario (and what vendor specs love to show).
```bash
fio --name=seq-read \
  --ioengine=libaio \
  --iodepth=4 \
  --rw=read \
  --bs=1m \
  --size=1g \
  --direct=1 \
  --numjobs=1 \
  --runtime=30 \
  --time_based
```
You’ll see throughput in MB/s. This is where an NVMe shines (3000–7000 MB/s, depending on PCIe generation), while an old HDD gasps (100–200 MB/s).
2. Random 4K Performance
What it measures: How many small operations can the disk handle per second? This is what matters for databases, filesystems, and virtual machines.
```bash
fio --name=random4k \
  --ioengine=libaio \
  --iodepth=32 \
  --rw=randread \
  --bs=4k \
  --size=10g \
  --direct=1 \
  --numjobs=4 \
  --runtime=60 \
  --time_based
```
Look for IOPS in the thousands. A good NVMe does 100K+ IOPS. A SATA SSD does 20K–50K. An HDD does 100–300 IOPS (ouch).
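IOPS and bandwidth are two views of the same number: bandwidth is IOPS times block size. A quick sketch with a hypothetical mib_per_sec helper (awk does the fractional math bash can’t) shows why a drive can post impressive IOPS at 4K and still move surprisingly few megabytes.

```bash
#!/usr/bin/env bash
# bandwidth = IOPS x block size.
# mib_per_sec is a hypothetical helper, not part of fio.
# Usage: mib_per_sec <iops> <block size in KiB>
mib_per_sec() {
  awk -v iops="$1" -v bs_kib="$2" 'BEGIN { printf "%.2f\n", iops * bs_kib / 1024 }'
}

mib_per_sec 25000 4     # a SATA SSD's random 4K reads: 97.66 MiB/s
mib_per_sec 200 4       # an HDD's random 4K reads: 0.78 MiB/s
mib_per_sec 3000 1024   # 3000 IOPS of 1 MiB blocks: 3000.00 MiB/s
```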
3. Mixed Workload (Real-World DB)
What it measures: Your database isn’t pure reads or pure writes. It’s often 70% reads and 30% writes. This gets closer to reality.
```bash
fio --name=mixed-workload \
  --ioengine=libaio \
  --iodepth=16 \
  --rw=randrw \
  --rwmixread=70 \
  --bs=4k \
  --size=10g \
  --direct=1 \
  --numjobs=4 \
  --runtime=60 \
  --time_based
```
IOPS will drop compared to pure reads because writes are slower. But this is honest.
4. Latency Profile
What it measures: How consistent is the disk? A drive with 99.99th percentile latency of 50 ms will ruin your user experience even if average latency is 5 ms.
```bash
fio --name=latency-test \
  --ioengine=libaio \
  --iodepth=1 \
  --rw=randread \
  --bs=4k \
  --size=10g \
  --direct=1 \
  --numjobs=1 \
  --runtime=120 \
  --time_based \
  --output=latency.json \
  --output-format=json
```
Run this long (2+ minutes) to get good percentile data. The JSON output includes 99th, 99.9th, and 99.99th percentile latencies.
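To make the percentile columns concrete, here is the nearest-rank definition applied to a made-up sample: sort the latencies and take the value at position ceil(p/100 × N). percentile and samples are hypothetical helpers; fio computes its own percentiles from an internal latency histogram, so its figures are close approximations rather than exact ranks.

```bash
#!/usr/bin/env bash
# Nearest-rank percentile of newline-separated numbers on stdin.
# percentile is a hypothetical helper, not part of fio.
# Usage: percentile <p, e.g. 99 or 99.99> < values.txt
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = p * NR / 100        # nearest rank: ceil(p/100 * N)
      k = int(rank)
      if (k < rank) k++
      if (k < 1) k = 1
      print v[k]
    }'
}

# Made-up sample of 1000 latencies in microseconds:
# 990 fast requests, 9 slow ones, and one very slow outlier.
samples() {
  local i
  for i in $(seq 990); do echo 100; done
  for i in $(seq 9); do echo 2000; done
  echo 50000
}

samples | percentile 50      # 100
samples | percentile 99      # 100
samples | percentile 99.9    # 2000
samples | percentile 99.99   # 50000
```

The mean of this sample is 167 µs and the 99th percentile is a tidy 100 µs, yet the 99.99th is 50 ms. It also shows why long runs matter: with only 1000 samples, the 99.99th percentile is simply the single worst request.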
Reading fio Output (The Good, the Bad, the Ugly)
When fio finishes, you get a wall of text. Here’s what matters:
```
read: IOPS=25435, BW=99.3MiB/s (104MB/s)(5960MiB/60001msec)
  slat (nsec): min=1852, max=45603, avg=2892.13, stdev=892.44
  clat (nsec): min=1234, max=156789, avg=1256.12, stdev=5234.55
  clat percentiles (nsec): 50.00th=[ 1234], 90.00th=[ 1892], 99.00th=[ 2456], 99.90th=[ 3892], 99.99th=[12456]
   lat (nsec): min=3456, max=158901, avg=4148.25, stdev=6123.67
```
BW (bandwidth): Throughput, reported in MiB/s with MB/s in parentheses. Compare this across drives. Higher is better.
IOPS: Operations per second. Compare across drives. Higher is better.
slat (submission latency): Time between fio asking the kernel to do I/O and the kernel accepting it. Usually a few microseconds or less. Ignore it unless it grows large (hundreds of microseconds or more).
clat (completion latency): Time from submission to completion. This is most of what your application feels. Lower is better.
lat (total latency): slat + clat.
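percentiles: The big one. Look at 99.00th, 99.90th, 99.99th. In the sample above, 99% of requests complete within about 2.5 µs, but the 99.99th percentile is about 12.5 µs, five times slower. On a real disk those numbers might read 2 ms and 12 ms, and that’s a rough tail: some users will experience that slowness.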
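Two sanity checks on the sample output above, done with hypothetical awk one-liners (numbers copied from the sample, in nanoseconds). The average total latency is the average submission latency plus the average completion latency; that holds for averages, not for min, max, or percentiles. Dividing the 99.99th percentile by the 99th gives a quick measure of how heavy the tail is.

```bash
#!/usr/bin/env bash
# add_ns and tail_ratio are hypothetical helpers, not part of fio.
add_ns()     { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a + b }'; }
tail_ratio() { awk -v hi="$1" -v lo="$2" 'BEGIN { printf "%.1f\n", hi / lo }'; }

# avg slat + avg clat = avg lat
add_ns 2892.13 1256.12    # 4148.25, the avg= on the lat line
# 99.99th percentile / 99th percentile of clat
tail_ratio 12456 2456     # 5.1
```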
libaio vs io_uring: The Engine Debate
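libaio (Linux native AIO): Stable, well-tested, and available on practically every Linux system. It isn’t fio’s default on Linux (that’s psync), so set it explicitly, as every example here does. Use it unless you have a reason not to.
```bash
fio --name=test --ioengine=libaio ...
```
io_uring (Linux 5.1+): Newer and more flexible, and often faster, mostly through lower per-I/O CPU overhead; the gap shows up most at high IOPS. If your kernel (5.1 or later) and your fio build support it, try it and compare.
```bash
fio --name=test --ioengine=io_uring ...
```
To check whether io_uring is available, start with the kernel version. Kernels 6.6 and newer also have an io_uring_disabled sysctl (0 = enabled, 1 = restricted to privileged processes and one designated group, 2 = disabled); on older kernels the file simply doesn’t exist, so its absence proves nothing.
```bash
uname -r
cat /proc/sys/kernel/io_uring_disabled 2>/dev/null
```
The most direct test is a short fio run with --ioengine=io_uring: if it errors out, io_uring isn’t usable on that box. If you’re on an older kernel or unsure, stick with libaio.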
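io_uring needs Linux 5.1 or later. Here is a minimal sketch that checks the running kernel against that, on Linux. version_at_least is a hypothetical helper that leans on sort -V (version sort, from GNU coreutils) to compare dotted versions, and it ignores distro suffixes like -91-generic.

```bash
#!/usr/bin/env bash
# version_at_least is a hypothetical helper: succeeds if version $1 >= $2.
# Relies on sort -V (version sort) from GNU coreutils.
version_at_least() {
  local have=${1%%-*}   # drop suffixes such as "-91-generic"
  [ "$(printf '%s\n%s\n' "$2" "$have" | sort -V | head -n 1)" = "$2" ]
}

kernel=$(uname -r)
if version_at_least "$kernel" 5.1; then
  echo "kernel $kernel is new enough for io_uring"
else
  echo "kernel $kernel predates io_uring; stick with libaio"
fi

# Kernels 6.6+ also expose a switch: 0 = enabled, 1 = restricted, 2 = disabled.
if [ -r /proc/sys/kernel/io_uring_disabled ]; then
  echo "io_uring_disabled = $(cat /proc/sys/kernel/io_uring_disabled)"
fi
```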
Saving Tests as Job Files
Typing long fio commands gets tedious. Save them as job files (INI format) and reuse them:
```ini
[global]
ioengine=libaio
direct=1
group_reporting=1
time_based=1
runtime=60

[random4k-test]
rw=randread
bs=4k
iodepth=32
numjobs=4
size=10g
```
Save it as random4k.fio, then run it:
```bash
fio random4k.fio
```
Much cleaner. You can commit these to git and version them over time.
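If you keep job files in git, it can also help to generate them from a setup script. A minimal sketch using a quoted heredoc (so the shell expands nothing inside it) that writes the random4k.fio job above; write_random4k_job is a hypothetical helper, and the output path is up to you.

```bash
#!/usr/bin/env bash
# write_random4k_job is a hypothetical helper: it writes the random4k.fio job
# file to the given path (default: ./random4k.fio) and prints the path.
write_random4k_job() {
  local out=${1:-random4k.fio}
  # Quoted 'EOF' stops the shell from expanding anything inside the heredoc.
  cat > "$out" <<'EOF'
[global]
ioengine=libaio
direct=1
group_reporting=1
time_based=1
runtime=60

[random4k-test]
rw=randread
bs=4k
iodepth=32
numjobs=4
size=10g
EOF
  printf '%s\n' "$out"
}

# Demo: write it into a throwaway directory and show the result.
job=$(write_random4k_job "$(mktemp -d)/random4k.fio")
cat "$job"
```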
Real-World Examples
NVMe Benchmark
You bought a fancy NVMe. Prove it’s worth the money:
```bash
fio --name=nvme-seq \
  --ioengine=io_uring \
  --iodepth=32 \
  --rw=read \
  --bs=256k \
  --size=20g \
  --direct=1 \
  --numjobs=1 \
  --runtime=60 \
  --time_based

fio --name=nvme-rand \
  --ioengine=io_uring \
  --iodepth=32 \
  --rw=randread \
  --bs=4k \
  --size=20g \
  --direct=1 \
  --numjobs=8 \
  --runtime=60 \
  --time_based
```
Expect: sequential 3000–7000 MB/s, random 50K–500K IOPS depending on your drive. (Heads-up: numjobs=8 with size=20g lays out 8 × 20 GiB = 160 GiB of test files.)
SATA SSD Benchmark
The workhorse drive in most homelabs:
```bash
fio --name=sata-seq \
  --ioengine=libaio \
  --iodepth=4 \
  --rw=read \
  --bs=256k \
  --size=10g \
  --direct=1 \
  --numjobs=1 \
  --runtime=60 \
  --time_based

fio --name=sata-rand \
  --ioengine=libaio \
  --iodepth=16 \
  --rw=randread \
  --bs=4k \
  --size=10g \
  --direct=1 \
  --numjobs=4 \
  --runtime=60 \
  --time_based
```
Expect: sequential 400–600 MB/s, random 15K–50K IOPS.
HDD (Spinning Rust)
For bulk storage or archival. Latency will be brutal:
```bash
fio --name=hdd-seq \
  --ioengine=libaio \
  --iodepth=1 \
  --rw=read \
  --bs=1m \
  --size=5g \
  --direct=1 \
  --numjobs=1 \
  --runtime=60 \
  --time_based

fio --name=hdd-rand \
  --ioengine=libaio \
  --iodepth=1 \
  --rw=randread \
  --bs=4k \
  --size=5g \
  --direct=1 \
  --numjobs=1 \
  --runtime=60 \
  --time_based
```
Expect: sequential 100–200 MB/s (if lucky), random 100–300 IOPS. This is why you don’t run databases on HDDs.
ZFS Pool Benchmark
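Testing your mirror or RAIDZ pool (ZFS’s take on RAID-1 and RAID-5):
```bash
# Test on /mnt/tank (your ZFS pool)
fio --name=zfs-test \
  --ioengine=libaio \
  --iodepth=32 \
  --rw=randrw \
  --rwmixread=70 \
  --bs=4k \
  --size=5g \
  --directory=/mnt/tank \
  --numjobs=4 \
  --runtime=60 \
  --time_based
```
Note: no direct=1 here because ZFS manages its own caching in the ARC, and depending on the version it may reject or ignore O_DIRECT. Let it do its thing, but know what that means: reads can be served from RAM, so keep the total test data (numjobs × size) well above the ARC size if you want to measure the disks. Without direct=1, libaio also can’t keep requests queued asynchronously, so the concurrency here comes mostly from numjobs rather than iodepth.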
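A rough way to check whether this ZFS run will mostly hit the in-memory ARC: compare the total test data (numjobs × size) against the ARC’s maximum size. This sketch assumes OpenZFS on Linux, which publishes ARC statistics in /proc/spl/kstat/zfs/arcstats as name/type/data rows, with c_max as the size ceiling in bytes. arc_max_bytes and fits_in_arc are hypothetical helpers, and the optional file argument exists so the sketch can be tried against a sample file.

```bash
#!/usr/bin/env bash
# arc_max_bytes: print the ARC size ceiling (c_max, in bytes) from an arcstats file.
arc_max_bytes() {
  awk '$1 == "c_max" { print $3 }' "${1:-/proc/spl/kstat/zfs/arcstats}"
}

# fits_in_arc: succeed if numjobs x size (GiB) fits under c_max.
# Returns 2 if no ARC statistics could be read (e.g. no ZFS on this machine).
# Usage: fits_in_arc <numjobs> <size per job in GiB> [arcstats file]
fits_in_arc() {
  local total=$(( $1 * $2 * 1024 * 1024 * 1024 )) arc
  arc=$(arc_max_bytes "$3" 2>/dev/null) || return 2
  [ -n "$arc" ] || return 2
  (( total <= arc ))
}

# The example above: numjobs=4 x size=5g = 20 GiB of test data.
status=0
fits_in_arc 4 5 || status=$?
case $status in
  0) echo "20 GiB fits in the ARC: expect many reads to come from RAM" ;;
  1) echo "20 GiB exceeds the ARC ceiling" ;;
  *) echo "no ZFS ARC statistics found" ;;
esac
```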
Common Mistakes and How to Avoid Them
Forgetting direct=1: Without it, fio measures your RAM cache, not your disk. Always use --direct=1 unless you’re specifically testing cache performance.
Running on a mounted filesystem with heavy caching: Even with direct=1, if your filesystem is doing COW (copy-on-write) or compression, you’ll get weird results. Test on a dedicated partition or raw device if possible.
Not dropping caches between tests: Linux caches aggressively. Between benchmark runs, clear the cache:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
Then run your benchmark. Same command, clean slate.
Using iodepth=1 for everything: It’s realistic for synchronous code but misleading for async workloads. Most modern systems queue up 4–32 operations. Test with realistic iodepth.
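Not running long enough: fio has no default runtime; without runtime (and time_based), a job simply stops once it has covered size, which on a fast drive can take only seconds. Run at least 60 seconds, ideally 120. SSD write caches fill up, performance settles, and you get better data.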
Testing on a small dataset: If you test 100 MB on a 1 TB drive, you’re hitting a tiny part of the platters/flash. Use at least 10% of the drive’s capacity. A good rule: size = 10 GB for a 100 GB drive, 100 GB for a 1 TB drive.
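The 10% rule is easy to turn into a size= value. A minimal sketch, assuming you already know the drive’s capacity in bytes (on Linux, sudo blockdev --getsize64 /dev/sdX prints it); test_size_for is a hypothetical helper. It reports GiB because fio reads the g suffix as GiB by default, and it divides by numjobs since size applies to each job.

```bash
#!/usr/bin/env bash
# test_size_for is a hypothetical helper: 10% of capacity, split across jobs,
# printed as a fio size= value in GiB (minimum 1g).
# Usage: test_size_for <capacity in bytes> [numjobs]
test_size_for() {
  local bytes=$1 jobs=${2:-1}
  local gib=$(( bytes / 10 / jobs / 1024 / 1024 / 1024 ))
  (( gib < 1 )) && gib=1
  echo "${gib}g"
}

test_size_for 1000204886016      # a "1 TB" drive: 93g
test_size_for 1000204886016 4    # same total across numjobs=4: 23g per job
```

93 GiB is roughly the 100 GB the rule of thumb suggests for a 1 TB drive.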
Comparing different block sizes without thinking: A random 4K benchmark looks different from a random 64K benchmark. They’re testing different things. Be consistent when comparing drives.
Latency Testing and ramp_time
The first few seconds of a benchmark aren’t representative. Disks warm up, caches settle, queue depth stabilizes. Use ramp_time to throw away the warmup:
```bash
fio --name=test \
  --ioengine=libaio \
  --iodepth=32 \
  --rw=randread \
  --bs=4k \
  --size=10g \
  --direct=1 \
  --numjobs=4 \
  --ramp_time=10 \
  --runtime=60 \
  --time_based
```
This runs 10 seconds of warmup (discarded), then 60 seconds of real testing.
For latency testing, also check the 99.99th percentile tail. That’s what your user experiences when everything hits at once:
```bash
fio --name=latency \
  --ioengine=libaio \
  --iodepth=1 \
  --rw=randread \
  --bs=4k \
  --size=10g \
  --direct=1 \
  --numjobs=1 \
  --ramp_time=5 \
  --runtime=180 \
  --time_based \
  --output=results.json \
  --output-format=json
```
Parse the JSON to see the percentile breakdown: in fio 3.x’s JSON output, each job’s completion-latency percentiles sit under read.clat_ns.percentile, keyed by strings like "99.990000". A drive with low 99.99th latency is solid; one with spiky tail latency is a problem.
File-Based vs Raw Device Testing
File-based (what we’ve shown above): Create a file on your filesystem and benchmark it. This includes filesystem overhead. Good for real-world results.
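```bash
fio --name=file-test \
  --filename=/mnt/disk/test.img \
  ...
```
Raw device: Benchmark the raw disk without a filesystem. This is pure storage performance.
```bash
sudo fio --name=device-test \
  --filename=/dev/nvme0n1 \
  --direct=1 \
  ...
```
Careful: any write pattern (write, randwrite, rw, randrw) aimed at a raw device overwrites whatever is on it, partition table and filesystems included. On a disk that holds data, stick to read patterns and add --readonly, which makes fio refuse to issue writes.
For most of you, file-based is fine. Raw device is useful if you’re diagnosing filesystem issues or comparing raw RAID performance.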
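Because a write test against the wrong device can’t be undone, a guard in your scripts is cheap insurance. A minimal sketch, assuming Linux’s /proc/mounts (the first field of each line is the mounted source); safe_to_test_raw is a hypothetical helper. It fails closed: if it can’t read the mount table, it reports unsafe. Its prefix match is deliberately crude, catching partitions like /dev/nvme0n1p1, but it won’t see through LVM or device-mapper names.

```bash
#!/usr/bin/env bash
# safe_to_test_raw is a hypothetical guard. Returns 0 if neither the device nor
# any partition whose name starts with it is a mount source, 1 if one is,
# and 2 if the mount table can't be read. Pass another file as $2 to try it out.
safe_to_test_raw() {
  local dev=$1 mounts=${2:-/proc/mounts}
  [ -r "$mounts" ] || return 2
  awk -v d="$dev" 'index($1, d) == 1 { found = 1 } END { exit (found ? 1 : 0) }' "$mounts"
}

dev=/dev/nvme0n1
if safe_to_test_raw "$dev"; then
  echo "$dev is not mounted; still double-check before any write test"
else
  echo "refusing: $dev is mounted, partly mounted, or mount status is unknown" >&2
fi
```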
How to Compare Disks Fairly
Use the same test on all drives:
- Same ioengine (libaio or io_uring)
- Same iodepth (usually 32)
- Same block size (usually 4k or 256k depending on workload)
- Same runtime (at least 60 seconds)
- Same ramp_time (10 seconds)
- Drop caches between runs
- Compare IOPS or throughput
Create a job file and use it on each drive:
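```ini
[global]
ioengine=libaio
direct=1
group_reporting=1
ramp_time=10
runtime=60
time_based=1

[4k-random-reads]
rw=randread
bs=4k
iodepth=32
numjobs=4
size=10g
```
Run it from a directory on drive A (with no directory= or filename= set, fio lays out its test files in the current directory) and note the IOPS. Clear caches. Run it from drive B. Compare.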
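Doing that by hand gets old after the second drive. A minimal sketch of the loop: compare_drives and run are hypothetical helpers, and the sketch assumes you add directory=${TARGET_DIR} to the job file’s [global] section (fio expands ${VAR} references in job files from the environment). By default it only prints the commands; set DRY_RUN=0 to execute them, which needs sudo and fio.

```bash
#!/usr/bin/env bash
# compare_drives is a hypothetical helper: it runs one job file against several
# mount points, dropping the page cache before each run and keeping one JSON
# result per drive. It assumes the job file's [global] section contains
#   directory=${TARGET_DIR}
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.

run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"   # show what would run
  else
    "$@"
  fi
}

# Usage: compare_drives <job file> <mount point>...
compare_drives() {
  local job=$1 dir tag
  shift
  for dir in "$@"; do
    tag=$(basename "$dir")
    run sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
    run env TARGET_DIR="$dir" fio --output-format=json --output="result-$tag.json" "$job"
  done
}

compare_drives 4k-random-reads.fio /mnt/ssd-a /mnt/ssd-b
```

Keeping the JSON per drive means you can compare IOPS and tail latency later without rerunning anything.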
Network Storage (NFS, iSCSI, Ceph)
Same tool works on network storage:
```bash
# NFS mount at /mnt/nfs
fio --name=nfs-test \
  --ioengine=libaio \
  --iodepth=16 \
  --rw=randrw \
  --rwmixread=70 \
  --bs=4k \
  --size=5g \
  --directory=/mnt/nfs \
  --direct=1 \
  --numjobs=4 \
  --runtime=60 \
  --time_based
```
Latency will be higher (network overhead), but you’ll see how your storage cluster actually performs. direct=1 keeps the NFS client’s page cache out of the numbers. This is valuable for homelab setups with Ceph or iSCSI.
Your fio Cheat Sheet
Save these. Commit them to a git repo. Use them whenever you’re evaluating storage.
Quick sequential throughput:
```bash
fio --name=seq --ioengine=libaio --iodepth=4 --rw=read --bs=1m --size=10g --direct=1 --runtime=60 --time_based
```
Quick random 4K IOPS:
```bash
fio --name=rand4k --ioengine=libaio --iodepth=32 --rw=randread --bs=4k --size=10g --direct=1 --numjobs=4 --runtime=60 --time_based
```
Database-like mixed workload:
```bash
fio --name=mixed --ioengine=libaio --iodepth=16 --rw=randrw --rwmixread=70 --bs=4k --size=10g --direct=1 --numjobs=4 --runtime=60 --time_based
```
Latency tail (99.99th percentile):
```bash
fio --name=lat --ioengine=libaio --iodepth=1 --rw=randread --bs=4k --size=10g --direct=1 --ramp_time=5 --runtime=180 --time_based --output=lat.json --output-format=json
```
The Honest Conclusion
You now have a tool that tells the truth about your storage. No marketing BS. No lab conditions. Just real numbers from real workloads.
The next time a vendor tells you their drive does “100,000 IOPS,” you can smile and run fio on your actual workload. Maybe it does 100K. Maybe it does 5K. Now you know.
Your 2 AM self will thank you when a storage bottleneck doesn’t crater your database at 3 AM.