Neo4j vs ArangoDB: Graph DB Showdown

By SumGuy 10 min read
Take a breath. Close those tabs. If you’re trying to model friends, fraud rings, or the terrifyingly efficient path from “Add to cart” to “chargeback,” a graph database often saves you from relational-query hell. This isn’t about making your SQL prettier; it’s about avoiding the long chains of JOINs, and their exploding intermediate result sets, that show up when relationships matter more than rows.

When you actually need a graph DB (relational gets ugly fast)

Relational databases are great at tables, aggregates, and ACID guarantees. They’re terrible at questions like:

If your domain is connected data — social, recommendations, knowledge graphs, fraud detection, dependency analysis — the object you care about is a traversal. Relational solutions (recursive CTEs, adjacency tables) work for small graphs, but they become complex and slow when depth or branching factor grows. That’s the moment a graph DB stops being academic and becomes practical.

That said: don’t reach for a graph DB because somebody said “graphs are cool.” If your joins are shallow, or your app is mostly CRUD with occasional reporting, stick with Postgres.

Neo4j install (Docker Compose example)

Neo4j is the iconic, purpose-built graph DB. Nice visual tooling, mature ecosystem, and Cypher — a query language that reads like “graph SQL.” Quick local spin-up with Docker Compose:

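docker-compose.neo4j.yml
version: "3.8"
services:
  neo4j:
    image: neo4j:5
    container_name: neo4j
    ports:
      - "7474:7474" # HTTP Browser
      - "7687:7687" # Bolt protocol
    environment:
      # Neo4j 5 enforces a minimum password length (8 characters by default);
      # pick a longer password if the container refuses to start with this one.
      NEO4J_AUTH: "neo4j/secret"
    volumes:
      - ./neo4j/data:/data
      - ./neo4j/import:/var/lib/neo4j/import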

Start it:

start-neo4j.sh
docker compose -f docker-compose.neo4j.yml up -d
# visit http://localhost:7474, login neo4j / secret

Takeaway: Community edition is extremely simple for single‑node local work.

Basic Cypher example (MATCH, CREATE, simple path traversal)

Cypher is expressive for creating and traversing patterns. Example: build a tiny social graph, then find 1–3 hop friends.

basic.cypher
// create sample nodes and relationships
CREATE (alice:Person {name: 'Alice'}),
       (bob:Person {name: 'Bob'}),
       (carol:Person {name: 'Carol'}),
       (dave:Person {name: 'Dave'}),
       (alice)-[:KNOWS {since: 2020}]->(bob),
       (bob)-[:KNOWS {since: 2019}]->(carol),
       (carol)-[:KNOWS {since: 2018}]->(dave);

// find friends up to 3 hops away from Alice
MATCH path = (a:Person {name: 'Alice'})-[:KNOWS*1..3]->(friend)
RETURN [n IN nodes(path) | n.name] AS chain, length(path) AS hops
ORDER BY hops
LIMIT 10;

And a tiny Node.js client snippet:

neo4j-example.js
// Run as an ES module (e.g. "type": "module" in package.json) so that
// import and top-level await work.
import neo4j from 'neo4j-driver';

const driver = neo4j.driver('bolt://localhost:7687', neo4j.auth.basic('neo4j', 'secret'));
const session = driver.session();
const q = `MATCH (a:Person {name:$name})-[:KNOWS*1..3]->(f) RETURN DISTINCT f.name AS friend LIMIT 25`;
try {
  const res = await session.run(q, { name: 'Alice' });
  console.log(res.records.map(r => r.get('friend')));
} finally {
  // release the session and the connection pool even if the query throws
  await session.close();
  await driver.close();
}

ArangoDB install (Docker Compose)

ArangoDB is multi‑model: documents, key-value, and graphs in the same engine. Spin up a single node quickly:

docker-compose.arangodb.yml
version: "3.8"
services:
  arangodb:
    image: arangodb:latest
    container_name: arangodb
    ports:
      - "8529:8529"
    environment:
      ARANGO_ROOT_PASSWORD: "secret"
    volumes:
      - ./arangodb/data:/var/lib/arangodb3

Start it:

start-arangodb.sh
docker compose -f docker-compose.arangodb.yml up -d
# visit http://localhost:8529, login root / secret

ArangoDB also supports clustering without forking over to an enterprise license — more on that later.

Same query in AQL

ArangoDB stores graph edges in edge collections and vertices in document collections. You can use the General Graph API or raw AQL traversals. Example creates and queries roughly the same social graph:

basic.aql
// AQL has no DDL, so create the collections and the named graph first,
// e.g. in arangosh (or through the web UI):
//   db._create("people");
//   db._createEdgeCollection("knows");
//   var gm = require("@arangodb/general-graph");
//   gm._create("social", [gm._relation("knows", ["people"], ["people"])]);

// insert vertices (run each statement below as its own query)
FOR p IN [
  { _key: 'alice', name: 'Alice' },
  { _key: 'bob', name: 'Bob' },
  { _key: 'carol', name: 'Carol' },
  { _key: 'dave', name: 'Dave' }
]
  INSERT p INTO people

// insert edges
FOR e IN [
  { _from: 'people/alice', _to: 'people/bob', since: 2020 },
  { _from: 'people/bob', _to: 'people/carol', since: 2019 },
  { _from: 'people/carol', _to: 'people/dave', since: 2018 }
]
  INSERT e INTO knows

// traverse 1..3 OUTBOUND from alice
FOR v, e, p IN 1..3 OUTBOUND 'people/alice' GRAPH 'social'
  RETURN { path: p.vertices[*].name, hops: LENGTH(p.edges) }

Note: you can also write FOR v IN 1..3 OUTBOUND 'people/alice' knows, which references the edge collection directly (no named graph required).
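
To make the 1..3-hop semantics concrete, here is a minimal in-memory sketch of what both the Cypher and AQL queries ask for: every outbound path from Alice with between one and three edges. The knows object and the traverse function are made up for illustration, and the per-path vertex check is a simplification; real engines apply their own uniqueness rules, so treat this as a picture of the idea, not of either engine's implementation.

```javascript
// Hypothetical in-memory copy of the sample graph: name -> outbound KNOWS targets.
const knows = {
  Alice: ['Bob'],
  Bob: ['Carol'],
  Carol: ['Dave'],
  Dave: [],
};

// Enumerate every outbound path of minDepth..maxDepth edges from `start`,
// never revisiting a vertex that is already on the current path.
function traverse(graph, start, minDepth, maxDepth) {
  const results = [];
  const walk = (path) => {
    const hops = path.length - 1;
    if (hops >= minDepth) results.push({ path: [...path], hops });
    if (hops === maxDepth) return; // depth limit reached, stop expanding
    for (const next of graph[path[path.length - 1]] ?? []) {
      if (!path.includes(next)) walk([...path, next]);
    }
  };
  walk([start]);
  return results;
}

console.log(traverse(knows, 'Alice', 1, 3));
```

On the four-person sample graph this yields three paths (Alice to Bob, then on to Carol, then on to Dave), matching the shape of the chain/hops and path/hops results above.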

Traversal performance reality

Talking speed: the cost of a traversal depends on branching factor and depth. If each node points to 10 neighbors and you traverse 4 hops, you’re looking at roughly 10^4 paths at the last hop (about 11,000 expansions in total) in the worst case. Property indexes only buy you the initial seed lookup; the rest of the traversal is graph-engine work.
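
As a rough sanity check on that growth, the sketch below sums the fan-out per hop. It assumes every node has the same out-degree and that no paths overlap, which is the worst case; the function name is made up for illustration.

```javascript
// Worst-case number of path expansions for a traversal that follows every
// neighbor at every hop: b + b^2 + ... + b^d.
function worstCaseExpansions(branchingFactor, depth) {
  let total = 0;
  let frontier = 1; // paths ending at the current hop (initially just the start node)
  for (let hop = 1; hop <= depth; hop++) {
    frontier *= branchingFactor; // each path fans out to every neighbor
    total += frontier;
  }
  return total;
}

console.log(worstCaseExpansions(10, 4)); // 11110
console.log(worstCaseExpansions(10, 6)); // 1111110
```

Two extra hops turn about eleven thousand expansions into over a million, which is why depth limits matter more than almost any other tuning knob.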

Neo4j uses a native graph storage and a traversal engine optimized for following pointers. For deep, pointer‑chasing workloads (e.g., recommendation engines, pathfinding, community detection), Neo4j often wins on single‑node latency.

ArangoDB is highly optimized too, but remember it’s multi-model: edges are documents stored in edge collections, and each hop is resolved through the edge index on _from and _to rather than by following stored pointers. For many real workloads ArangoDB keeps up, and its horizontal sharding can outperform Neo4j when you need scale-out across many machines.

Practical advice:

Multi-model promise (is it complexity bait or actually useful?)

ArangoDB’s multi‑model approach is delightful in the “one tool to rule this dataset” sense. Want documents and graphs tightly coupled with minimal duplication? Nice. Want to mix key‑value lookups, graph traversals and document updates in a single transaction? Also nice.

But there’s a cost: API surface area and cognitive load. You’ll need to learn AQL (it’s SQL-ish but quirky), understand collections vs graphs, and decide when a piece of data is a vertex vs an embedded document. If you don’t actually need multi‑model features, that extra flexibility can be a distraction.
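
The vertex-versus-embedded-document decision is easier to see with data in hand. The sketch below shows the same fact modeled both ways and a small helper that converts one into the other. The cities collection, the address field, and the extractCity function are all hypothetical names chosen for illustration; only the _key, _from and _to attributes follow ArangoDB's document conventions.

```javascript
// Option A: the address is embedded in the person document.
// Fine when it is only ever read together with the person.
const personEmbedded = {
  _key: 'alice',
  name: 'Alice',
  address: { city: 'Berlin' },
};

// Option B: promote the city to its own vertex and link it with an edge,
// so traversals can answer "who else lives here?".
function extractCity(personDoc) {
  const { address, ...person } = personDoc; // drop the embedded field
  const cityKey = address.city.toLowerCase();
  return {
    person,
    city: { _key: cityKey, name: address.city },
    edge: { _from: `people/${person._key}`, _to: `cities/${cityKey}` },
  };
}

console.log(extractCity(personEmbedded));
```

A reasonable rule of thumb: embed when the data is only ever fetched with its parent, and promote to a vertex once you want to traverse through it.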

Use cases where multi‑model helps:

If you want a single, obvious graph primitive and a rich graph ecosystem (APOC, graph algorithms), Neo4j’s narrower focus can be a feature — less wiggle room for making poor modeling choices.

Licensing breakdown (Neo4j Community single-instance + GPL, ArangoDB OSS clustering)

Short version: Neo4j’s Community offering is aimed at single-node use for local/dev; clustering and some enterprise features are behind the commercial/Enterprise line. The Community edition is GPL-licensed (GPLv3), which is copyleft rather than permissive; check Neo4j’s site for the exact legal wording before commercial deployment.

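ArangoDB’s open-source edition historically gives you more clustering options without an enterprise purchase. That matters if you want to run a fault-tolerant cluster in a homelab or DIY cloud without paying license fees. One caveat: starting with ArangoDB 3.12 the source moved from Apache 2.0 to the Business Source License and the Community Edition carries its own usage terms, so read them before you rely on free clustering for commercial work.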

Legal nit: licensing changes over years. Treat this as a signpost: assume Neo4j enterprise features (fabric/causal clustering, advanced ops) are paid; assume ArangoDB community is more permissive for cluster usage — verify current licenses for production.

Scaling reality (Neo4j Enterprise for clustering, ArangoDB free clustering)

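Neo4j scales vertically and offers clustering and sharding features in Enterprise (Causal Clustering and Fabric in 4.x; in 5.x these were reworked as autonomous clustering and composite databases). It’s battle-tested, but the clustering stack is an Enterprise feature and requires a license if you need production HA and sharding.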

ArangoDB prides itself on shipping cluster functionality in the OSS version: Agents, DB-Servers, and Coordinators work together to let you shard and replicate collections (including graph data). That can be huge for homelabbers who want a redundant graph cluster for free, but setup and ops are nontrivial.

Operationally:

Tooling (Neo4j Browser vs ArangoDB web UI)

Neo4j: excellent visual tooling. The classic Neo4j Browser is great for ad‑hoc exploration; Neo4j Bloom gives polished visual discovery (commercial). Drivers and the APOC library add a ton of power.

ArangoDB: a solid web UI with query editor, collection management, and graph visualization. Less “batteries‑included” for graph analysis than Neo4j’s ecosystem, but it’s more than usable and integrates nicely with Foxx microservices.

Both have language drivers for Node, Python, Java, etc. If visual, one‑click exploration matters to you, Neo4j’s UX is slightly friendlier out of the box.

Reality check: Postgres + recursive CTEs or AGE extension first

Before dropping to a graph DB, ask whether Postgres can do the job. For many hierarchical or shallow graph problems, recursive CTEs are fine and keep operational complexity low.

Example (Postgres recursive CTE):

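recursive_cte.sql
-- assumes tables nodes(id, name) and edges(from_id, to_id)
WITH RECURSIVE path AS (
    -- anchor: start at node 1
    SELECT id, name, ARRAY[id] AS chain
    FROM nodes
    WHERE id = 1
  UNION ALL
    -- step: follow outgoing edges, skipping nodes already on the chain
    SELECT e.to_id, n.name, p.chain || e.to_id
    FROM edges e
    JOIN nodes n ON n.id = e.to_id
    JOIN path p ON e.from_id = p.id
    WHERE e.to_id <> ALL (p.chain)
)
SELECT * FROM path;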

If you’re already Postgres-heavy, try a prototype there. If queries become awkward or slow as depth/branching grows, then consider a graph DB. Also look at the Apache AGE extension for Postgres: it adds graph semantics and Cypher-style queries inside Postgres if you want a hybrid route.

Decision matrix (use cases per DB)

SumGuy-voice conclusion (winner per use case)

Neo4j is the precision forklift: purpose‑built, tidy, and it’ll handle delicate graph hauling like a pro — but the heavy‑duty clamps (clustering, commercial tools) cost money. ArangoDB is the Swiss Army hexcrystal: multi‑model, flexible, and lets you run a real cluster without needing a corporate purchase order. It’s slightly messier, but it’s free and powerful.

So: if you want best‑in‑class graph ergonomics and you’re building a graph‑first product, Neo4j is the friendly winner. If you want a single datastore that covers documents, KV and graph, or you want to scale cheaply in a homelab, ArangoDB is probably the more practical pick.

Your 2 AM self will appreciate picking the right tool: use a graph DB when the problem is connectedness, not because the marketing team liked the logo. And if you need a last‑minute escape hatch — try Postgres + CTEs or AGE before committing to a new database.

Happy scheming; don’t hire a forklift to move a couch unless you like explaining yourself to the neighbors.

