Neo4j vs ArangoDB: Graph DB Showdown

Take a breath. Close those tabs. If you’re trying to model friends, fraud rings, or the terrifyingly efficient path from “Add to cart” to “chargeback,” a graph database often saves you from relational-query hell. This isn’t about making your SQL prettier; it’s about avoiding the long chains of JOINs, and their Cartesian-product blowups, that show up when relationships matter more than rows.

When you actually need a graph DB (relational gets ugly fast)

Relational databases are great at tables, aggregates, and ACID guarantees. They’re terrible at questions like:

“Who are Alice’s 3-hop friends who like sci‑fi and work within 50km of me?”
“Show me the path between Order#123 and a suspicious account, crossing users, devices and IPs.”
“Find communities or cycles in the social graph.”

If your domain is connected data (social networks, recommendations, knowledge graphs, fraud detection, dependency analysis), the operation you care about is a traversal. Relational solutions (recursive CTEs, adjacency tables) work for small graphs, but they become complex and slow as depth or branching factor grows. That’s the moment a graph DB stops being academic and becomes practical.

That said: don’t reach for a graph DB because somebody said “graphs are cool.” If your joins are shallow, or your app is mostly CRUD with occasional reporting, stick with Postgres.

Neo4j install (Docker Compose example)

Neo4j is the iconic, purpose-built graph DB. Nice visual tooling, mature ecosystem, and Cypher, a query language that reads like “graph SQL.” Quick local spin-up with Docker Compose:

version: "3.8"
services:
  neo4j:
    image: neo4j:5
    container_name: neo4j
    ports:
      - "7474:7474"   # HTTP Browser
      - "7687:7687"   # Bolt protocol
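    environment:
      NEO4J_AUTH: "neo4j/secret"   # throwaway local password; change it anywhere real
      # Neo4j 5 rejects passwords shorter than 8 characters by default,
      # so relax the minimum for this local demo.
      NEO4J_dbms_security_auth__minimum__password__length: "6"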
    volumes:
      - ./neo4j/data:/data
      - ./neo4j/import:/var/lib/neo4j/import

Start it:

docker compose -f docker-compose.neo4j.yml up -d
# visit http://localhost:7474, login neo4j / secret

Takeaway: Community edition is simple for single‑node local work.

Basic Cypher example (MATCH, CREATE, simple path traversal)

Cypher is expressive for creating and traversing patterns. Example: build a tiny social graph, then find 1 to 3 hop friends.

// create sample nodes and relationships
CREATE (alice:Person {name: 'Alice'}),
       (bob:Person {name: 'Bob'}),
       (carol:Person {name: 'Carol'}),
       (dave:Person {name: 'Dave'}),
       (alice)-[:KNOWS {since:2020}]->(bob),
       (bob)-[:KNOWS {since:2019}]->(carol),
       (carol)-[:KNOWS {since:2018}]->(dave);

// find friends up to 3 hops away from Alice
MATCH path=(a:Person {name:'Alice'})-[:KNOWS*1..3]->(friend)
RETURN [n IN nodes(path) | n.name] AS chain, length(path) AS hops
ORDER BY hops
LIMIT 10;

And a tiny Node.js client snippet:

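// Run as an ES module (.mjs or "type": "module") so top-level await works.
import neo4j from 'neo4j-driver';

// Bolt endpoint and credentials match the Compose file above.
const driver = neo4j.driver('bolt://localhost:7687', neo4j.auth.basic('neo4j', 'secret'));
const session = driver.session();

try {
  // $name is passed as a query parameter rather than concatenated into the string.
  const q = `MATCH (a:Person {name:$name})-[:KNOWS*1..3]->(f) RETURN DISTINCT f.name AS friend LIMIT 25`;
  const res = await session.run(q, { name: 'Alice' });
  console.log(res.records.map((r) => r.get('friend')));
} finally {
  // Release the session and the connection pool even if the query throws.
  await session.close();
  await driver.close();
}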

ArangoDB install (Docker Compose)

ArangoDB is multi‑model: documents, key-value, and graphs in the same engine. Spin up a single node quickly:

version: "3.8"
services:
  arangodb:
    image: arangodb:latest
    container_name: arangodb
    ports:
      - "8529:8529"
    environment:
      ARANGO_ROOT_PASSWORD: "secret"
    volumes:
      - ./arangodb/data:/var/lib/arangodb3

docker compose -f docker-compose.arangodb.yml up -d
# visit http://localhost:8529, login root / secret

ArangoDB also supports clustering without forking over cash for an enterprise license; more on that later.

Same query in AQL

ArangoDB stores graph edges in edge collections and vertices in document collections. You can use the General Graph API or raw AQL traversals. Example creates and queries roughly the same social graph:

// AQL has no DDL, so create the collections and the named graph first in arangosh:
//   db._create('people')
//   db._createEdgeCollection('knows')
//   var gm = require('@arangodb/general-graph')
//   gm._create('social', [gm._relation('knows', ['people'], ['people'])])
// Then run each statement below as its own query: AQL does not allow a second
// write to a collection that the same query has already modified.

// query 1: insert vertices
FOR p IN [
  { _key: 'alice', name: 'Alice' },
  { _key: 'bob', name: 'Bob' },
  { _key: 'carol', name: 'Carol' },
  { _key: 'dave', name: 'Dave' }
]
  INSERT p INTO people

// query 2: insert edges
FOR e IN [
  { _from: 'people/alice', _to: 'people/bob', since: 2020 },
  { _from: 'people/bob', _to: 'people/carol', since: 2019 },
  { _from: 'people/carol', _to: 'people/dave', since: 2018 }
]
  INSERT e INTO knows

// query 3: traverse 1..3 hops OUTBOUND from alice
FOR v, e, p IN 1..3 OUTBOUND 'people/alice' GRAPH 'social'
  RETURN { path: p.vertices[*].name, hops: LENGTH(p.edges) }

One note: you can also write FOR v IN 1..3 OUTBOUND 'people/alice' knows, which references the edge collection directly (no named graph required).
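To make the storage model concrete, here is a rough in-memory sketch in JavaScript of what such a traversal does with edge documents. It is not how ArangoDB is implemented: the `vertexById` and `edgesByFrom` maps and the `outboundPaths` helper are made-up names standing in for the primary index and the edge index, and the rule of never reusing an edge within one path is my reading of AQL's default traversal behavior.

```javascript
// Vertex and edge documents shaped like the ones in the AQL example.
const people = [
  { _id: 'people/alice', name: 'Alice' },
  { _id: 'people/bob', name: 'Bob' },
  { _id: 'people/carol', name: 'Carol' },
  { _id: 'people/dave', name: 'Dave' },
];
const knows = [
  { _from: 'people/alice', _to: 'people/bob', since: 2020 },
  { _from: 'people/bob', _to: 'people/carol', since: 2019 },
  { _from: 'people/carol', _to: 'people/dave', since: 2018 },
];

// Toy stand-ins (assumed names) for the primary index and an outbound edge index.
const vertexById = new Map(people.map((p) => [p._id, p]));
const edgesByFrom = new Map();
for (const edge of knows) {
  if (!edgesByFrom.has(edge._from)) edgesByFrom.set(edge._from, []);
  edgesByFrom.get(edge._from).push(edge);
}

// Enumerate every outbound path of minDepth..maxDepth edges from startId,
// depth-first, returning the same shape as the AQL RETURN clause.
function outboundPaths(startId, minDepth, maxDepth) {
  const results = [];
  const walk = (vertexIds, edges) => {
    if (edges.length >= minDepth) {
      results.push({
        path: vertexIds.map((id) => vertexById.get(id).name),
        hops: edges.length,
      });
    }
    if (edges.length === maxDepth) return;
    const last = vertexIds[vertexIds.length - 1];
    for (const edge of edgesByFrom.get(last) ?? []) {
      if (edges.includes(edge)) continue; // never reuse an edge within one path
      walk([...vertexIds, edge._to], [...edges, edge]);
    }
  };
  walk([startId], []);
  return results;
}

console.log(outboundPaths('people/alice', 1, 3));
```

On the sample data this produces three paths, with 1, 2, and 3 hops, ending at Bob, Carol, and Dave respectively. The point of the sketch is that every hop is an index lookup keyed on `_from`, not a pointer dereference.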

Traversal performance reality

Talking speed: the cost of a traversal depends on branching factor and depth. If each node points to 10 neighbors and you traverse 4 hops, you’re looking at up to 10^4 = 10,000 paths at the final hop in the worst case. Indexes only buy you the initial seed lookup; everything after that is graph‑engine work.
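To put numbers on that growth, the sketch below counts worst-case paths per hop. It assumes a perfectly uniform graph with no pruning and no shared neighbors, which real data never is; `worstCasePaths` is a name invented for this illustration.

```javascript
// Worst-case number of paths explored at each hop, assuming every node
// has exactly `branching` outgoing edges and nothing is pruned.
function worstCasePaths(branching, maxDepth) {
  const perHop = [];
  let total = 0;
  for (let depth = 1; depth <= maxDepth; depth++) {
    const paths = branching ** depth;
    perHop.push(paths);
    total += paths;
  }
  return { perHop, total };
}

const { perHop, total } = worstCasePaths(10, 4);
console.log(perHop); // 10, 100, 1000, 10000
console.log(total);  // 11110
```

The last hop accounts for 10,000 of the 11,110 paths, which is why capping depth is the first lever to reach for.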

Neo4j uses native graph storage (index‑free adjacency) and a traversal engine optimized for following pointers. For deep, pointer‑chasing workloads (e.g., recommendation engines, pathfinding, community detection), Neo4j often wins on single‑node latency.

ArangoDB is highly optimized too, but remember it’s multi‑model: edges are documents in edge collections, found through an edge index on _from and _to rather than by following stored pointers. For many real workloads ArangoDB keeps up, and its horizontal sharding can outperform Neo4j when you need to scale out across many machines.

Practical advice:

Index the properties used to seed traversals (name, id). Both engines rely on an index to find the start node quickly.
Limit traversal depth or prune by labels/types. Blind deep traversals blow memory/CPU fast.
For OLAP-style analytics over the graph, export to a batch engine (Spark, NetworkX): not every graph query should be online.
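The depth and type limits can be sketched in plain JavaScript. This is a toy breadth-first traversal over a hypothetical mixed graph (the `VISITED` edges, the page nodes, and the `traverse` helper are all invented for the example), not either engine's implementation; it only shows how much work a type filter and a depth cap take off the table.

```javascript
// Hypothetical mixed graph: people KNOW each other and have VISITED pages.
const edges = [
  { from: 'alice', to: 'bob', type: 'KNOWS' },
  { from: 'alice', to: 'page1', type: 'VISITED' },
  { from: 'bob', to: 'carol', type: 'KNOWS' },
  { from: 'bob', to: 'page2', type: 'VISITED' },
  { from: 'carol', to: 'dave', type: 'KNOWS' },
  { from: 'dave', to: 'erin', type: 'KNOWS' },
];

// Adjacency lookup built once, standing in for the engine's edge storage.
const outgoing = new Map();
for (const e of edges) {
  if (!outgoing.has(e.from)) outgoing.set(e.from, []);
  outgoing.get(e.from).push(e);
}

// Breadth-first traversal with a hard depth cap and an optional edge-type filter.
function traverse(start, { maxDepth, types = null }) {
  const seen = new Set([start]);
  let frontier = [start];
  let edgesFollowed = 0;
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const node of frontier) {
      for (const e of outgoing.get(node) ?? []) {
        if (types && !types.includes(e.type)) continue; // prune by edge type
        edgesFollowed++;
        if (seen.has(e.to)) continue; // already reached by a shorter route
        seen.add(e.to);
        next.push(e.to);
      }
    }
    frontier = next;
  }
  seen.delete(start);
  return { reached: [...seen], edgesFollowed };
}

// Pruned: only KNOWS edges, at most 2 hops -> reaches bob and carol, follows 2 edges.
console.log(traverse('alice', { maxDepth: 2, types: ['KNOWS'] }));
// Unpruned: any edge type, 4 hops -> reaches 6 nodes, follows 6 edges.
console.log(traverse('alice', { maxDepth: 4 }));
```

On six edges the difference is trivial; the same two knobs are what keep a traversal bounded when the branching factor is in the hundreds.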

Multi-model promise (is it complexity bait or actually useful?)

ArangoDB’s multi‑model approach is delightful in the “one tool to rule this dataset” sense. Want documents and graphs tightly coupled with minimal duplication? Nice. Want to mix key‑value lookups, graph traversals and document updates in a single transaction? Also nice.

But there’s a cost: API surface area and cognitive load. You’ll need to learn AQL (it’s SQL-ish but quirky), understand collections vs graphs, and decide when a piece of data is a vertex vs an embedded document. If you don’t actually need multi‑model features, that extra flexibility can be a distraction.

Use cases where multi‑model helps:

Microservices where a document stores the object and a graph links those objects (e.g., product catalog + recommendation edges).
When eliminating cross‑store duplication is worth a slightly steeper learning curve.

If you want a single, obvious graph primitive and a rich graph ecosystem (APOC, graph algorithms), Neo4j’s narrower focus can be a feature: there’s less wiggle room for making poor modeling choices.

Licensing breakdown (Neo4j Community single-instance + GPL, ArangoDB OSS clustering)

Short version: Neo4j’s Community offering is aimed at single‑node use for local/dev; clustering and some enterprise features are behind the commercial/Enterprise line. Community Edition is licensed under GPLv3, a copyleft license that is more restrictive than permissive licenses such as Apache 2.0 or MIT; check Neo4j’s site for the exact legal wording before commercial deployment.

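ArangoDB’s open‑source edition historically gives you more clustering options without an enterprise purchase. That matters if you want to run a fault‑tolerant cluster in a homelab or DIY cloud without paying license fees. One caveat: releases up to 3.11 were Apache 2.0, but from 3.12 onward the source moved to the Business Source License 1.1 and the Community Edition carries usage limits, so read the current terms for the version you plan to run.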

Legal nit: licensing changes over the years. Treat this as a signpost: assume Neo4j’s enterprise features (clustering, Fabric/composite databases, advanced ops) are paid, assume ArangoDB’s community offering has been more permissive about cluster usage, and verify the current licenses before production.

Scaling reality (Neo4j Enterprise for clustering, ArangoDB free clustering)

Neo4j scales vertically and offers clustering and sharding features in Enterprise (Causal Clustering and Fabric in 4.x; clustering and composite databases in 5). It’s battle‑tested, but the clustering stack requires an Enterprise license if you need production HA and sharding.

ArangoDB prides itself on shipping cluster functionality in the OSS version: Agents, DB-Servers, and Coordinators work together to let you shard and replicate collections (including graph data). That can be huge for homelabbers who want a redundant graph cluster for free, but setup and ops are nontrivial.

Operationally:

Neo4j Enterprise: polished clustering, tooling, and support; expect fewer surprises but a license cost for production.
ArangoDB OSS cluster: powerful and free, but prepare for more hands‑on orchestration and network configuration.

Tooling (Neo4j Browser vs ArangoDB web UI)

Neo4j: excellent visual tooling. The classic Neo4j Browser is great for ad‑hoc exploration; Neo4j Bloom gives polished visual discovery (commercial). Drivers and the APOC library add a ton of power.

ArangoDB: a solid web UI with query editor, collection management, and graph visualization. Less “batteries‑included” for graph analysis than Neo4j’s ecosystem, but it’s more than usable and integrates nicely with Foxx microservices.

Both have language drivers for Node, Python, Java, etc. If visual, one‑click exploration matters to you, Neo4j’s UX is slightly friendlier out of the box.

Reality check: Postgres + recursive CTEs or AGE extension first

Before jumping to a graph DB, ask whether Postgres can do the job. For many hierarchical or shallow graph problems, recursive CTEs are fine and keep operational complexity low.

Example (Postgres recursive CTE):

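WITH RECURSIVE path AS (
  -- anchor row: the start node
  SELECT id, name, ARRAY[id] AS chain
  FROM nodes
  WHERE id = 1
  UNION ALL
  -- recursive step: follow one more edge from each row found so far
  SELECT e.to_id, n.name, p.chain || e.to_id
  FROM edges e
  JOIN nodes n ON n.id = e.to_id
  JOIN path p ON e.from_id = p.id
  WHERE NOT e.to_id = ANY(p.chain)  -- cycle guard: skip nodes already on this path
)
SELECT * FROM path;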

If you’re already Postgres-heavy, try a prototype there. If queries become awkward or slow as depth/branching grows, then consider a graph DB. Also look at the AGE extension for Postgres: it adds graph semantics and Cypher-like queries inside Postgres if you want a hybrid route.
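If the CTE’s evaluation order is hard to picture, this JavaScript sketch walks through the same steps in memory. The edge rows are hypothetical (the 3 to 1 edge is there to form a cycle) and `reachableChains` is an invented helper; it imitates the query’s semantics, not how Postgres executes it.

```javascript
// Hypothetical rows standing in for the `edges` table; 3 -> 1 closes a cycle.
const edges = [
  { from_id: 1, to_id: 2 },
  { from_id: 2, to_id: 3 },
  { from_id: 3, to_id: 1 },
  { from_id: 3, to_id: 4 },
];

// Mirror the recursive CTE: start from one anchor row, then repeatedly join
// the previous round's rows to `edges`, skipping targets already in the chain.
function reachableChains(startId) {
  let frontier = [{ id: startId, chain: [startId] }];
  const rows = [...frontier];
  while (frontier.length > 0) {
    const next = [];
    for (const row of frontier) {
      for (const e of edges) {
        if (e.from_id !== row.id) continue;        // JOIN ... ON e.from_id = p.id
        if (row.chain.includes(e.to_id)) continue; // WHERE NOT e.to_id = ANY(chain)
        next.push({ id: e.to_id, chain: [...row.chain, e.to_id] });
      }
    }
    rows.push(...next);
    frontier = next; // the loop ends when a round adds no new rows
  }
  return rows;
}

console.log(reachableChains(1).map((r) => r.chain.join(' -> ')));
```

Starting from node 1, the chains grow one hop per round until the edge back to 1 is rejected by the guard and the walk ends at node 4. Without that guard, both this loop and the SQL version would keep cycling forever.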

Decision matrix (use cases per DB)

Use Neo4j when:
You need a dedicated, polished graph engine and the rich ecosystem (APOC, built-in graph algorithms).
You prefer Cypher’s expressive pattern matching.
Your workload is deep pointer-chasing on a single node or you have Enterprise budget for clustering.

Use ArangoDB when:
You need documents + graphs in one place and want to avoid cross-store duplication.
You want free clustering for a production-ish homelab without enterprise spend.
You’re comfortable with AQL and a slightly broader mental model.

Use Postgres (CTEs/AGE) when:
Your graph needs are modest and you prefer operational simplicity.
You want to keep everything in a single mature RDBMS with straightforward backups and tooling.

SumGuy-voice conclusion (winner per use case)

Neo4j is the precision forklift: purpose‑built, tidy, and it’ll handle delicate graph hauling like a pro, but the heavy‑duty clamps (clustering, commercial tools) cost money. ArangoDB is the Swiss Army hexcrystal: multi‑model, flexible, and lets you run a real cluster without needing a corporate purchase order. It’s slightly messier, but it’s free and powerful.

So: if you want best‑in‑class graph ergonomics and you’re building a graph‑first product, Neo4j is the friendly winner. If you want a single datastore that covers documents, KV and graph, or you want to scale cheaply in a homelab, ArangoDB is probably the more practical pick.

Your 2 AM self will appreciate picking the right tool: use a graph DB when the problem is connectedness, not because the marketing team liked the logo. And if you need a last‑minute escape hatch, try Postgres + CTEs or AGE before committing to a new database.

Happy scheming; don’t hire a forklift to move a couch unless you like explaining yourself to the neighbors.
