In this article I dig into the question of whether you should use a traditional RDBMS (Postgres, MySQL etc.) or a graph database (Neo4J, AWS Neptune, OrientDB etc.) as you embark on your next data management and knowledge representation project, especially in this era of Knowledge Graphs and LLMs.

Knowing the strengths and weaknesses of each will help you pick the right tool for the job.

Image from: https://www.biconnector.com/blog/knowledge-graph-vs-relational-database-differences/

Relational database tables

Relational databases represent knowledge through structured tables with well-defined relationships, following a strict schema where data is normalized according to formal rules. This approach, grounded in set theory and first-order logic, excels at representing structured information where relationships between entities are fixed and well-understood. The relational model enforces data integrity through constraints, foreign keys, and ACID properties, making it highly reliable for transactional systems. However, it becomes unwieldy when dealing with highly interconnected data or variable-length relationship chains, as these require multiple joins that can impact both query complexity and performance. The model’s strength lies in its mathematical foundations and ability to efficiently handle complex queries involving aggregations and structured data analysis, but its rigid schema requirements and difficulty in representing hierarchical or network-like relationships can make it less suitable for domains where relationship patterns are as important as the data itself.

Structured relationships through foreign keys and joins
Strong schema enforcement and data consistency
Excellent for aggregations, grouping, and statistical analysis
Best for data with regular, predictable structure
Complex relationship traversals require multiple joins, which can become expensive

Relational databases excel at:

Aggregating large amounts of structured data
Complex numerical computations
Enforcing data integrity constraints
Ad-hoc queries on well-defined schemas
ACID transactions

“Just because relational database management systems contain the word relational does not mean that they are good at handling relationships. The word relational in RDBMS stems from relational algebra and not from relationship” https://sonra.io/benefits-graph-databases-data-warehousing/

Property Graphs

Property graphs represent knowledge as a network of nodes (entities) and edges (relationships), where both can carry arbitrary key-value properties, making them naturally suited for domains with complex, interconnected data structures. This model, rooted in graph theory and discrete mathematics, excels at representing and traversing relationships of varying lengths and patterns without the need for expensive join operations. Unlike the rigid schema of relational databases, property graphs offer schema flexibility and intuitive modeling of real-world relationships, making them particularly effective for scenarios where relationship patterns, network effects, and transitive connections are central to the domain knowledge. While they may sacrifice some of the mathematical rigor and consistency guarantees of relational databases, they gain significant advantages in traversing complex relationship networks, finding patterns, and handling data where the connections between entities are as important as the entities themselves, though this flexibility can come at the cost of maintaining data consistency and performing bulk operations.

Flexible schema where nodes and edges can have arbitrary properties
Natural representation of many-to-many relationships
Efficient for traversing connected data
Strong for pattern matching across relationships
Can represent hierarchical and network structures more intuitively

Let’s look at the types of queries where graphs particularly excel:

Path Finding and Distance

Finding shortest paths between entities
Detecting cycles in relationships
Computing degrees of separation
Example: “Find all people connected to John through at most 3 friendship relationships”
In SQL this would require multiple self-joins with uncertain depth

Pattern Matching

Finding subgraphs matching specific patterns
Detecting common relationship structures
Example: “Find all cases where A influences B who influences C who influences A back”
SQL would need complex recursive CTEs or multiple joins

Recommendation Engines

Finding similar nodes based on relationship patterns
Identifying clusters of related entities
Example: “Find products frequently bought together by customers who bought X”
Possible but more complex in SQL due to multiple relationship hops

Hierarchical Data

Traversing tree structures of unknown depth
Finding common ancestors
Example: “Find all departments under a given manager, at any level”
Requires recursive CTEs in SQL which can be less performant

Theoretical Foundations

For an academic perspective:

Relational model has stronger theoretical foundations
Graph model has more expressive power for certain queries
Graph database theory is rapidly evolving
Hybrid approaches are gaining theoretical attention

Performance

Performance characteristics differ significantly between SQL and graph databases, and the relative performance depends heavily on several factors:

Query Patterns:

SQL databases excel at large-scale aggregations, joins on indexed columns, and batch processing
Graph databases excel at traversing relationships, especially when the traversal depth is variable or unknown
The performance gap becomes more pronounced as the number of joins/hops increases

Published benchmarks suggest:

For 2-hop queries: SQL and graph performance might be comparable
For 3+ hop queries: Graph databases can be 10-100x faster
For variable depth traversals: Graph databases can be 100-1000x faster

SQL Databases:

Better for bulk loading and updates
Better for full table scans and aggregations
Index maintenance becomes crucial
Join performance degrades with scale
Memory requirements grow with join complexity

Graph Databases:

Better for traversing connected data
Index-free adjacency makes relationship traversal fast
Memory requirements grow with relationship density
Loading/update performance can be slower
Some graph databases keep relationships in memory

Summary

My general advice is that your default option should be RDBMS, due to its price/performance, support for transactions and scale. You move away from RDBMS if:

Your schema is very dynamic, or likely to evolve in unpredictable ways and you don’t want/need a centralised schema management team
The relationships between entities is at least as important as the entities themselves
Your domain model is naturally a graph and your subject matter experts think in terms of graphs
Your queries are frequently about the relationships between entities, particularly if the relationships could become arbitrarily complex (loops, chains, trees, taxonomies, friend-of-a-friend, clusters…)

If your data is large (billions of rows/nodes), in both cases you should think about data partitioning and scale-out across many servers.

In some cases a hybrid architecture may be warranted: using RDBMS for online transaction processing and a graph database for offline graph analytics.

Innovation That Matters

Technology trends for the world of tomorrow

Knowledge Representation: Graphs or Tables?

Relational database tables

Property Graphs

Path Finding and Distance

Pattern Matching

Recommendation Engines

Hierarchical Data

Theoretical Foundations

Performance

Summary

danielselman

Leave a comment Cancel reply

Search

Innovation That Matters

Technology trends for the world of tomorrow

Knowledge Representation: Graphs or Tables?

Relational database tables

Property Graphs

Path Finding and Distance

Pattern Matching

Recommendation Engines

Hierarchical Data

Theoretical Foundations

Performance

Summary

Share this:

danielselman

Leave a comment Cancel reply

Search