In this article I dig into the question of whether you should use a traditional RDBMS (Postgres, MySQL etc.) or a graph database (Neo4J, AWS Neptune, OrientDB etc.) as you embark on your next data management and knowledge representation project, especially in this era of Knowledge Graphs and LLMs.

Knowing the strengths and weaknesses of each will help you pick the right tool for the job.

Image from: https://www.biconnector.com/blog/knowledge-graph-vs-relational-database-differences/

Relational database tables

Relational databases represent knowledge through structured tables with well-defined relationships, following a strict schema where data is normalized according to formal rules. This approach, grounded in set theory and first-order logic, excels at representing structured information where relationships between entities are fixed and well-understood. The relational model enforces data integrity through constraints, foreign keys, and ACID properties, making it highly reliable for transactional systems. However, it becomes unwieldy when dealing with highly interconnected data or variable-length relationship chains, as these require multiple joins that can impact both query complexity and performance. The model’s strength lies in its mathematical foundations and ability to efficiently handle complex queries involving aggregations and structured data analysis, but its rigid schema requirements and difficulty in representing hierarchical or network-like relationships can make it less suitable for domains where relationship patterns are as important as the data itself.

  1. Structured relationships through foreign keys and joins
  2. Strong schema enforcement and data consistency
  3. Excellent for aggregations, grouping, and statistical analysis
  4. Best for data with regular, predictable structure
  5. Complex relationship traversals require multiple joins, which can become expensive

Relational databases excel at:

  • Aggregating large amounts of structured data
  • Complex numerical computations
  • Enforcing data integrity constraints
  • Ad-hoc queries on well-defined schemas
  • ACID transactions

“Just because relational database management systems contain the word relational does not mean that they are good at handling relationships. The word relational in RDBMS stems from relational algebra and not from relationship” https://sonra.io/benefits-graph-databases-data-warehousing/

Property Graphs

Property graphs represent knowledge as a network of nodes (entities) and edges (relationships), where both can carry arbitrary key-value properties, making them naturally suited for domains with complex, interconnected data structures. This model, rooted in graph theory and discrete mathematics, excels at representing and traversing relationships of varying lengths and patterns without the need for expensive join operations. Unlike the rigid schema of relational databases, property graphs offer schema flexibility and intuitive modeling of real-world relationships, making them particularly effective for scenarios where relationship patterns, network effects, and transitive connections are central to the domain knowledge. While they may sacrifice some of the mathematical rigor and consistency guarantees of relational databases, they gain significant advantages in traversing complex relationship networks, finding patterns, and handling data where the connections between entities are as important as the entities themselves, though this flexibility can come at the cost of maintaining data consistency and performing bulk operations.

  1. Flexible schema where nodes and edges can have arbitrary properties
  2. Natural representation of many-to-many relationships
  3. Efficient for traversing connected data
  4. Strong for pattern matching across relationships
  5. Can represent hierarchical and network structures more intuitively

Let’s look at the types of queries where graphs particularly excel:

Path Finding and Distance

  • Finding shortest paths between entities
  • Detecting cycles in relationships
  • Computing degrees of separation
  • Example: “Find all people connected to John through at most 3 friendship relationships”
  • In SQL this would require multiple self-joins with uncertain depth

Pattern Matching

  • Finding subgraphs matching specific patterns
  • Detecting common relationship structures
  • Example: “Find all cases where A influences B who influences C who influences A back”
  • SQL would need complex recursive CTEs or multiple joins

Recommendation Engines

  • Finding similar nodes based on relationship patterns
  • Identifying clusters of related entities
  • Example: “Find products frequently bought together by customers who bought X”
  • Possible but more complex in SQL due to multiple relationship hops

Hierarchical Data

  • Traversing tree structures of unknown depth
  • Finding common ancestors
  • Example: “Find all departments under a given manager, at any level”
  • Requires recursive CTEs in SQL which can be less performant

Theoretical Foundations

For an academic perspective:

  1. Relational model has stronger theoretical foundations
  2. Graph model has more expressive power for certain queries
  3. Graph database theory is rapidly evolving
  4. Hybrid approaches are gaining theoretical attention

Performance

Performance characteristics differ significantly between SQL and graph databases, and the relative performance depends heavily on several factors:

Query Patterns:

  • SQL databases excel at large-scale aggregations, joins on indexed columns, and batch processing
  • Graph databases excel at traversing relationships, especially when the traversal depth is variable or unknown
  • The performance gap becomes more pronounced as the number of joins/hops increases

Published benchmarks suggest:

  • For 2-hop queries: SQL and graph performance might be comparable
  • For 3+ hop queries: Graph databases can be 10-100x faster
  • For variable depth traversals: Graph databases can be 100-1000x faster

SQL Databases:

  • Better for bulk loading and updates
  • Better for full table scans and aggregations
  • Index maintenance becomes crucial
  • Join performance degrades with scale
  • Memory requirements grow with join complexity

Graph Databases:

  • Better for traversing connected data
  • Index-free adjacency makes relationship traversal fast
  • Memory requirements grow with relationship density
  • Loading/update performance can be slower
  • Some graph databases keep relationships in memory

Summary

My general advice is that your default option should be RDBMS, due to its price/performance, support for transactions and scale. You move away from RDBMS if:

  1. Your schema is very dynamic, or likely to evolve in unpredictable ways and you don’t want/need a centralised schema management team
  2. The relationships between entities is at least as important as the entities themselves
  3. Your domain model is naturally a graph and your subject matter experts think in terms of graphs
  4. Your queries are frequently about the relationships between entities, particularly if the relationships could become arbitrarily complex (loops, chains, trees, taxonomies, friend-of-a-friend, clusters…)

If your data is large (billions of rows/nodes), in both cases you should think about data partitioning and scale-out across many servers.

In some cases a hybrid architecture may be warranted: using RDBMS for online transaction processing and a graph database for offline graph analytics.