Graph Database

A graph database stores the knowledge graph constructed earlier. Unlike a vector database, it is optimized for representing and querying relationships. Once a knowledge graph has been built, the graph database is what lets the agent (or orchestrator) retrieve information through structured queries or graph traversals, complementing the vector DB where that helps.

Key aspects of a Graph Database are:

  • Nodes and Relationships Storage: The graph database maintains all nodes (entities) and edges (relations) together with their properties. For example, in Memgraph (a popular graph DB), a node has labels (types) and properties (attributes), and a relationship has a type and can also carry properties. This directly mirrors the knowledge graph we built. The graph is queried with a graph query language (Cypher for Memgraph, SPARQL for RDF stores, Gremlin for property graphs, etc.).

  • Querying by Relationship: A graph database shines when you ask questions like “find me all X that are related to Y through a path of type Z”. For instance: “Find all customers who bought ProductA and are in industry Finance”. In graph terms, that is a two-hop path, Customer -[PLACED]-> Order -[CONTAINS]-> Product (ProductA), combined with a one-hop pattern, Customer -[WORKS_IN]-> Industry (Finance). A graph database can execute that query precisely and return an exact set of nodes. This kind of query is hard or imprecise with vectors alone, because it calls for a deterministic, exact match rather than a similarity ranking.

  • Reasoning and Constraints: Graphs allow for some reasoning. For example, if the graph encodes hierarchies (“ProductA” is a subclass of “Electronics”), a query for “Electronics” can retrieve ProductA via traversal. A graph query can also impose structural constraints, returning only the connections that satisfy a given pattern. This exact matching and logical constraint satisfaction is something vector search isn’t built for. For such queries, an orchestrator might therefore translate the user’s natural language into a graph query.

  • Performance Considerations: Graph databases can lag behind pure vector search when a query must crawl thousands of edges or perform complex joins, especially in massive, poorly indexed graphs. A broad “find the shortest multi-hop path” request on a large social-network graph, for instance, can devolve into an expensive, exhaustive breadth-first search. Traditional, disk-backed engines (like Neo4j’s Java stack) feel this pain most acutely. That caveat, however, isn’t universal. Graph databases engineered for in-memory, real-time workloads, where the entire topology sits in RAM and traversal algorithms are tuned for cache locality, excel at exactly these breadth-first and multi-hop lookups (Memgraph is one example). So keep traversal performance in mind when choosing a graph database.

  • Combination with Embeddings: Modern graph databases like Memgraph and some others have started to integrate vector search too, storing embeddings as properties on nodes. That lets them run a hybrid search: first a similarity search finds a starting set of candidate nodes, then a graph traversal refines the result and pulls in the details. Alternatively, they can store text in the graph, use vector search to find a node by its description, and then use the graph around that node to fetch related information. This combination is essentially graph-enhanced RAG, or GraphRAG, and it lets you leverage both semantic and relational information.
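
To make the relationship and hierarchy queries from the list above concrete, here is a minimal sketch in plain Python rather than in a real graph database. The node names (alice, order1, and so on), the SUBCLASS_OF relationship type, and the helper functions are assumptions made for illustration. The Cypher string is likewise only a sketch of how the same question might be phrased against an assumed schema with a `name` property; it is not executed here.

```python
# Toy property graph as (source, relationship_type, target) triples.
# All node names and relationship types are illustrative assumptions.
EDGES = [
    ("alice", "PLACED", "order1"),
    ("bob", "PLACED", "order2"),
    ("carol", "PLACED", "order3"),
    ("order1", "CONTAINS", "ProductA"),
    ("order2", "CONTAINS", "ProductB"),
    ("order3", "CONTAINS", "ProductA"),
    ("alice", "WORKS_IN", "Finance"),
    ("bob", "WORKS_IN", "Finance"),
    ("carol", "WORKS_IN", "Retail"),
    ("ProductA", "SUBCLASS_OF", "Electronics"),
    ("ProductB", "SUBCLASS_OF", "Furniture"),
]
CUSTOMERS = {"alice", "bob", "carol"}

# Adjacency indexes keyed by (node, relationship_type).
OUT, IN = {}, {}
for src, rel, dst in EDGES:
    OUT.setdefault((src, rel), set()).add(dst)
    IN.setdefault((dst, rel), set()).add(src)

# How the customer question might look in Cypher (assumed schema, not run here).
CYPHER_SKETCH = """
MATCH (c:Customer)-[:PLACED]->(:Order)-[:CONTAINS]->(:Product {name: 'ProductA'}),
      (c)-[:WORKS_IN]->(:Industry {name: 'Finance'})
RETURN DISTINCT c.name
"""


def customers_who_bought(product, industry):
    """Customers with a PLACED -> CONTAINS path to `product` and a WORKS_IN edge to `industry`."""
    matches = set()
    for customer in CUSTOMERS:
        if industry not in OUT.get((customer, "WORKS_IN"), set()):
            continue  # fails the one-hop industry constraint
        for order in OUT.get((customer, "PLACED"), set()):
            if product in OUT.get((order, "CONTAINS"), set()):
                matches.add(customer)
    return sorted(matches)


def members_of(category):
    """Every node that reaches `category` through one or more SUBCLASS_OF edges."""
    found, frontier = set(), [category]
    while frontier:
        node = frontier.pop()
        for child in IN.get((node, "SUBCLASS_OF"), set()):
            if child not in found:
                found.add(child)
                frontier.append(child)
    return sorted(found)


print(customers_who_bought("ProductA", "Finance"))  # ['alice']
print(members_of("Electronics"))                    # ['ProductA']
```

Both functions return exact sets: bob is excluded because his order contains a different product, and carol because she works in a different industry. No similarity threshold is involved at any point.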

In summary, the graph database is the structured memory of the Stack. It stores known facts in a way that the agent can retrieve with logical precision and relationship awareness. It complements the vector database, which acts as a fuzzy memory. By using both, the system can answer well-defined factual queries and open-ended ones more effectively. Whether to include a graph database comes down to the nature of the queries you expect.
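
As a rough illustration of using both kinds of memory together, the sketch below first ranks nodes by embedding similarity and then expands the best matches through the graph. It is a toy, in-memory stand-in and not any database's actual API: the three-dimensional embeddings, the neighbour lists, and the function names (`vector_search`, `expand`, `hybrid_search`) are all hypothetical.

```python
import math

# Hypothetical toy data: description embeddings for a few nodes,
# plus an adjacency list describing the graph around them.
EMBEDDINGS = {
    "ProductA": [0.9, 0.1, 0.0],
    "ProductB": [0.1, 0.9, 0.0],
    "Finance": [0.0, 0.2, 0.9],
}
NEIGHBOURS = {
    "ProductA": ["order1", "Electronics"],
    "ProductB": ["order2"],
    "Finance": ["alice", "bob"],
    "order1": ["alice"],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query, k):
    """Step 1: the k nodes whose embeddings are most similar to the query."""
    ranked = sorted(EMBEDDINGS, key=lambda n: cosine(query, EMBEDDINGS[n]), reverse=True)
    return ranked[:k]


def expand(seeds, hops):
    """Step 2: breadth-first expansion from the seeds, up to `hops` edges away.

    Returns a mapping of node -> distance from the nearest seed.
    """
    dist = {seed: 0 for seed in seeds}
    frontier = list(seeds)
    for depth in range(1, hops + 1):
        next_frontier = []
        for node in frontier:
            for neighbour in NEIGHBOURS.get(node, []):
                if neighbour not in dist:
                    dist[neighbour] = depth
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return dist


def hybrid_search(query, k=1, hops=2):
    """Similarity search for entry points, then graph traversal for context."""
    return expand(vector_search(query, k), hops)


print(hybrid_search([1.0, 0.0, 0.0]))
# {'ProductA': 0, 'order1': 1, 'Electronics': 1, 'alice': 2}
```

The fuzzy step only chooses where to enter the graph; everything returned after that comes from explicit edges, which is what makes the retrieved context traceable.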