What a Knowledge Graph Actually Is (and Is Not): From Tables to Triples to Meaning
Knowledge graphs sit at a junction of three different conversations: graph data models, ontologies, and identity. This article gives the precise definition, the two dominant paradigms (RDF and property graphs) compared honestly, and the four things a knowledge graph is not (graph database, knowledge base, semantic layer, vector store with relationships). Part 3 of the Knowledge Graph Practitioner's Guide.
The Definition Most Senior Engineers Cannot Quite Give
I have lost count of the architecture reviews where someone says “knowledge graph” with total confidence and then cannot define it when a VP asks. The pattern repeats: the lead architect starts with “It is a graph database, but with…” and trails off; the data scientist offers “It is like a semantic layer, but…” and stops; the platform engineer says “It is RDF and ontologies and…” and looks uncertain. The room discovers it has been agreeing on a word, not a thing. I treat that moment as the real starting point of any knowledge graph conversation.
Each was partly right. None had the full definition. None could defend the answer against the next question.
This article is the conceptual core of the series. By the end you will have a working definition you can defend, the two dominant paradigms (RDF and property graphs) compared honestly, and a clear picture of the four things that get called knowledge graphs but are not. The vocabulary you locked in Part 1 (entities, typed relationships, identity, inference) gets formalized here.
We are tool-agnostic in this article and throughout Parts 4 through 8. Specific vendors live in Appendix A. SPARQL and Cypher appear below as illustrative syntax for each paradigm, not as product pitches; the comparison section makes that distinction explicit.
Working Definition
A knowledge graph is a graph-structured representation of entities, relationships, and properties, governed by a vocabulary or ontology that gives the graph computable meaning, where every entity has a stable global identity that outlives any one source schema.
That sentence has four load-bearing pieces:
- Graph-structured: nodes and edges, not tables and joins. Graph operations like traversal, path-finding, and neighborhood expansion are first-class.
- Entities, relationships, properties: the three primitives. An entity is a node; a relationship is a directed labeled edge; a property is a key-value attribute on a node or an edge.
- Vocabulary or ontology: a layer above the data that defines what entity types exist, what relationships are allowed, and what constraints apply. Without this, you have a graph; with it, you have knowledge.
- Stable global identity: every entity has an identifier that uniquely refers to one real-world thing, consistently, across sources, schemas, and time.
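To make the four pieces concrete, here is a minimal in-memory sketch in Python (3.9+). The class names, the example.org base namespace, and the domain/range check are illustrative assumptions, not any product's API. Entities are keyed by a global IRI rather than a table-local key, every entity and relationship type must be declared in the ontology, and an edge is rejected if it connects the wrong types.

```python
from dataclasses import dataclass, field

# Hypothetical namespace; any stable, organization-controlled IRI base would do.
BASE = "https://example.org/kg/"


@dataclass
class Ontology:
    classes: set[str]                        # entity types that may exist
    relations: dict[str, tuple[str, str]]    # relationship -> (subject type, object type)


@dataclass
class KnowledgeGraph:
    ontology: Ontology
    types: dict[str, str] = field(default_factory=dict)                  # IRI -> entity type
    properties: dict[str, dict[str, object]] = field(default_factory=dict)
    edges: set[tuple[str, str, str]] = field(default_factory=set)        # (subject, relation, object)

    def add_entity(self, iri: str, etype: str, **props: object) -> None:
        # Vocabulary: only declared entity types are allowed.
        if etype not in self.ontology.classes:
            raise ValueError(f"unknown entity type: {etype}")
        self.types[iri] = etype
        self.properties.setdefault(iri, {}).update(props)

    def relate(self, subj: str, rel: str, obj: str) -> None:
        # Vocabulary: only declared relationships, and only between the declared types.
        if rel not in self.ontology.relations:
            raise ValueError(f"unknown relationship type: {rel}")
        domain, range_ = self.ontology.relations[rel]
        if self.types.get(subj) != domain or self.types.get(obj) != range_:
            raise ValueError(f"{rel} must connect a {domain} to a {range_}")
        self.edges.add((subj, rel, obj))


onto = Ontology(classes={"Customer", "Industry"},
                relations={"inIndustry": ("Customer", "Industry")})
kg = KnowledgeGraph(onto)
acme = BASE + "customer/12345"          # global identity, not a table-local key
mfg = BASE + "industry/manufacturing"
kg.add_entity(acme, "Customer", name="Acme Corp")
kg.add_entity(mfg, "Industry", name="Manufacturing")
kg.relate(acme, "inIndustry", mfg)
```

Calling kg.relate(mfg, "inIndustry", acme) raises ValueError: the ontology, not the data, decides which shapes are meaningful.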
This is not a definition I made up. It is consistent with the working definition in the Hogan et al. survey of knowledge graphs (ACM Computing Surveys, 2021), which writes: “A knowledge graph’s data (also known as a data graph) conforms to a graph-based data model, which may be a directed edge-labelled graph, a heterogeneous graph, a property graph, and so on… [where] ontologies and rules can be used to define and reason about the semantics of the terms used in the graph.”
The same survey notes a property that turns out to matter for failure mode 1 from Part 2: “Graphs allow maintainers to postpone the definition of a schema, allowing the data to evolve in a more flexible manner.” This is a feature when applied with discipline. It is also the rope projects hang themselves with.
From Rows to Triples: The Structural Shift
The simplest way to understand a knowledge graph is to see how it represents the same fact a relational database holds.
A row in a customer table:
| customer_id | name | industry | country | acquisition_date |
|---|---|---|---|---|
| 12345 | Acme Corp | Manufacturing | US | 2023-04-15 |
The same information as a knowledge graph (RDF flavor, in Turtle syntax):
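# The base namespace below is assumed for illustration.
@prefix : <https://lakeside.com/kg/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
:Acme a :Customer ;
    :name "Acme Corp" ;
    :inIndustry :Manufacturing ;
    :locatedIn :UnitedStates ;
    :acquiredOn "2023-04-15"^^xsd:date .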
The same information as a property graph (in Cypher pattern syntax):
(acme:Customer {id: 12345, name: "Acme Corp", acquisitionDate: "2023-04-15"})
    -[:IN_INDUSTRY]->(:Industry {name: "Manufacturing"}),
(acme)-[:LOCATED_IN]->(:Country {name: "United States"})
Three things change in the move from row to graph.
First, values become entities. “Manufacturing” was a string in a column. In the graph, it is its own node. Now you can attach properties to Manufacturing (a description, an industry hierarchy, a regulatory profile) and ask questions across customers (“how many customers do we have in Manufacturing in the US in 2023?”) that were always possible with SQL joins but are more naturally expressed as graph traversals.
Second, relationships become first-class. The fact that Acme is located in the United States is no longer an opaque string in a country column. It is a typed edge. Other nodes can have the same locatedIn relationship to United States. United States can be queried as a node with all its incoming edges (every customer, employee, supplier, regulation that “lives” in it).
Third, identity becomes global. Acme’s customer_id 12345 is local to the customer table. In the knowledge graph, Acme is identified by an IRI, written :Acme above for readability; a production IRI would more likely follow a pattern such as https://lakeside.com/kg/customer/12345. That IRI is designed to be unique across all systems, all schemas, and all time. Two systems referring to “Acme” can verify they mean the same entity by comparing IRIs.
We will return to identity in Part 5. For now, notice that the three changes (values become entities, relationships become first-class, identity becomes global) are what give a graph more expressive power than a table for connected data. None of them require any specific technology.
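The three changes can be sketched as a small Python function that turns the customer row into N-Triples-style statements. The lakeside.com namespace and the customer/, industry/, and country/ path patterns are assumptions for illustration; a real pipeline would also escape literals and resolve identities (Part 5) rather than minting IRIs straight from raw column values.

```python
BASE = "https://lakeside.com/kg/"   # assumed namespace, for illustration only
XSD_DATE = "<http://www.w3.org/2001/XMLSchema#date>"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def iri(local: str) -> str:
    """Wrap a local path in the (assumed) global namespace as an N-Triples IRI."""
    return f"<{BASE}{local}>"


def row_to_triples(row: dict) -> list[tuple[str, str, str]]:
    # Change 3: the table-local key becomes part of a global IRI.
    acme = iri(f"customer/{row['customer_id']}")
    # Change 1: column values become entities with IRIs of their own.
    industry = iri(f"industry/{row['industry'].lower()}")
    country = iri(f"country/{row['country']}")
    # Literals are not escaped here; values containing quotes would break the output.
    return [
        (acme, RDF_TYPE, iri("Customer")),
        (acme, iri("name"), f'"{row["name"]}"'),
        # Change 2: relationships become typed edges that point at entities.
        (acme, iri("inIndustry"), industry),
        (acme, iri("locatedIn"), country),
        (acme, iri("acquiredOn"), f'"{row["acquisition_date"]}"^^{XSD_DATE}'),
    ]


row = {"customer_id": 12345, "name": "Acme Corp", "industry": "Manufacturing",
       "country": "US", "acquisition_date": "2023-04-15"}

for s, p, o in row_to_triples(row):
    print(s, p, o, ".")
```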
Paradigm One: RDF Triples
The RDF (Resource Description Framework) paradigm is the older of the two and is the W3C standard for knowledge representation on the web.
The atomic unit of RDF is the triple: a three-part statement of the form Subject-Predicate-Object. Per the W3C RDF 1.1 specification, an RDF triple “denotes a proposition: a simple logical expression, describing a relationship between two entities.” The Subject is an IRI (a globally unique identifier) or a blank node (a local, unnamed placeholder); the Predicate is always an IRI naming a relationship type; the Object is an IRI, a blank node, or a literal value.
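Those positional rules are small enough to encode directly. This is a minimal Python sketch of the RDF 1.1 term types; it does not check IRI syntax and omits language-tagged literals, and the class names are illustrative rather than any RDF library's API.

```python
from dataclasses import dataclass

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class IRI:
    value: str          # a globally unique identifier


@dataclass(frozen=True)
class BlankNode:
    label: str          # a local placeholder, meaningful only inside one graph


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = XSD_STRING


def is_valid_triple(s: object, p: object, o: object) -> bool:
    """Subject: IRI or blank node. Predicate: IRI only.
    Object: IRI, blank node, or literal."""
    return (isinstance(s, (IRI, BlankNode))
            and isinstance(p, IRI)
            and isinstance(o, (IRI, BlankNode, Literal)))


acme = IRI("https://example.org/kg/Acme")
name = IRI("https://example.org/kg/name")
print(is_valid_triple(acme, name, Literal("Acme Corp")))  # True
print(is_valid_triple(Literal("Acme Corp"), name, acme))  # False: literals cannot be subjects
print(is_valid_triple(acme, BlankNode("b0"), acme))       # False: predicates must be IRIs
```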
A small set of triples about Acme:
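@prefix : <https://lakeside.com/kg/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:Acme rdf:type :Customer .
:Acme :name "Acme Corp" .
:Acme :inIndustry :Manufacturing .
:Acme :locatedIn :UnitedStates .
:Manufacturing rdf:type :Industry .
:Manufacturing rdfs:subClassOf :IndustrySector .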
Several things to notice. Every entity (:Acme, :Manufacturing, :UnitedStates) is identified by an IRI. Every relationship (rdf:type, :inIndustry, :locatedIn, rdfs:subClassOf) is also an IRI. The rdf:type predicate is what makes a Customer or an Industry a typed thing. The rdfs:subClassOf predicate is what makes Manufacturing a kind of IndustrySector.
The paradigm is graph-flat: the entire graph is a set of triples. There are no rows, no tables, no nesting. Every fact is a triple. This uniformity makes RDF strong on standardization, federation across data sources, and integration with vocabularies and ontologies (RDFS, OWL, and SHACL all sit on top of RDF). It underpins the Semantic Web and the Linked Data principles that Tim Berners-Lee set out in his 2006 Linked Data design note.
The query language for RDF is SPARQL, which expresses queries as triple patterns:
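PREFIX : <https://lakeside.com/kg/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?customer ?industry WHERE {
  ?customer rdf:type :Customer .
  ?customer :inIndustry ?industry .
  ?industry rdfs:subClassOf :IndustrySector .
}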
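This returns every customer paired with its industry, keeping only industries declared as a subclass of IndustrySector (there is no grouping; each result row is one customer-industry pair). The triple patterns match any subgraph that satisfies all of them at once. Without RDFS entailment or a property path such as rdfs:subClassOf+, only direct subClassOf links match.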
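The matching idea behind that query can be sketched as a naive nested-loop evaluator for basic graph patterns in Python. Terms are kept as plain prefixed-name strings and variables start with "?"; this is an illustration of pattern matching only, not how production SPARQL engines work (they use indexes and join ordering), and it ignores FILTER, OPTIONAL, and property paths.

```python
Triple = tuple[str, str, str]

GRAPH: set[Triple] = {
    (":Acme", "rdf:type", ":Customer"),
    (":Acme", ":name", '"Acme Corp"'),
    (":Acme", ":inIndustry", ":Manufacturing"),
    (":Acme", ":locatedIn", ":UnitedStates"),
    (":Manufacturing", "rdf:type", ":Industry"),
    (":Manufacturing", "rdfs:subClassOf", ":IndustrySector"),
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: dict) -> dict | None if False else object:
    """Unify one triple pattern with one triple, extending an existing binding.
    Returns the extended binding, or None if they do not match."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind an unbound variable; reject if it is already bound to something else.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def evaluate(patterns: list[Triple], graph: set[Triple]) -> list[dict]:
    """Every pattern must match, with shared variables bound consistently."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in graph:
                result = match_pattern(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        solutions = extended
    return solutions


query = [("?customer", "rdf:type", ":Customer"),
         ("?customer", ":inIndustry", "?industry"),
         ("?industry", "rdfs:subClassOf", ":IndustrySector")]

for row in evaluate(query, GRAPH):
    print(row["?customer"], row["?industry"])  # :Acme :Manufacturing
```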
What this looks like in practice: an RDF node has no record of its own. A Customer entity is simply the set of triples that have that customer’s IRI in the subject position. There is no “row.” This is liberating for federation (you can pull triples from many sources and merge them) and demanding for ergonomics (representing edge-level metadata requires reification or RDF-star).
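Here is a minimal Python sketch of both halves of that claim, using two hypothetical sources (the :riskRating predicate is invented for the example). It assumes both sources already use the same IRIs, which is exactly the identity work Part 5 covers; blank nodes would also need renaming apart before a union, since their labels are local to each source.

```python
# Two hypothetical sources that already agree on global IRIs.
crm_triples = {
    (":Acme", "rdf:type", ":Customer"),
    (":Acme", ":name", '"Acme Corp"'),
}
risk_triples = {
    (":Acme", ":locatedIn", ":UnitedStates"),
    (":Acme", ":riskRating", '"low"'),       # hypothetical predicate
    (":Acme", ":name", '"Acme Corp"'),       # same fact in both sources: union keeps one copy
}

# Federation at its simplest: a graph is a set of triples, so merging is set union.
merged = crm_triples | risk_triples


def describe(node: str, graph: set) -> set:
    """An RDF 'record' is just the triples that have this node as subject."""
    return {(p, o) for (s, p, o) in graph if s == node}


print(sorted(describe(":Acme", merged)))
```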
The trade-off RDF makes is to put expressive power into the schema layer (RDFS, OWL, SHACL) and to keep the data layer uniform. This is great when you want to share definitions across organizations, federate across sources, and reason rigorously. It is less ergonomic when you want to attach four properties to a single edge (when did Alice start working at Acme, in what title, with what contract type, and with what manager).
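The Alice example shows the ergonomic cost. This Python sketch represents it with standard RDF reification (the rdf:Statement, rdf:subject, rdf:predicate, and rdf:object vocabulary) next to a property-graph edge; the dates, title, contract type, and manager values are made up for illustration. RDF-star's quoted triples were designed to shorten exactly this pattern.

```python
# RDF: a triple cannot carry properties, so a reification node describes the statement.
reified = {
    ("_:emp1", "rdf:type", "rdf:Statement"),
    ("_:emp1", "rdf:subject", ":Alice"),
    ("_:emp1", "rdf:predicate", ":worksAt"),
    ("_:emp1", "rdf:object", ":Acme"),
    ("_:emp1", ":startDate", '"2021-03-01"'),     # hypothetical values from here down
    ("_:emp1", ":title", '"Data Engineer"'),
    ("_:emp1", ":contractType", '"permanent"'),
    ("_:emp1", ":manager", '"Bob"'),
}
# Reification only describes the statement; asserting the fact needs the plain triple too.
asserted = {(":Alice", ":worksAt", ":Acme")}

# Property graph: the same four facts sit directly on one edge.
lpg_edge = {"from": "Alice", "type": "WORKS_AT", "to": "Acme",
            "props": {"startDate": "2021-03-01", "title": "Data Engineer",
                      "contractType": "permanent", "manager": "Bob"}}

print(len(asserted | reified), "triples vs 1 edge")  # 9 triples vs 1 edge
```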
Paradigm Two: Property Graphs
The property graph paradigm (sometimes called LPG, for Labeled Property Graph) is the newer paradigm and dominates commercial graph databases.
In a property graph, nodes carry one or more labels (categories) and a set of properties (key-value pairs). Edges are typed (each edge has a relationship type) and carry their own properties. Both nodes and edges are first-class objects.
The same Acme fact in property graph syntax (Cypher patterns):
CREATE (acme:Customer {id: 12345, name: "Acme Corp", acquisitionDate: "2023-04-15"})
CREATE (mfg:Industry {name: "Manufacturing"})
CREATE (us:Country {name: "United States"})
CREATE (acme)-[:IN_INDUSTRY {since: "2023-04-15"}]->(mfg)
CREATE (acme)-[:LOCATED_IN {primaryAddress: true}]->(us)
The differences from RDF are immediate.
The Customer label is on the node, not a separate rdf:type triple. The properties (id, name, acquisitionDate) sit on the node directly. The edges have their own properties: the IN_INDUSTRY edge carries a since date; the LOCATED_IN edge carries a primaryAddress flag.
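The same structure as a minimal Python data model, mirroring the CREATE statements above. The class and method names are illustrative assumptions, not a graph database client API. Note that node IDs here are internal integers local to this store, which is the identity gap the comparison table below calls out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int                                    # internal ID, local to this store
    labels: set[str]
    props: dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    src: int
    rel_type: str
    dst: int
    props: dict[str, object] = field(default_factory=dict)   # edges carry properties natively


@dataclass
class PropertyGraph:
    nodes: dict[int, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def create_node(self, *labels: str, **props: object) -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(node_id, set(labels), props)
        return node_id

    def create_edge(self, src: int, rel_type: str, dst: int, **props: object) -> None:
        self.edges.append(Edge(src, rel_type, dst, props))


g = PropertyGraph()
acme = g.create_node("Customer", id=12345, name="Acme Corp", acquisitionDate="2023-04-15")
mfg = g.create_node("Industry", name="Manufacturing")
us = g.create_node("Country", name="United States")
g.create_edge(acme, "IN_INDUSTRY", mfg, since="2023-04-15")
g.create_edge(acme, "LOCATED_IN", us, primaryAddress=True)
```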
This is more compact for many real workloads. According to a comparison from Memgraph: “Nodes have IDs and key-value pairs/attributes; edges have types and attributes natively, making LPG more dense, compact, and informative compared to RDF.” Whether this is an advantage depends on what you are doing. For graph traversal and Graph Data Science, the density helps. For federated semantic integration, RDF’s uniformity helps more.
The query language for property graphs is Cypher (and increasingly GQL, published as ISO/IEC 39075 in April 2024, which generalizes Cypher patterns):
MATCH (c:Customer)-[:IN_INDUSTRY]->(i:Industry)
WHERE i.name = "Manufacturing"
RETURN c.name, c.acquisitionDate
This finds customer names and acquisition dates for customers in the Manufacturing industry. The traversal is explicit; the schema is implicit (Cypher will match whatever shape the data has, with optional schema enforcement).
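To show what "the traversal is explicit" means, here is a rough Python equivalent of that MATCH over a plain dictionary graph. The second customer, Globex, and the Retail industry are hypothetical data added so the filter has something to exclude; a real engine would start from an index rather than scanning every edge.

```python
# Hypothetical data: node ID -> labels and properties; edges as (src, type, dst).
nodes = {
    0: {"labels": {"Customer"}, "props": {"name": "Acme Corp", "acquisitionDate": "2023-04-15"}},
    1: {"labels": {"Customer"}, "props": {"name": "Globex", "acquisitionDate": "2024-01-09"}},
    2: {"labels": {"Industry"}, "props": {"name": "Manufacturing"}},
    3: {"labels": {"Industry"}, "props": {"name": "Retail"}},
}
edges = [(0, "IN_INDUSTRY", 2), (1, "IN_INDUSTRY", 3)]


def customers_in_industry(industry_name: str) -> list[tuple[str, str]]:
    """Roughly: MATCH (c:Customer)-[:IN_INDUSTRY]->(i:Industry)
    WHERE i.name = industry_name RETURN c.name, c.acquisitionDate"""
    results = []
    for src, rel, dst in edges:
        c, i = nodes[src], nodes[dst]
        if (rel == "IN_INDUSTRY"
                and "Customer" in c["labels"]
                and "Industry" in i["labels"]
                and i["props"].get("name") == industry_name):
            results.append((c["props"]["name"], c["props"]["acquisitionDate"]))
    return results


print(customers_in_industry("Manufacturing"))  # [('Acme Corp', '2023-04-15')]
```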
The trade-off property graphs make is the inverse of RDF’s. They put expressive power into the data layer (edges with properties, multi-label nodes) and ask less rigor of the schema layer. They are more pragmatic and less philosophically committed, and most modern graph databases support some form of property graph. As Enterprise Knowledge summarizes: “LPGs support ad-hoc schema design, allowing developers to iterate quickly and adapt their data model as requirements change.”
RDF vs Property Graph: A Side-by-Side Honest Comparison
This is the comparison most articles get wrong, either by partisan advocacy or by hand-waving “they’re equivalent.” They are not equivalent. They optimize for different priorities.
| Dimension | RDF | Property graph (LPG) |
|---|---|---|
| Atomic unit | Triple (Subject-Predicate-Object) | Node and edge, both first-class |
| Identity | IRIs, globally unique by design | Node IDs, local to the database (unless mapped to global IDs) |
| Edge-level properties | Require reification or RDF-star; not native | Native; first-class |
| Schema philosophy | Defined separately (RDFS, OWL, SHACL) | Often implicit in the data; optional formal schema |
| Inference / reasoning | First-class via RDFS and OWL entailment (OWL profiles); SHACL adds standardized validation | Typically external; less standardized |
| Federation across sources | Strong (uniform format, IRIs as universal IDs) | Weaker (requires ID mapping) |
| Query language | SPARQL (W3C standard) | Cypher (openCypher), GQL (ISO standard 2024), Gremlin |
| Tooling maturity for ontology design | Mature (dedicated ontology editors and OWL profiles; see Appendix A for the specific tools) | Less mature (typically vendor-specific) |
| Performance for graph traversal | Generally slower due to triple expansion | Generally faster for highly-connected workloads |
| Standards compliance | W3C standards for the data model, query (SPARQL), schema (RDFS, OWL), and validation (SHACL) | ISO GQL is new (2024); historically vendor-driven |
| Best-fit use cases | Ontology-heavy, federated, Data Governance, regulatory | Graph traversal, Graph Data Science, ad-hoc analytics |
| Worst-fit use cases | Ad-hoc graphs needing edge-level properties | Cross-organization federation with shared vocabularies |
A common practitioner observation, surfaced in Enterprise Knowledge’s analysis, is that the two paradigms increasingly complement rather than compete: “RDF is used for managing ontologies, taxonomies, standards, data quality, and governance, while LPG is used for graph traversal and graph data science applications.” Many large organizations now run both, connected by mapping layers: R2RML to expose relational sources as RDF, RDF-to-LPG mappings to move data between the paradigms, and RDF-star, which narrows the edge-property gap in newer specifications.
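A minimal sketch of what an RDF-to-LPG mapping does, in Python. This is a naive illustration under stated assumptions (it is not R2RML, which maps relational tables to RDF, and not any vendor's converter): rdf:type becomes a node label, literal-valued triples become node properties, and IRI-valued triples become edges. Datatypes and multi-valued properties are ignored, so a repeated predicate simply overwrites the earlier value.

```python
RDF_TYPE = "rdf:type"

triples = [
    (":Acme", "rdf:type", ":Customer"),
    (":Acme", ":name", '"Acme Corp"'),
    (":Acme", ":inIndustry", ":Manufacturing"),
    (":Manufacturing", "rdf:type", ":Industry"),
    (":Manufacturing", ":name", '"Manufacturing"'),
]


def is_literal(term: str) -> bool:
    return term.startswith('"')


def rdf_to_lpg(triples):
    nodes, edges = {}, []

    def node(iri: str) -> dict:
        # Keep the IRI as a property so global identity survives the move.
        return nodes.setdefault(iri, {"labels": set(), "props": {"iri": iri}})

    for s, p, o in triples:
        subject = node(s)
        if p == RDF_TYPE:
            subject["labels"].add(o.lstrip(":"))             # rdf:type -> node label
        elif is_literal(o):
            subject["props"][p.lstrip(":")] = o.strip('"')   # literal -> node property
        else:
            node(o)
            edges.append((s, p.lstrip(":"), o))              # IRI object -> edge
    return nodes, edges


nodes, edges = rdf_to_lpg(triples)
print(nodes[":Acme"])
print(edges)
```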
For the rest of this series, we are paradigm-agnostic. When a specific syntax appears, we will note which paradigm it is. The reference architecture for Lakeside Trust Bank in Part 11 will use a hybrid approach because that is what most large organizations actually deploy.
For practitioners: when you choose between RDF and property graphs, the right question is not “which is better.” It is: “what does my use case need first?” If federation across organizations or rigorous reasoning over a shared vocabulary is the first need, RDF. If high-throughput graph traversal for a single-organization use case is the first need, property graph. If you cannot answer this question, do not pick a vendor yet.
What a Knowledge Graph Is Not
This is the part that prevents the most expensive vendor mistakes. The vocabulary collisions are not pedantic. Each of the four below is a real category that gets called a knowledge graph in marketing material, and each is something different.
Not a graph database
A graph database is a storage engine optimized for graph workloads. It holds nodes and edges. It does fast traversal. That is its job.
A knowledge graph is what you can build on top of a graph database (or a triple store, or sometimes even a relational database) when you add the three things storage alone does not give you: a vocabulary or ontology, a stable global identity model, and inference rules.
The test: take any graph database in your organization. Ask whether the entities have stable global IDs that outlive any one app. Ask whether there is a documented vocabulary defining what types and relationships exist. Ask whether new facts can be derived from existing facts via rules. If the answer to any of the three is no, you have a graph database, not a knowledge graph.
Not a knowledge base
The term knowledge base predates knowledge graph by decades. Classic knowledge bases (1970s-1990s expert systems, frame-based representations, production-rule systems) were structured knowledge stores that were not graphs. They could be tree-structured, frame-structured, or rule-based.
In modern usage, “knowledge base” sometimes loosely means “the document corpus an LLM retrieves from,” which is closer to a document store than a structured knowledge representation. Every knowledge graph is a knowledge base in the broad sense. Most things people now call knowledge bases are not knowledge graphs.
The test: a knowledge base may or may not have typed relationships, may or may not have global identity, may or may not have inference. A knowledge graph requires all three. If the system stores Q&A pairs or markdown documents indexed by topic, it is a knowledge base, not a knowledge graph.
Not a semantic layer
This is one of the most common confusions in 2026, and among the most consequential. A semantic layer standardizes business definitions: what is a customer, what is revenue, what is monthly recurring revenue. It sits between data sources and consumers (BI tools, dashboards, and increasingly AI agents). Capability-level examples include dbt Semantic Layer, Cube, and AtScale; the Open Semantic Interchange (OSI), a vendor-neutral semantic-model specification whose v1.0 was finalized in January 2026, defines a common format for exchanging those definitions between tools.
A knowledge graph captures how entities relate, not just what they mean. As Galaxy summarizes the distinction: “A semantic layer standardizes what your data means, while a knowledge graph captures how your data relates.”
A semantic layer can tell you that “Customer LTV” is (total revenue minus total refunds) divided by months since first purchase. It cannot tell you that Customer X is a subsidiary of Company Y, which is in the same industry as Company Z, which churned last quarter. The semantic layer answers questions about definitions; the knowledge graph answers questions about relationships.
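The contrast fits in a few lines of Python. The metric function is the kind of governed definition a semantic layer holds; the edge set holds the kind of relationships a knowledge graph holds. All company names, the Logistics industry, and the quarter label are hypothetical, and the code is an illustration rather than how any semantic-layer or graph product works.

```python
# Semantic-layer style: one governed definition of one metric.
def customer_ltv(total_revenue: float, total_refunds: float,
                 months_since_first_purchase: int) -> float:
    return (total_revenue - total_refunds) / months_since_first_purchase


# Knowledge-graph style: facts about how entities relate (hypothetical data).
edges = {
    ("CustomerX", "subsidiaryOf", "CompanyY"),
    ("CompanyY", "inIndustry", "Logistics"),
    ("CompanyZ", "inIndustry", "Logistics"),
    ("CompanyZ", "churnedIn", "2025-Q4"),
}


def objects(subject: str, predicate: str) -> set:
    return {o for (s, p, o) in edges if s == subject and p == predicate}


def churned_industry_peers(customer: str) -> set:
    """Companies in the same industry as the customer's parent that have churned."""
    peers = set()
    for parent in objects(customer, "subsidiaryOf"):
        for industry in objects(parent, "inIndustry"):
            for (s, p, o) in edges:
                if p == "inIndustry" and o == industry and s != parent and objects(s, "churnedIn"):
                    peers.add(s)
    return peers


print(customer_ltv(12000.0, 2000.0, 20))    # 500.0
print(churned_industry_peers("CustomerX"))  # {'CompanyZ'}
```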
In practice many enterprises end up needing both. They are complementary, not substitutes. The semantic layer makes BI accurate; the knowledge graph makes AI grounded. The question for your organization is which problem you have first.
We will return to this distinction in Part 10 when we discuss the relationship between KGs and existing Data Governance investments. The short version: if your organization already has a working semantic layer, that is half the prerequisite for a useful KG.
Not a vector store with relationship metadata
Vector stores (dedicated vector databases such as Pinecone, Weaviate, and Chroma, plus the vector capabilities now built into general-purpose platforms like Snowflake, Databricks, and Postgres) are databases optimized for vector similarity search. They hold dense embeddings of text, images, or other content, and answer “what is most similar to this query vector?”
Some vector stores let you attach relationship metadata to vectors. This does not make them knowledge graphs. The relationship metadata in a vector store is associative (which vectors are tagged with what), not structurally traversable (you cannot ask the multi-hop question “find me documents related to documents related to documents related to X” with the same primitives).
The test: ask the system “give me all entities reachable from X by following relationship type Y for up to three hops, where the intermediate entities have property Z.” A vector store cannot answer this natively. A knowledge graph can.
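For reference, the question in that test is a bounded breadth-first traversal. A minimal Python sketch, with a hypothetical SUPPLIES relationship standing in for Y and a hypothetical certified flag standing in for property Z: only entities that satisfy the predicate are expanded as intermediates, while endpoints are returned either way.

```python
from collections import deque

# Hypothetical graph: (entity, relationship type) -> neighbors.
edges = {
    ("X", "SUPPLIES"): ["A", "B"],
    ("A", "SUPPLIES"): ["C"],
    ("B", "SUPPLIES"): ["D"],
    ("C", "SUPPLIES"): ["E"],
    ("E", "SUPPLIES"): ["F"],
}
props = {"A": {"certified": True}, "B": {"certified": False},
         "C": {"certified": True}, "D": {"certified": True}, "E": {"certified": True}}


def reachable(start, rel_type, max_hops, keep):
    """Entities reachable from start via rel_type in at most max_hops hops,
    passing only through intermediate entities for which keep(entity) is true."""
    found, seen = set(), {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in edges.get((node, rel_type), []):
            if nxt in seen:
                continue
            seen.add(nxt)
            found.add(nxt)
            if keep(nxt):  # only qualifying entities may serve as intermediates
                queue.append((nxt, depth + 1))
    return found


def is_certified(entity):
    return props.get(entity, {}).get("certified", False)


print(sorted(reachable("X", "SUPPLIES", 3, is_certified)))  # ['A', 'B', 'C', 'E']
```

D is missed because its only path runs through the uncertified B, and F is four hops away.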
We will return to this in Part 9 when we cover GraphRAG, the integration pattern that combines vector retrieval and graph traversal in a single agent retrieval pipeline. The two systems work better together than either alone.
The Four-Part Lens, Refined
In Part 1 we introduced a four-part lens for evaluating any knowledge graph: entities, typed relationships, identity, and inference. With the formal definitions above, we can sharpen each component.
| Component | Working definition | Where it gets formalized |
|---|---|---|
| Entities | Things in the world represented as nodes, with properties attached | Part 4 (entity types as ontology classes) |
| Typed relationships | Directed, named edges with semantics, optionally with properties | Part 4 (relationship types in the ontology) |
| Identity | Globally unique stable identifiers, typically IRIs in RDF or external IDs mapped to internal node IDs in LPG | Part 5 (identity, IRIs, entity resolution) |
| Inference | Derivation of new facts from existing facts using rules, ontology axioms, or learned models | Part 5 (inference, materialization, reasoning) |
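As a small taste of the inference row, here is a naive forward-chaining sketch in Python that materializes two RDFS entailment rules (subclass transitivity, and type propagation along subclass links) until no new triples appear. The :Organization and :Agent classes are hypothetical, and this fixpoint loop re-derives everything each round; real reasoners cover far more rules with much better algorithms.

```python
def rdfs_closure(triples: set) -> set:
    """Apply until fixpoint:
    (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    (x type A), (A subClassOf B)       => (x type B)
    """
    inferred = set(triples)
    while True:
        sub = {(s, o) for (s, p, o) in inferred if p == "rdfs:subClassOf"}
        new = {(a, "rdfs:subClassOf", c) for (a, b) in sub for (b2, c) in sub if b == b2}
        new |= {(x, "rdf:type", b) for (x, p, a) in inferred if p == "rdf:type"
                for (a2, b) in sub if a == a2}
        if new <= inferred:      # nothing new derived: stop
            return inferred
        inferred |= new


facts = {
    (":Acme", "rdf:type", ":Customer"),
    (":Customer", "rdfs:subClassOf", ":Organization"),
    (":Organization", "rdfs:subClassOf", ":Agent"),
}
for t in sorted(rdfs_closure(facts) - facts):
    print(t)
```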
A graph database gives you nodes and edges. A knowledge graph gives you nodes and edges plus a vocabulary that names them, plus identity that anchors them, plus inference that extends them. The next two articles cover those additions in depth: Part 4 on ontology, taxonomy, and schema (the vocabulary) and Part 5 on identity and inference.
What You Should Now Be Able to Do
If you read this article cold, you should now be able to:
- Give a defensible one-sentence definition of a knowledge graph.
- Explain why a relational row, a set of RDF triples, and a set of property graph nodes and edges can represent the same fact at different levels of expressiveness.
- Compare RDF and property graph paradigms on the dimensions of identity, edge properties, schema philosophy, and federation, without partisan bias toward either.
- Distinguish a knowledge graph from a graph database, a knowledge base, a semantic layer, and a vector store with relationship metadata, with one specific test for each.
- Apply the four-part lens (entities, typed relationships, identity, inference) to evaluate any system claimed to be a knowledge graph.
What you cannot yet do is design one. That requires the design vocabulary in Part 4 (ontology, taxonomy, schema) and the runtime semantics in Part 5 (identity, IRIs, inference). The next two articles equip you for that.
Do Next
| Priority | Action | Why it matters |
|---|---|---|
| This week | Take any knowledge-graph-shaped system in your organization (a graph database, a metadata store, a customer 360, a semantic layer). Apply the four-part lens. Which components are present? Which are missing? | You will likely find that what your organization calls a KG is missing one or two of the four. The missing components are where the next investment goes. |
| This week | Read the Hogan et al. Knowledge Graphs survey at least through Section 3 (Section 2 covers data graphs; Section 3 covers schema, identity, and context). It is the most authoritative single source for the formal definitions. The vocabulary you locked here will hold across the entire field. | Aligning your team on a single canonical source short-circuits months of definitional arguments. |
| This month | Pick a single target application in your organization and sketch the same fact in three notations: a relational row, an RDF triple set in Turtle, and a property graph in Cypher. The exercise reveals which paradigm fits your team’s instincts and your data shape. | Picking a paradigm before doing this exercise is how organizations end up regretting the choice in year two. |
| This month | If your organization has both a semantic layer and a graph effort underway, document the boundary. Which questions does the semantic layer answer (definitions, metrics, KPIs)? Which questions does the graph answer (relationships, traversals, multi-hop reasoning)? | The boundary is real but not always visible. Drawing it explicitly prevents both teams from claiming the same problem space. |
| This quarter | Read Parts 4 and 5 of this series before designing your first ontology or making your first identity decision. The two articles together prevent the most expensive design mistakes. | Vocabulary and identity are the load-bearing decisions. They are also the ones most projects defer until too late. |
Part 4 of this series, “Ontology, Taxonomy, Schema: The Vocabulary That Makes Knowledge Possible,” covers the design vocabulary every KG leader needs. Read it next.
Sources & References
- Hogan et al., Knowledge Graphs, ACM Computing Surveys (2021)
- W3C, RDF 1.1 Concepts and Abstract Syntax (2014)
- W3C, RDF 1.1 Primer (2014)
- openCypher Specification (2024)
- Memgraph, LPG vs RDF (2024)
- Enterprise Knowledge, Cutting Through the Noise on RDF and LPG (2024)
- Neo4j, RDF Triple Stores vs Property Graphs (2024)
- Galaxy, RAG vs Knowledge Graph vs Semantic Layer (2026)
- Atlan, Ontology vs Semantic Layer (2026)
- Tim Berners-Lee, Linked Data, Design Issues (2006)
- ISO/IEC 39075:2024, Information technology - Database languages - GQL (2024)
- Snowflake, Open Semantic Interchange Specs Finalized (OSI v1.0) (2026)