Data Architecture & Engineering June 19, 2026 · 19 min read

What a Knowledge Graph Actually Is (and Is Not): From Tables to Triples to Meaning

Knowledge graphs sit at a junction of three different conversations: graph data models, ontologies, and identity. This article gives the precise definition, the two dominant paradigms (RDF and property graphs) compared honestly, and the four things a knowledge graph is not (graph database, knowledge base, semantic layer, vector store with relationships). Part 3 of the Knowledge Graph Practitioner's Guide.

By Vikas Pratap Singh

#knowledge-graph #rdf #property-graph #ontology #data-architecture #semantic-web

Executive Briefing

What this covers: The precise definition of a knowledge graph: a graph data model with stable global identity, schema-extensible typed relationships, and an associated vocabulary or ontology that gives the graph computable meaning. Plus the two dominant paradigms (RDF and property graph) compared honestly, and the four things that get called knowledge graphs but are not.
Who should read it: Anyone who has used the term 'knowledge graph' in a meeting and is not sure they could defend the definition under cross-examination. Anyone evaluating a tool that claims to be a knowledge graph platform. The article is the conceptual foundation for the rest of the series.
Key finding: A knowledge graph is not a graph database. A knowledge graph is not a semantic layer. A knowledge graph is not a knowledge base. The vocabulary collisions are not pedantic; using the wrong term shapes the wrong solution. After this article you will be able to defend the difference.
For practitioners: when a vendor pitch hinges on the term 'knowledge graph', ask three questions: (1) what is your stable identity model for entities, (2) what is your vocabulary or ontology layer, and (3) how does your data model carry edge-level properties (natively, or through reification or RDF-star). If they cannot answer all three, you are evaluating something other than a knowledge graph.


The Definition Most Senior Engineers Cannot Quite Give

I have lost count of the architecture reviews where someone says “knowledge graph” with total confidence and then cannot define it when a VP asks. The pattern repeats: the lead architect starts with “It is a graph database, but with…” and trails off; the data scientist offers “It is like a semantic layer, but…” and stops; the platform engineer says “It is RDF and ontologies and…” and looks uncertain. The room discovers it has been agreeing on a word, not a thing. I treat that moment as the real starting point of any knowledge graph conversation.

Each was partly right. None had the full definition. None could defend the answer against the next question.

This article is the conceptual core of the series. By the end you will have a working definition you can defend, the two dominant paradigms (RDF and property graphs) compared honestly, and a clear picture of the four things that get called knowledge graphs but are not. The vocabulary you locked in during Part 1 (entities, typed relationships, identity, inference) gets formalized here.

We are tool-agnostic in this article and in Parts 4 through 8. Specific vendors live in Appendix A. The query languages SPARQL and Cypher show up below as illustrative paradigm syntax, not as product pitches. The distinction is made explicitly in the comparison section.

Working Definition

A knowledge graph is a graph-structured representation of entities, relationships, and properties, governed by a vocabulary or ontology that gives the graph computable meaning, where every entity has a stable global identity that outlives any one source schema.

That sentence has four load-bearing pieces:

Graph-structured: nodes and edges, not tables and joins. Graph operations like traversal, path-finding, and neighborhood expansion are first-class.
Entities, relationships, properties: the three primitives. An entity is a node; a relationship is a directed labeled edge; a property is a key-value attribute on a node or an edge.
Vocabulary or ontology: a layer above the data that defines what entity types exist, what relationships are allowed, and what constraints apply. Without this, you have a graph; with it, you have knowledge.
Stable global identity: every entity has an identifier that uniquely refers to one real-world thing, consistently, across sources, schemas, and time.

This is not a definition I made up. It is consistent with the working definition in the Hogan et al. survey of knowledge graphs (ACM Computing Surveys, 2021), which writes: “A knowledge graph’s data (also known as a data graph) conforms to a graph-based data model, which may be a directed edge-labelled graph, a heterogeneous graph, a property graph, and so on… [where] ontologies and rules can be used to define and reason about the semantics of the terms used in the graph.”

The same survey notes a property that turns out to matter for failure mode 1 from Part 2: “Graphs allow maintainers to postpone the definition of a schema, allowing the data to evolve in a more flexible manner.” This is a feature when applied with discipline. It is also the rope projects hang themselves with.

From Rows to Triples: The Structural Shift

The simplest way to understand a knowledge graph is to see how it represents the same fact a relational database holds.

A row in a customer table:

| customer_id | name      | industry      | country | acquisition_date |
| ----------- | --------- | ------------- | ------- | ---------------- |
| 12345       | Acme Corp | Manufacturing | US      | 2023-04-15       |

The same information as a knowledge graph (RDF flavor, in Turtle syntax):

:Acme a :Customer ;
      :name "Acme Corp" ;
      :inIndustry :Manufacturing ;
      :locatedIn :UnitedStates ;
      :acquiredOn "2023-04-15"^^xsd:date .

The same information as a property graph (in Cypher pattern syntax):

(:Customer {id: 12345, name: "Acme Corp", acquisitionDate: "2023-04-15"})
  -[:IN_INDUSTRY]-> (:Industry {name: "Manufacturing"})
(:Customer {id: 12345}) -[:LOCATED_IN]-> (:Country {name: "United States"})

Three things change in the move from row to graph.

First, values become entities. “Manufacturing” was a string in a column. In the graph, it is its own node. Now you can attach properties to Manufacturing (a description, an industry hierarchy, a regulatory profile) and ask questions across customers (“how many customers do we have in Manufacturing in the US in 2023?”) that were always possible but are now more naturally expressed as graph traversals.

Second, relationships become first-class. The fact that Acme is located in the United States is no longer an opaque string in a country column. It is a typed edge. Other nodes can have the same locatedIn relationship to United States. United States can be queried as a node with all its incoming edges (every customer, employee, supplier, regulation that “lives” in it).

Third, identity becomes global. Acme’s customer_id 12345 is local to the customer table. Acme’s IRI in the knowledge graph is :Acme (or, more precisely, something like https://lakeside.com/kg/customer/12345), which is unique across all systems, all schemas, all time. Two systems referring to “Acme” can verify they mean the same entity by comparing IRIs.

We will return to identity in Part 5. For now, notice that the three changes (values become entities, relationships become first-class, identity becomes global) are what give a graph more expressive power than a table for connected data. None of them require any specific technology.
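As a rough sketch of those three changes, the following Python turns the customer row above into subject-predicate-object tuples. The base IRI, the `mint_iri` helper, and the column-to-predicate mapping are illustrative assumptions made for this example, not part of any standard or of the series' reference architecture.

```python
from __future__ import annotations

# Illustrative sketch: the namespace and helper names below are assumptions.
BASE = "https://lakeside.com/kg/"  # hypothetical, modeled on the example IRI in the text


def mint_iri(kind: str, local_id: str) -> str:
    """Build a global identifier from an entity kind and a source-local key."""
    return f"{BASE}{kind}/{local_id}"


def row_to_triples(row: dict) -> set[tuple[str, str, str]]:
    """Convert one customer row into (subject, predicate, object) triples."""
    customer = mint_iri("customer", str(row["customer_id"]))
    # Values that deserve their own node become IRIs instead of strings.
    industry = mint_iri("industry", row["industry"].lower())
    country = mint_iri("country", row["country"].lower())
    return {
        (customer, "rdf:type", BASE + "Customer"),
        (customer, BASE + "name", row["name"]),        # a literal stays a value
        (customer, BASE + "inIndustry", industry),     # a value became an entity
        (customer, BASE + "locatedIn", country),       # a relationship is a typed edge
        (customer, BASE + "acquiredOn", row["acquisition_date"]),
    }


row = {
    "customer_id": 12345,
    "name": "Acme Corp",
    "industry": "Manufacturing",
    "country": "US",
    "acquisition_date": "2023-04-15",
}
triples = row_to_triples(row)
for triple in sorted(triples):
    print(triple)
```

The design choice worth noticing is that the local key (12345) survives only as the tail of a global identifier. Any other system that mints the same IRI for Acme is, by construction, talking about the same entity.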

A side-by-side visualization of the same customer fact represented three ways: a relational row, an RDF triple set in Turtle syntax, and a property graph fragment in Cypher syntax. The three formats hold the same information; the graph formats make relationships and identity explicit while the relational row leaves them implicit.

Paradigm One: RDF Triples

The RDF (Resource Description Framework) paradigm is the older of the two and is the W3C standard for knowledge representation on the web.

The atomic unit of RDF is the triple: a three-part statement of the form Subject-Predicate-Object. Per the W3C RDF 1.1 Concepts specification, asserting a triple says that the relationship indicated by the predicate holds between the resources denoted by the subject and the object. The Subject is an IRI (a globally unique identifier) or a blank node; the Predicate is an IRI naming a relationship type; the Object is an IRI, a blank node, or a literal value.

A small set of triples about Acme:

:Acme       rdf:type           :Customer .
:Acme       :name              "Acme Corp" .
:Acme       :inIndustry        :Manufacturing .
:Acme       :locatedIn         :UnitedStates .
:Manufacturing  rdf:type       :Industry .
:Manufacturing  rdfs:subClassOf  :IndustrySector .

Several things to notice. Every entity (:Acme, :Manufacturing, :UnitedStates) is identified by an IRI. Every relationship (rdf:type, :inIndustry, :locatedIn, rdfs:subClassOf) is also an IRI. The rdf:type predicate is what makes a Customer or an Industry a typed thing. The rdfs:subClassOf predicate is what makes Manufacturing a kind of IndustrySector.

The paradigm is graph-flat: the entire graph is a set of triples. There are no rows, no tables, no nesting. Every fact is a triple. This uniformity makes RDF strong on standardization, federation across data sources, and integration with vocabularies and ontologies (RDFS, OWL, SHACL all sit on top of RDF). It is the basis for the Semantic Web and for the Linked Data movement that Tim Berners-Lee proposed in 2006.

The query language for RDF is SPARQL, which expresses queries as triple patterns:

SELECT ?customer ?industry WHERE {
  ?customer rdf:type :Customer .
  ?customer :inIndustry ?industry .
  ?industry rdfs:subClassOf :IndustrySector .
}

This returns each customer together with its industry, for every industry that is declared a subclass of IndustrySector. The query does no grouping or aggregation; the pattern simply matches any subgraph that satisfies all three triple patterns.
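To make "triple pattern" concrete, here is a small Python sketch that evaluates the same three patterns against the Acme triples using plain tuples. It is a toy matcher written for this illustration, not how a SPARQL engine is implemented; the `match` function and the `?`-prefixed variable convention are assumptions borrowed from SPARQL's surface syntax.

```python
TRIPLES = {
    (":Acme", "rdf:type", ":Customer"),
    (":Acme", ":name", "Acme Corp"),
    (":Acme", ":inIndustry", ":Manufacturing"),
    (":Acme", ":locatedIn", ":UnitedStates"),
    (":Manufacturing", "rdf:type", ":Industry"),
    (":Manufacturing", "rdfs:subClassOf", ":IndustrySector"),
}


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable must agree with any value it was bound to earlier.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # All three positions matched; continue with the remaining patterns.
            yield from match(rest, triples, candidate)


query = [
    ("?customer", "rdf:type", ":Customer"),
    ("?customer", ":inIndustry", "?industry"),
    ("?industry", "rdfs:subClassOf", ":IndustrySector"),
]
results = [(b["?customer"], b["?industry"]) for b in match(query, TRIPLES)]
print(results)
```

The shared variable names do the joining: `?customer` ties the first two patterns together and `?industry` ties the last two, which is the same role they play in the SPARQL text.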

What this looks like in practice: every node in an RDF graph is fully described by the set of triples whose subject it is. A Customer entity is whatever triples have that customer’s IRI in the subject position. There is no “row.” This is liberating for federation (you can pull triples from many sources and just merge them) and demanding for ergonomics (representing edge-level metadata requires reification or RDF-star, originally written RDF*).
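A minimal sketch of both halves of that claim, using Python sets as stand-in triple stores. The two source names and the `:Globex` customer are hypothetical, and the `describe` helper is an assumption for this example rather than a standard API.

```python
# Two hypothetical sources that both use the same identifier for Acme.
crm_triples = {
    (":Acme", "rdf:type", ":Customer"),
    (":Acme", ":name", "Acme Corp"),
}
billing_triples = {
    (":Acme", ":locatedIn", ":UnitedStates"),
    (":Acme", ":name", "Acme Corp"),       # duplicate fact, collapses on merge
    (":Globex", "rdf:type", ":Customer"),
}

# Merging at the data level is set union: identical triples deduplicate.
merged = crm_triples | billing_triples


def describe(subject, triples):
    """Return the predicate/object pairs of every triple with this subject."""
    return sorted((p, o) for s, p, o in triples if s == subject)


acme = describe(":Acme", merged)
print(acme)
```

The merge is only this easy because both sources already agree on the identifier `:Acme`. When they do not, the union silently produces two entities, which is the identity problem deferred to Part 5.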

The trade-off RDF makes is to put expressive power into the schema layer (RDFS, OWL, SHACL) and to keep the data layer uniform. This is great when you want to share definitions across organizations, federate across sources, and reason rigorously. It is less ergonomic when you want to attach four properties to a single edge (when did Alice start working at Acme, in what title, with what contract type, and with what manager).
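To see why edge metadata is the awkward case, here is a sketch of classic RDF reification for the Alice example, again with plain Python tuples. The `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` terms are the standard reification vocabulary; the `reify` helper, the statement identifier, and every value are hypothetical.

```python
def reify(statement_id, subject, predicate, obj, metadata):
    """Describe one edge as a node of its own, using RDF reification terms."""
    triples = {
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    }
    # Each piece of edge metadata becomes one more triple about the statement node.
    triples |= {(statement_id, key, value) for key, value in metadata.items()}
    return triples


employment = reify(
    ":stmt1", ":Alice", ":worksAt", ":Acme",
    {
        ":startDate": "2021-03-01",     # hypothetical values throughout
        ":title": "Analyst",
        ":contractType": "Permanent",
        ":manager": ":Bob",
    },
)
print(len(employment))
```

One relationship with four attributes costs eight triples here, and the reified description does not by itself assert the original `:Alice :worksAt :Acme` triple, which is usually stored separately. In a property graph the same information is a single edge with four properties.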

Paradigm Two: Property Graphs

The property graph paradigm (sometimes called LPG, for Labeled Property Graph) is the newer paradigm and dominates commercial graph databases.

In a property graph, nodes carry one or more labels (categories) and a set of properties (key-value pairs). Edges are typed (each edge has a relationship type) and carry their own properties. Both nodes and edges are first-class objects.

The same Acme fact in property graph syntax (Cypher patterns):

CREATE (acme:Customer {id: 12345, name: "Acme Corp", acquisitionDate: "2023-04-15"})
CREATE (mfg:Industry {name: "Manufacturing"})
CREATE (us:Country {name: "United States"})
CREATE (acme)-[:IN_INDUSTRY {since: "2023-04-15"}]->(mfg)
CREATE (acme)-[:LOCATED_IN {primaryAddress: true}]->(us)

The differences from RDF are immediate.

The Customer label is on the node, not a separate rdf:type triple. The properties (id, name, acquisitionDate) sit on the node directly. The edges have their own properties: the IN_INDUSTRY edge carries a since date; the LOCATED_IN edge carries a primaryAddress flag.

This is more compact for many real workloads. According to a comparison from Memgraph: “Nodes have IDs and key-value pairs/attributes; edges have types and attributes natively, making LPG more dense, compact, and informative compared to RDF.” Whether this is an advantage depends on what you are doing. For graph traversal and Graph Data Science, the density helps. For federated semantic integration, RDF’s uniformity helps more.
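As a sketch of the data model itself, independent of any database, the Acme example can be written with two small Python dataclasses. The `Node` and `Edge` class names and their field names are assumptions for illustration; real property graph engines store this far more compactly.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                   # one or more categories
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: Node
    rel_type: str                                 # exactly one relationship type
    target: Node
    properties: dict = field(default_factory=dict)  # edges carry their own key-value pairs


acme = Node({"Customer"}, {"id": 12345, "name": "Acme Corp", "acquisitionDate": "2023-04-15"})
mfg = Node({"Industry"}, {"name": "Manufacturing"})
us = Node({"Country"}, {"name": "United States"})

edges = [
    Edge(acme, "IN_INDUSTRY", mfg, {"since": "2023-04-15"}),
    Edge(acme, "LOCATED_IN", us, {"primaryAddress": True}),
]

for edge in edges:
    print(edge.source.properties["name"], edge.rel_type,
          edge.target.properties["name"], edge.properties)
```

Note that `Edge.properties` sits directly on the edge object. That is the structural difference from the triple model, where the same metadata has to be attached to a separate statement node.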

The most widely used query language for property graphs is Cypher (and increasingly GQL, published as ISO/IEC 39075 in April 2024, which generalizes Cypher patterns):

MATCH (c:Customer)-[:IN_INDUSTRY]->(i:Industry)
WHERE i.name = "Manufacturing"
RETURN c.name, c.acquisitionDate

This finds customer names and acquisition dates for customers in the Manufacturing industry. The traversal is explicit; the schema is implicit (Cypher will match whatever shape the data has, with optional schema enforcement).
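For readers who think more easily in imperative code, the following Python sketch spells out what that MATCH does over an in-memory graph. The dictionary layout, the `customers_in_industry` function, and the second customer (Globex) are assumptions added for the example.

```python
nodes = {
    "acme": {"labels": {"Customer"}, "name": "Acme Corp", "acquisitionDate": "2023-04-15"},
    "globex": {"labels": {"Customer"}, "name": "Globex", "acquisitionDate": "2024-01-10"},
    "mfg": {"labels": {"Industry"}, "name": "Manufacturing"},
    "retail": {"labels": {"Industry"}, "name": "Retail"},
}
edges = [
    ("acme", "IN_INDUSTRY", "mfg"),
    ("globex", "IN_INDUSTRY", "retail"),
]


def customers_in_industry(industry_name):
    """Mimic MATCH (c:Customer)-[:IN_INDUSTRY]->(i:Industry) WHERE i.name = ..."""
    rows = []
    for source, rel_type, target in edges:
        c, i = nodes[source], nodes[target]
        if (
            rel_type == "IN_INDUSTRY"
            and "Customer" in c["labels"]
            and "Industry" in i["labels"]
            and i["name"] == industry_name
        ):
            rows.append((c["name"], c["acquisitionDate"]))
    return rows


print(customers_in_industry("Manufacturing"))
```

Nothing in the loop checks a schema. If a node lacked an `acquisitionDate`, this sketch would fail at lookup time, which is the same "schema is implicit" trade-off in miniature.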

The trade-off property graphs make is the inverse of RDF’s. They put expressive power into the data layer (edges with properties, multi-label nodes) and ask less rigor of the schema layer. They are more pragmatic, less philosophically committed, and most modern graph databases support some form of property graph. As Enterprise Knowledge summarizes: “LPGs support ad-hoc schema design, allowing developers to iterate quickly and adapt their data model as requirements change.”

RDF vs Property Graph: A Side-by-Side Honest Comparison

This is the comparison most articles get wrong, either by partisan advocacy or by hand-waving “they’re equivalent.” They are not equivalent. They optimize for different priorities.

| Dimension | RDF | Property graph (LPG) |
| --- | --- | --- |
| Atomic unit | Triple (Subject-Predicate-Object) | Node and edge, both first-class |
| Identity | IRIs, globally unique by design | Node IDs, local to the database (unless mapped to global IDs) |
| Edge-level properties | Require reification or RDF-star; not native | Native; first-class |
| Schema philosophy | Defined separately (RDFS, OWL, SHACL) | Often implicit in the data; optional formal schema |
| Inference / reasoning | First-class via OWL profiles, SHACL validation | Typically external; less standardized |
| Federation across sources | Strong (uniform format, IRIs as universal IDs) | Weaker (requires ID mapping) |
| Query language | SPARQL (W3C standard) | Cypher (openCypher), GQL (ISO standard 2024), Gremlin |
| Tooling maturity for ontology design | Mature (dedicated ontology editors and OWL profiles; see Appendix A for the specific tools) | Less mature (typically vendor-specific) |
| Performance for graph traversal | Generally slower due to triple expansion | Generally faster for highly-connected workloads |
| Standards compliance | W3C standards across the stack (RDF, SPARQL, RDFS, OWL, SHACL) | ISO GQL is new (2024); historically vendor-driven |
| Best-fit use cases | Ontology-heavy, federated, Data Governance, regulatory | Graph traversal, Graph Data Science, ad-hoc analytics |
| Worst-fit use cases | Ad-hoc graphs needing edge-level properties | Cross-organization federation with shared vocabularies |

A common practitioner observation, surfaced in Enterprise Knowledge’s analysis, is that the two paradigms increasingly complement rather than compete: “RDF is used for managing ontologies, taxonomies, standards, data quality, and governance, while LPG is used for graph traversal and graph data science applications.” Many large organizations now run both, with mapping and translation layers between them (R2RML for lifting relational data into RDF, RDF-to-LPG mappings, and RDF-star in newer specifications).
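To give a feel for what an RDF-to-LPG mapping involves, here is a deliberately naive Python sketch. It assumes a convention specific to this example: any object string starting with `:` is an IRI and everything else is a literal. The `triples_to_lpg` function is hypothetical and ignores datatypes, multi-valued properties, and edge properties, all of which a real mapping has to handle.

```python
def triples_to_lpg(triples):
    """Map triples to (nodes, edges): literals become properties, IRIs become edges."""
    nodes, edges = {}, []
    for s, p, o in sorted(triples):
        node = nodes.setdefault(s, {"labels": set(), "properties": {}})
        if p == "rdf:type":
            # A type triple becomes a label on the node.
            node["labels"].add(o.lstrip(":"))
        elif o.startswith(":"):
            # Object is another entity: emit an edge and make sure the target exists.
            nodes.setdefault(o, {"labels": set(), "properties": {}})
            edges.append((s, p.lstrip(":"), o))
        else:
            # Object is a literal: fold it into the subject node's properties.
            node["properties"][p.lstrip(":")] = o
    return nodes, edges


triples = {
    (":Acme", "rdf:type", ":Customer"),
    (":Acme", ":name", "Acme Corp"),
    (":Acme", ":inIndustry", ":Manufacturing"),
    (":Acme", ":locatedIn", ":UnitedStates"),
    (":Manufacturing", "rdf:type", ":Industry"),
}
nodes, edges = triples_to_lpg(triples)
print(nodes[":Acme"])
print(edges)
```

Five triples collapse into three nodes and two edges. The compaction is the density advantage the property graph side claims; the information lost along the way (for example, that `name` was itself a globally identified predicate) is the uniformity the RDF side gives up.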

For the rest of this series, we are paradigm-agnostic. When a specific syntax appears, we will note which paradigm it is. The reference architecture for Lakeside Trust Bank in Part 11 will use a hybrid approach because that is what most large organizations actually deploy.

For practitioners: when you choose between RDF and property graphs, the right question is not “which is better.” It is: “what does my use case need first?” If federation across organizations or rigorous reasoning over a shared vocabulary is the first need, RDF. If high-throughput graph traversal for a single-organization use case is the first need, property graph. If you cannot answer this question, do not pick a vendor yet.

What a Knowledge Graph Is Not

This is the part that prevents the most expensive vendor mistakes. The vocabulary collisions are not pedantic. Each of the four below is a real category that gets called a knowledge graph in marketing material, and each is something different.

Not a graph database

A graph database is a storage engine optimized for graph workloads. It holds nodes and edges. It does fast traversal. That is its job.

A knowledge graph is what you can build on top of a graph database (or a triple store, or sometimes even a relational database) when you add the three things storage alone does not give you: a vocabulary or ontology, a stable global identity model, and inference rules.

The test: take any graph database in your organization. Ask whether the entities have stable global IDs that outlive any one app. Ask whether there is a documented vocabulary defining what types and relationships exist. Ask whether new facts can be derived from existing facts via rules. If the answer to any of the three is no, you have a graph database, not a knowledge graph.
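The third question, whether new facts can be derived from existing facts via rules, is the least familiar to most teams, so here is a minimal forward-chaining sketch in Python. It applies two RDFS-style rules (subclass transitivity and type inheritance) to plain tuples. The `infer` function and the class names `:EnterpriseCustomer` and `:Party` are hypothetical, and a production reasoner would be far more efficient than this nested loop.

```python
def infer(triples):
    """Apply two RDFS-style rules until no new triples appear (forward chaining)."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if o != s2 or p2 != "rdfs:subClassOf":
                    continue
                if p == "rdfs:subClassOf":
                    new.add((s, "rdfs:subClassOf", o2))  # subclass is transitive
                elif p == "rdf:type":
                    new.add((s, "rdf:type", o2))         # instances inherit superclasses
        if new <= facts:
            return facts
        facts |= new


asserted = {
    (":Acme", "rdf:type", ":EnterpriseCustomer"),
    (":EnterpriseCustomer", "rdfs:subClassOf", ":Customer"),
    (":Customer", "rdfs:subClassOf", ":Party"),
}
derived = infer(asserted) - asserted
print(sorted(derived))
```

Nobody stored the fact that Acme is a Customer or a Party; both follow from the vocabulary. A system that can only return what was explicitly written fails this part of the test.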

Not a knowledge base

The term knowledge base predates knowledge graph by decades. Classic knowledge bases (1970s-1990s expert systems, frame-based representations, production-rule systems) were structured knowledge stores that were not graphs. They could be tree-structured, frame-structured, or rule-based.

In modern usage, “knowledge base” sometimes loosely means “the document corpus an LLM retrieves from,” which is closer to a document store than a structured knowledge representation. Every knowledge graph is a knowledge base in the broad sense. Most things people now call knowledge bases are not knowledge graphs.

The test: a knowledge base may or may not have typed relationships, may or may not have global identity, may or may not have inference. A knowledge graph requires all three. If the system stores Q&A pairs or markdown documents indexed by topic, it is a knowledge base, not a knowledge graph.

Not a semantic layer

This is one of the most common confusions in 2026 and the most consequential. A semantic layer (capability-level examples include dbt Semantic Layer, Cube, AtScale, and the Open Semantic Interchange (OSI), a vendor-neutral semantic-model specification whose v1.0 was finalized in January 2026) standardizes business definitions: what is a customer, what is revenue, what is monthly recurring revenue. It sits between data sources and consumers (BI tools, dashboards, increasingly AI agents).

A knowledge graph captures how entities relate, not just what they mean. As Galaxy summarizes the distinction: “A semantic layer standardizes what your data means, while a knowledge graph captures how your data relates.”

A semantic layer can tell you that “Customer LTV” is (total revenue minus total refunds) divided by months since first purchase. It cannot tell you that Customer X is a subsidiary of Company Y, which is in the same industry as Company Z, which churned last quarter. The semantic layer answers questions about definitions; the knowledge graph answers questions about relationships.
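The contrast can be sketched in a few lines of Python. The first function is the metric definition exactly as stated above; the second walks the relationship chain from the example. All names, figures, and the edge layout are hypothetical, and neither function reflects how any particular semantic layer or graph product is implemented.

```python
def customer_ltv(total_revenue, total_refunds, months_since_first_purchase):
    """Customer LTV as defined in the text: net revenue per month of tenure."""
    if months_since_first_purchase <= 0:
        raise ValueError("months_since_first_purchase must be positive")
    return (total_revenue - total_refunds) / months_since_first_purchase


# Relationship data the metric definition knows nothing about.
edges = {
    ("CustomerX", "SUBSIDIARY_OF"): "CompanyY",
    ("CompanyY", "SAME_INDUSTRY_AS"): "CompanyZ",
}
churned_last_quarter = {"CompanyZ"}


def churn_exposure(customer):
    """Follow subsidiary -> same-industry edges and report a churned peer, if any."""
    parent = edges.get((customer, "SUBSIDIARY_OF"))
    peer = edges.get((parent, "SAME_INDUSTRY_AS"))
    return peer if peer in churned_last_quarter else None


print(customer_ltv(12000.0, 600.0, 24))
print(churn_exposure("CustomerX"))
```

The first function needs only one customer's numbers and a shared definition. The second needs edges between entities, which is information a metric definition has no place to hold.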

In practice many enterprises end up needing both. They are complementary, not substitutes. The semantic layer makes BI accurate; the knowledge graph makes AI grounded. The question for your organization is which problem you have first.

We will return to this distinction in Part 10 when we discuss the relationship between KGs and existing Data Governance investments. The short version: if your organization already has a working semantic layer, that is half the prerequisite for a useful KG.

Not a vector store with relationship metadata

Vector stores (dedicated vector databases such as Pinecone, Weaviate, and Chroma, plus the vector capabilities now built into general-purpose platforms like Snowflake, Databricks, and Postgres) are databases optimized for vector similarity search. They hold dense embeddings of text, images, or other content, and answer “what is most similar to this query vector?”

Some vector stores let you attach relationship metadata to vectors. This does not make them knowledge graphs. The relationship metadata in a vector store is associative (which vectors are tagged with what), not structurally traversable (you cannot ask the multi-hop question “find me documents related to documents related to documents related to X” with the same primitives).

The test: ask the system “give me all entities reachable from X by following relationship type Y for up to three hops, where the intermediate entities have property Z.” A vector store cannot answer this natively. A knowledge graph can.
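To show what answering that question natively involves, here is a breadth-first traversal sketch in Python. The `reachable` function, the `SUPPLIES` and `OWNS` relationship types, and the `audited` property are all hypothetical; graph query languages express the same thing declaratively as a bounded path pattern.

```python
from collections import deque


def reachable(start, rel_type, max_hops, edges, properties, required_prop):
    """Entities reachable from start via rel_type edges within max_hops.

    Intermediate entities (those the walk continues through) must carry required_prop.
    """
    found, seen = set(), {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        # Only the start node and qualifying intermediates may be expanded.
        if node != start and required_prop not in properties.get(node, {}):
            continue
        for source, edge_type, target in edges:
            if source == node and edge_type == rel_type and target not in seen:
                seen.add(target)
                found.add(target)
                queue.append((target, hops + 1))
    return found


edges = [
    ("X", "SUPPLIES", "A"), ("A", "SUPPLIES", "B"), ("B", "SUPPLIES", "C"),
    ("C", "SUPPLIES", "D"),           # four hops away, beyond the limit
    ("X", "SUPPLIES", "E"), ("E", "SUPPLIES", "F"),
    ("X", "OWNS", "G"),               # wrong relationship type
]
properties = {"A": {"audited": True}, "B": {"audited": True}}  # E is not audited

result = reachable("X", "SUPPLIES", 3, edges, properties, "audited")
print(sorted(result))
```

The traversal needs three things at once: typed edges to follow, a hop counter, and properties on the nodes it passes through. Similarity search over embeddings provides none of the three as a primitive.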

We will return to this in Part 9 when we cover GraphRAG, the integration pattern that combines vector retrieval and graph traversal in a single agent retrieval pipeline. The two systems work better together than either alone.

A categorical diagram showing what a knowledge graph is and is not. The center shows the four required components (entities, typed relationships, identity, inference) and the surrounding categories (graph database, knowledge base, semantic layer, vector store) each show what they are missing.

The Four-Part Lens, Refined

In Part 1 we introduced a four-part lens for evaluating any knowledge graph: entities, typed relationships, identity, and inference. With the formal definitions above, we can sharpen each component.

| Component | Working definition | Where it gets formalized |
| --- | --- | --- |
| Entities | Things in the world represented as nodes, with properties attached | Part 4 (entity types as ontology classes) |
| Typed relationships | Directed, named edges with semantics, optionally with properties | Part 4 (relationship types in the ontology) |
| Identity | Globally unique stable identifiers, typically IRIs in RDF or external IDs mapped to internal node IDs in LPG | Part 5 (identity, IRIs, entity resolution) |
| Inference | Derivation of new facts from existing facts using rules, ontology axioms, or learned models | Part 5 (inference, materialization, reasoning) |

A graph database gives you nodes and edges. A knowledge graph gives you nodes and edges plus a vocabulary that names them, plus identity that anchors them, plus inference that extends them. The next two articles cover the design choices for those three additions in depth: Part 4 on ontology, taxonomy, and schema (the vocabulary), and Part 5 on identity and inference.

What You Should Now Be Able to Do

If you read this article cold, you should now be able to:

Give a defensible one-sentence definition of a knowledge graph.
Explain why a relational row, an RDF triple, and a property graph node-edge-pair represent the same fact at different levels of expressiveness.
Compare RDF and property graph paradigms on the dimensions of identity, edge properties, schema philosophy, and federation, without partisan bias toward either.
Distinguish a knowledge graph from a graph database, a knowledge base, a semantic layer, and a vector store with relationship metadata, with one specific test for each.
Apply the four-part lens (entities, typed relationships, identity, inference) to evaluate any system claimed to be a knowledge graph.

What you cannot yet do is design one. That requires the design vocabulary in Part 4 (ontology, taxonomy, schema) and the runtime semantics in Part 5 (identity, IRIs, inference). The next two articles equip you for that.

Do Next

| Priority | Action | Why it matters |
| --- | --- | --- |
| This week | Take any knowledge-graph-shaped system in your organization (a graph database, a metadata store, a customer 360, a semantic layer). Apply the four-part lens. Which components are present? Which are missing? | You will likely find that what your organization calls a KG is missing one or two of the four. The missing components are where the next investment goes. |
| This week | Read the Hogan et al. Knowledge Graphs survey at least through section 3 (Data Graphs). It is the most authoritative single source for the formal definitions. The vocabulary you locked in here will hold across the entire field. | Aligning your team on a single canonical source short-circuits months of definitional arguments. |
| This month | Pick a single target application in your organization and sketch the same fact in three notations: a relational row, an RDF triple set in Turtle, and a property graph in Cypher. The exercise reveals which paradigm fits your team’s instincts and your data shape. | Picking a paradigm before doing this exercise is how organizations end up regretting the choice in year two. |
| This month | If your organization has both a semantic layer and a graph effort underway, document the boundary. Which questions does the semantic layer answer (definitions, metrics, KPIs)? Which questions does the graph answer (relationships, traversals, multi-hop reasoning)? | The boundary is real but not always visible. Drawing it explicitly prevents both teams from claiming the same problem space. |
| This quarter | Read Parts 4 and 5 of this series before designing your first ontology or making your first identity decision. The two articles together prevent the most expensive design mistakes. | Vocabulary and identity are the load-bearing decisions. They are also the ones most projects defer until too late. |

Part 4 of this series, “Ontology, Taxonomy, Schema: The Vocabulary That Makes Knowledge Possible,” covers the design vocabulary every KG leader needs. Read it next.