Data Architecture & Engineering June 19, 2026 · 36 min read

The Knowledge Graph Tool and Technology Landscape: An Honest Vendor Map for 2026

Appendix A of the Knowledge Graph Practitioner's Guide. The first place in the series vendor names appear in earnest. Maps the seven layers of a production KG stack (triple stores, property graph stores, hybrid stores, virtualization, entity resolution, LLM extraction, governance metadata) onto the 2026 vendor landscape, names Lakeside Trust Bank's pick at each layer with rationale, gives a triple-vs-property-vs-hybrid decision tree, walks through what changed between 2025 and 2026 (the Ontotext-SWC merger, the SAP-Reltio acquisition, AWS Neptune Analytics' GenAI track, Stardog Voicebox, Senzing v4, the Microsoft GraphRAG cost reckoning, the rise of LightRAG and Graphiti), enumerates six vendor-selection failure modes, and closes with an eight-question vendor diagnostic and a tiered Do Next table.

By Vikas Pratap Singh
#knowledge-graph #vendor-landscape #triple-store #property-graph #entity-resolution #graphrag #openlineage #financial-services


Why This Appendix Exists, And Why Vendor Names Appear Here And Nowhere Else

Parts 1 through 11c of this series have been deliberately tool-agnostic. SPARQL and Cypher appeared as paradigm syntax, FIBO and PROV-O appeared as standards, and the Lakeside Trust Bank capstone described a stack at the layer level (a triple store with a property-graph view, an entity-resolution service, a SHACL gate, an OpenLineage bridge, an LLM-extraction pipeline, an episodic memory layer) without naming the vendors that occupy each role. That choice was load-bearing for the rest of the series: a paradigm-level guide that dates well beats a vendor-level guide that ages in a year. This appendix is the exception. It names vendors, gives price ranges where public, lists 2025-to-2026 changes, and ends with the picks Lakeside actually made.

The rest of the series will keep the layer vocabulary and avoid vendor names. The appendix exists because procurement decisions cannot be made at the paradigm level, and a guide that ducks the question of who actually does what in 2026 is one a Staff-plus practitioner cannot use to lead a stack conversation. Read this appendix as a 2026 snapshot. The categories will outlast the names; the names will keep moving.

The Procurement Trap That Started The Conversation

Before Lakeside Trust Bank, the series’ hypothetical worked-example bank, chose its stack, the bank’s CDO ran the procurement-by-default play that had worked for the warehouse migration two years earlier. The architecture group issued an RFP for “an enterprise knowledge graph platform.” Five vendors responded. Each demo lasted ninety minutes and ended with a node-edge browser, a curated graph of forty entities and a hundred edges, and an LLM-powered Q&A pane that answered “who owns the Müller-family beneficial-ownership chain” in three seconds against the demo data. The steering committee had a favorite by demo three.

What stopped the play was a question the head of governance asked in the last meeting. She had been reading Part 7 and Part 11b during the pilot, and she asked the vendor to show the SHACL ShapeGraph that had validated the demo data, the OpenLineage events that had generated the loaded triples, the entity-resolution method per merged record, and the named-graph version chain across the last three ontology releases. The vendor’s answer covered the SHACL gate (a pre-built shape catalog, not the bank’s). It did not cover OpenLineage (the platform expected to be the lineage authority, not a downstream consumer of OL events the bank already emitted). It did not cover entity resolution (the demo had been pre-resolved by the vendor). It did not cover named-graph versioning (the demo was a single graph at one point in time).

The CDO killed the procurement, rewrote the RFP at the layer level (storage, entity resolution, extraction, virtualization, governance metadata, agent memory), reissued it as six smaller scoping exercises, and ran each as a separate evaluation against the bank’s actual ontology, its actual lineage emission, and its actual reading patterns. The first procurement would have spent $2.4M in the first year, locked the bank to one vendor’s syntax, and produced a fancy browser without a production consumer. The second produced the stack the rest of the trilogy describes. The lesson is the framing of this appendix: the category is “knowledge graph platform”; the procurement decision is at the layer.

The Seven Layers Of A Production KG Stack

A production knowledge graph runs on seven layers. Most procurement conversations collapse them into one or two. Pulling them apart is what makes the vendor map readable.

| Layer | What it does | What standard or paradigm anchors it | Common 2026 vendors |
| --- | --- | --- | --- |
| 1. Storage (triple) | Persists RDF triples or quads, executes SPARQL, supports OWL or SHACL | W3C RDF, SPARQL, SHACL, OWL | Stardog, Ontotext GraphDB (Graphwise), AllegroGraph, Virtuoso, AWS Neptune |
| 2. Storage (property) | Persists labeled property graphs, executes Cypher or GSQL or AQL | openCypher, GQL ISO/IEC 39075, GSQL, AQL | Neo4j, TigerGraph, Memgraph, ArangoDB, AWS Neptune (openCypher) |
| 3. Hybrid and multi-model | Stores both paradigms or stores graph plus vectors plus documents | Mixed; vendor-specific | Stardog (RDF plus virtual graph), AllegroGraph (RDF plus vector plus JSON), Neo4j (LPG plus vector index), Neptune Analytics (RDF plus openCypher plus HNSW) |
| 4. Virtualization | Translates SPARQL or Cypher into pushed-down SQL or remote queries; no materialization | W3C R2RML, Direct Mapping, OBDA | Ontop, Stardog Virtual Graph, Denodo, Dremio (federation), Trino (federation) |
| 5. Entity resolution | Resolves source records into one canonical entity per real-world thing; supplies coreference links with method and score | Probabilistic ER literature; ER-as-a-service APIs | Senzing, Reltio (SAP acquisition announced Mar 2026, pending close), Tamr, Zingg, Splink |
| 6. LLM extraction (Track 2) | Extracts triples from unstructured text against a fixed ontology; returns structured records with provenance | No formal standard yet; converging on LangChain plus ontology-grounded prompting | Microsoft GraphRAG, LightRAG, iText2KG, Graphiti, FalkorDB GraphRAG SDK, LangChain GraphTransformer |
| 7. Governance metadata | Catalog, lineage, glossary, quality, policy register; emits or consumes OpenLineage; bridges to the operational graph | OpenLineage, DCAT, dqv, ODCS | Marquez, DataHub, Atlan, Collibra, OpenMetadata, Apache Atlas |

A production KG at a regulated mid-size bank touches at least four of these (storage, entity resolution, extraction, governance metadata); a mature deployment touches all seven. The Lakeside trilogy described a stack at every layer. This appendix names what occupies each role.

What this looks like in practice. When a vendor positions as “the knowledge graph platform,” ask which of the seven layers the platform owns and which it expects to integrate with. A platform that owns storage plus extraction but expects you to bring entity resolution, governance metadata, and the OpenLineage bridge is a partial product; that is fine, as long as the procurement decision treats it as a partial. A platform that owns five or six layers is selling a stack; the decision is whether you want one vendor across that many layers, with the lock-in that implies. The trap is buying one platform under the assumption that “knowledge graph” means all seven, then discovering at month nine that the bank’s OpenLineage emitters need a separate metadata store and the entity-resolution roadmap is two quarters out.
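The "which layers do you own" question can be written down as a small check. The sketch below is illustrative only: the layer identifiers are shorthand for the seven layers above, and the vendor response is hypothetical.

```python
# The seven layer names mirror the table above; the identifiers are shorthand.
LAYERS = (
    "storage_triple",
    "storage_property",
    "hybrid",
    "virtualization",
    "entity_resolution",
    "llm_extraction",
    "governance_metadata",
)


def coverage_report(owned: set[str]) -> dict[str, list[str]]:
    """Split the seven layers into those a platform owns and those you must bring."""
    unknown = owned - set(LAYERS)
    if unknown:
        raise ValueError(f"unknown layer names: {sorted(unknown)}")
    return {
        "owned": [layer for layer in LAYERS if layer in owned],
        "integrate": [layer for layer in LAYERS if layer not in owned],
    }


# Hypothetical vendor response: owns triple storage plus extraction, nothing else.
report = coverage_report({"storage_triple", "llm_extraction"})
print(report["integrate"])
```

Everything in the `integrate` list is a separate scoping exercise, which is the point of treating a partial product as a partial.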

A diagram showing the seven-layer KG stack as horizontal bars stacked vertically. Bottom layer in slate labeled "Storage (triple)" with example vendors "Stardog | GraphDB | AllegroGraph | Virtuoso | AWS Neptune" listed inside. Second layer in slate labeled "Storage (property)" with vendors "Neo4j | TigerGraph | Memgraph | ArangoDB | AWS Neptune (openCypher)". Third layer in deep teal labeled "Hybrid and multi-model" with vendors "Stardog | AllegroGraph | Neo4j (vector index) | Neptune Analytics". Fourth layer in deep teal labeled "Virtualization" with vendors "Ontop | Stardog VG | Denodo | Dremio | Trino". Fifth layer in deep blue labeled "Entity resolution" with vendors "Senzing | Reltio (SAP acquisition pending) | Tamr | Zingg | Splink". Sixth layer in amber labeled "LLM extraction (Track 2)" with vendors "Microsoft GraphRAG | LightRAG | iText2KG | Graphiti | FalkorDB GraphRAG SDK | LangChain". Top layer in deep red labeled "Governance metadata" with vendors "Marquez | DataHub | Atlan | Collibra | OpenMetadata | Apache Atlas". To the right, a vertical bracket labeled "Procurement decision is per layer; the category 'KG platform' is per stack" connects all seven layers. Below the layers, a small grey legend reads "vendors listed are 2026 instances of layer roles; layer vocabulary outlasts vendor names." Caption: "the seven layers of a production KG stack; pick layer-by-layer, not platform-by-platform."

Triple Stores: RDF-Native And Built For Inference

Triple stores persist RDF (subject-predicate-object) or quads (subject-predicate-object-named-graph), execute SPARQL queries, and support OWL or SHACL inference. The category is mature; the leading 2026 vendors are Stardog, Ontotext GraphDB (now part of Graphwise after the October 2024 merger with the Austrian Semantic Web Company), AllegroGraph (Franz), Virtuoso (OpenLink), and AWS Neptune. Each has its own positioning.
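To make the triple-versus-quad distinction concrete, here is a minimal in-memory sketch in plain Python. It is not how any of the vendors above store data; the IRIs, prefixes, and named-graph names are hypothetical, and `None` stands in for a SPARQL variable.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # named graph: the fourth element that turns a triple into a quad


def match(quads, subject=None, predicate=None, obj=None, graph=None):
    """Return quads matching the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj, graph)
    return [
        q for q in quads
        if all(want is None or want == have for want, have in zip(pattern, q))
    ]


# Hypothetical IRIs (prefixes abbreviated) spread over two ontology versions.
store = [
    Quad("ex:AcmeHoldings", "ex:owns", "ex:AcmeTrading", "graph:ontology-v2"),
    Quad("ex:AcmeTrading", "ex:owns", "ex:AcmeShipping", "graph:ontology-v2"),
    Quad("ex:AcmeHoldings", "ex:owns", "ex:AcmeLegacy", "graph:ontology-v1"),
]

# Ownership asserted by AcmeHoldings in the v2 graph only.
current = match(store, subject="ex:AcmeHoldings", predicate="ex:owns",
                graph="graph:ontology-v2")
```

The named-graph position is what lets a query ask "according to which version" without changing the triple itself.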

Stardog leads on virtualization. The Stardog Virtual Graph layer pushes SPARQL down to relational sources without materialization. Stardog Voicebox added an LLM-powered conversational layer that translates natural language to SPARQL against the loaded ontology. The platform’s positioning is “semantic AI platform” with strong support for both materialized and virtual graphs in one engine.

Ontotext GraphDB (Graphwise) leads on inference at scale. GraphDB is one of the few triplestores that performs OWL and rule-based inference on billion-triple workloads in real time. The October 2024 merger with the Semantic Web Company combined GraphDB with PoolParty (a thesaurus and taxonomy management platform), expanding the offering toward semantic content management.

AllegroGraph leads on multi-model. The 2025-2026 versions added native vector storage and JSON document support alongside RDF, positioning AllegroGraph as a single store for graph plus vector plus document workloads. ACID, replication, horizontal sharding, and triple-level security are mature.

Virtuoso leads on scale. The reference Virtuoso instance has served 35.5B+ triples on a multi-host shared-nothing cluster in production for years, and the engine remains among the fastest on diverse SPARQL workloads.

AWS Neptune has the cloud advantage. Neptune supports SPARQL on RDF and openCypher on property graphs in one engine, and Neptune Analytics adds vector indexing with HNSW up to 65,000 dimensions for GraphRAG workloads. The trade-off is the usual cloud one: spend for managed scale; lose some control over inference profile and SHACL execution behavior.

Picking dimensions for a triple store: ontology size and inference profile (Stardog or GraphDB if you need OWL plus SHACL plus reasoning at scale; Neptune if you can defer most reasoning to query time); virtualization need (Stardog VG if a substantial portion of the graph stays in the warehouse); cloud posture (Neptune if AWS-only and willing to pay for managed; Stardog or GraphDB for cloud-and-on-prem flexibility); and governance fit (does the vendor read your existing OpenLineage emission, or does it expect to be the lineage authority).

Property Graph Stores: LPG-Native And Built For Traversal

Property graph stores persist labeled property graphs (nodes with labels, edges with types, both with key-value properties), execute Cypher or GSQL or AQL, and excel at multi-hop traversal. The leading 2026 vendors are Neo4j, TigerGraph, Memgraph, and ArangoDB.
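As a rough illustration of the data model and of bounded multi-hop traversal, the sketch below builds a tiny labeled property graph in plain Python and walks it breadth-first. The account data and the `TRANSFERRED_TO` edge type are made up; a real store would run this as a Cypher, GSQL, or AQL query instead.

```python
from collections import deque

# Nodes carry labels and properties; edges carry a type and properties.
nodes = {
    "a1": {"labels": {"Account"}, "props": {"owner": "Alice"}},
    "a2": {"labels": {"Account"}, "props": {"owner": "Bob"}},
    "a3": {"labels": {"Account"}, "props": {"owner": "Carol"}},
    "a4": {"labels": {"Account"}, "props": {"owner": "Dan"}},
}
edges = [
    ("a1", "TRANSFERRED_TO", "a2", {"amount": 900}),
    ("a2", "TRANSFERRED_TO", "a3", {"amount": 850}),
    ("a3", "TRANSFERRED_TO", "a4", {"amount": 800}),
]


def reachable(start, edge_type, max_hops):
    """Breadth-first traversal: node id -> hop count, following one edge type."""
    adjacency = {}
    for src, etype, dst, _props in edges:
        if etype == edge_type:
            adjacency.setdefault(src, []).append(dst)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(current, []):
            if neighbour not in hops:
                hops[neighbour] = hops[current] + 1
                queue.append(neighbour)
    hops.pop(start)
    return hops


within_two = reachable("a1", "TRANSFERRED_TO", max_hops=2)
```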

Neo4j is the category default. The 2026 product line includes AuraDB (managed cloud across Free, Professional, Business Critical, and Virtual Dedicated Cloud tiers), Graph Data Science (algorithms library), and the native vector index that turns Neo4j into a hybrid graph plus vector store. AuraDB pricing for managed cloud runs roughly $65 to $146 per GB per month at list across professional and business-critical tiers, with self-managed enterprise contracts commonly reported in the $20,000 to $200,000+ annual range (an estimate, not a published list figure) depending on scale. Neo4j announced on June 3, 2026 that it would acquire GraphAware, an intelligence-analysis platform positioned as an alternative to Palantir Gotham, with the deal expected to close in Q3 2026, continuing the consolidation pattern at the analytics edge.

TigerGraph leads on large-scale analytics. The proprietary GSQL is a Turing-complete graph language optimized for parallel algorithms, and TigerGraph is engineered for trillion-edge workloads. The company took a strategic investment and went private under Cuadrilla Capital on July 15, 2025, reaffirming its place in the analytics-edge segment.

Memgraph leads on real-time. The in-memory architecture is Cypher-compatible and built for sub-millisecond latency on streaming workloads; the typical deployments are in cybersecurity and fraud detection.

ArangoDB leads on multi-model in the property graph half of the market. ArangoDB stores property graphs alongside documents and key-value pairs, and the AQL query language spans all three. The trade-off is that the multi-model breadth comes at some cost on pure graph workloads compared to graph-only stores.

Picking dimensions for a property graph store: scale (TigerGraph for trillion-edge analytics; Neo4j or Memgraph for billion-edge; ArangoDB if multi-model is more important than peak graph performance); latency (Memgraph for sub-millisecond; Neo4j for typical enterprise; TigerGraph for batch and parallel analytics); language familiarity (Cypher is the openCypher path and the basis for the GQL ISO/IEC 39075 standard; GSQL is TigerGraph-specific; AQL is ArangoDB-specific); and ecosystem maturity (Neo4j has the deepest tooling and community; the others are catching up at different rates).

Hybrid And Multi-Model Stores: When You Need Both

A 2026 reality is that most enterprise KGs are hybrid: RDF for the canonical operational and governance substrate, property graph for the traversal-heavy long tail, vector indexes for similarity-augmented retrieval, document storage for the immutable source artifacts. The hybrid layer is where vendor positioning is most fluid.

Stardog has positioned for hybrid since the Voicebox release: RDF storage, virtual graphs over relational sources, an LLM layer, and a property-graph view through the Stardog property graph adapter. AllegroGraph added native vectors and JSON documents in the 2025-2026 versions, becoming a single-store option for graph plus vector plus document workloads. Neo4j added a native vector index, and Graphiti (built separately, by Zep) adds an agent-memory layer on top that combines Cypher with vector retrieval. AWS Neptune Analytics is the most aggressive hybrid: SPARQL plus openCypher plus HNSW vector search plus Bedrock Knowledge Bases GraphRAG integration plus the February 2026 GenAI Agents prototyping samples.

The picking question for hybrid is whether to buy one store that covers all paradigms or to compose two specialized stores with a serving-tier abstraction. Lakeside chose the second (Stardog plus a Neo4j property-graph view, with vector retrieval handled at the agent layer rather than the storage layer); a smaller team would more reasonably pick one hybrid store and accept the per-paradigm trade-offs.

Virtualization Layers: Federate Without Materializing

Not every triple needs to land in the graph store. Virtualization translates SPARQL or Cypher queries into pushed-down SQL against the source systems, returning live results without copying data. The category is small but stable, and it is a common cost saver when a substantial portion of the graph would otherwise be a duplicate of the warehouse.
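The push-down idea can be shown with a toy. The sketch below translates a single triple pattern into parameterized SQL and runs it against an in-memory SQLite table standing in for the source system. The mapping dictionary, table, and predicate names are hypothetical; production systems express the mapping in R2RML or a vendor mapping language and handle full SPARQL, which this does not attempt.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, subject column, object column).
# Identifiers come from this trusted mapping, never from user input.
MAPPING = {
    "ex:hasRiskRating": ("customer", "customer_id", "risk_rating"),
    "ex:domiciledIn": ("customer", "customer_id", "country"),
}


def to_sql(predicate, subject=None):
    """Translate one triple pattern (?s predicate ?o) into parameterized SQL."""
    table, s_col, o_col = MAPPING[predicate]
    sql = f"SELECT {s_col}, {o_col} FROM {table}"
    params = []
    if subject is not None:
        sql += f" WHERE {s_col} = ?"
        params.append(subject)
    return sql, params


# Stand-in for the source system; rows stay here and are read live per query.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (customer_id TEXT, risk_rating TEXT, country TEXT)")
source.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("c1", "high", "DE"), ("c2", "low", "FR")],
)

sql, params = to_sql("ex:hasRiskRating", subject="c1")
rows = source.execute(sql, params).fetchall()
triples = [(s, "ex:hasRiskRating", o) for s, o in rows]
```

The triples exist only in the query result; nothing is written to a graph store.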

Ontop is the open-source reference implementation. Version 5.5.0 released February 14, 2026 supports the majority of SPARQL 1.1 features against R2RML or OBDA mappings, with backends including PostgreSQL, MySQL, Oracle, SQL Server, Snowflake, BigQuery, Databricks, plus federators (Denodo, Dremio, Trino, Spark). Stardog’s Virtual Graph is a commercial counterpart with deeper enterprise integration. Denodo is a general-purpose data virtualization platform that exposes SPARQL endpoints over its federated layer; Dremio and Trino are query federators that can be wrapped by Ontop for SPARQL access.

The picking question for virtualization is which sources stay virtual. Lakeside virtualized the customer master (the Reltio MDM stayed authoritative; Stardog VG translated SPARQL queries into Reltio API calls plus Snowflake joins) and materialized the resolved counterparty entities and the derived exposures. The general rule: virtualize the slow-changing reference data and the structured records that already have a system of record; materialize the resolved entities, the inferred facts, and anything Track 2 produces.

Entity Resolution: The Layer That Makes Identity Real

Entity resolution is the layer where most KG programs underspend and most KG failures originate. Part 5 made the case that ER moved from nightly batch to real-time on the serving path between 2024 and 2026. The 2026 vendor landscape reflects that shift, and is reshaping fast under SAP’s announced March 2026 acquisition of Reltio.

Senzing is the real-time-API leader. The v4 SDK released in 2025 made streaming entity resolution with continuous self-correction the default; the April 2026 Kiro Power for agentic IDEs extends Senzing into agentic-development workflows, exposing ER as an MCP server that agents can summon. The positioning is unambiguous: ER as a service, not as a feature of an MDM suite.

Reltio is the cloud-native MDM-with-ER leader, and the March 2026 SAP acquisition (expected to close in Q2 or Q3 2026) is the most consequential MDM-and-ER consolidation event in years. For SAP shops, the deal accelerates a single AI-ready data plane; for non-SAP shops running Reltio (like Lakeside, which uses Reltio for retail customer master), the deal raises a roadmap question that procurement teams should be asking now.

Tamr is the human-in-the-loop ML leader. The platform’s positioning is enterprise MDM with active learning; deployments at Toyota, GSK, and Roche anchor the case studies.

Zingg and Splink are the open-source options. Zingg is a Python-and-Spark library for probabilistic matching; Splink is a UK-government-origin probabilistic ER library that scales to billions of records on Spark or DuckDB. Both are appropriate for cost-conscious teams or for embedding ER inside a custom pipeline.
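For readers who have not seen probabilistic matching up close, the sketch below shows scoring in the Fellegi-Sunter tradition: each field contributes a log-likelihood ratio depending on whether it agrees. This is not the API of any library named above, and the m and u probabilities are assumed values rather than estimates from data.

```python
import math

# Illustrative m/u probabilities (assumptions, not estimated from data):
# m = P(field agrees | records are the same entity)
# u = P(field agrees | records are different entities)
FIELDS = {
    "name": (0.95, 0.01),
    "date_of_birth": (0.98, 0.003),
    "postcode": (0.90, 0.05),
}


def match_weight(record_a, record_b):
    """Sum of log2 likelihood ratios across fields; higher means more likely a match."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if record_a[field] == record_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


a = {"name": "anna muller", "date_of_birth": "1980-02-01", "postcode": "10115"}
b = {"name": "anna muller", "date_of_birth": "1980-02-01", "postcode": "80331"}
c = {"name": "jon smith", "date_of_birth": "1975-07-09", "postcode": "80331"}

score_ab = match_weight(a, b)  # two of three fields agree
score_ac = match_weight(a, c)  # no field agrees
```

A score like this, together with the method that produced it, is what a coreference link should carry into the graph.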

The picking question for ER is the latency requirement and the integration shape. Real-time agent retrieval and serving-edge use cases require an API like Senzing v4. Batch MDM with golden-record management benefits from Reltio or Tamr. Custom pipelines or embedded ER lean toward Zingg or Splink. Lakeside picked Senzing for its real-time API on the serving edge (the relationship-banker agent’s tier-aware retrieval depends on resolved IRIs at sub-200ms latency) and continued running Reltio for retail customer master, with the SAP-Reltio roadmap risk explicitly on the procurement watch list.

KEY INSIGHT: The most underbudgeted layer in a 2026 KG procurement is entity resolution. A graph store without an ER layer is a fancy join; an extraction pipeline without an ER layer is a triple-explosion machine. Senzing prices on annual tiers by Data Source Record (the input records mapped and loaded), running roughly $58.6K at 10M records up to about $3.4M at 10B records, so this layer carries a material per-record license cost that the storage-led RFP routinely misses.
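A quick back-of-the-envelope on the two tier figures quoted above, treating them as the approximate values they are, shows how the per-record price moves across the range:

```python
# Approximate annual tier prices quoted in the text: records -> USD per year.
tiers = {10_000_000: 58_600, 10_000_000_000: 3_400_000}

# Effective price per Data Source Record at each tier.
per_record = {records: price / records for records, price in tiers.items()}

# How much cheaper each record is at the top tier than at the bottom tier.
ratio = per_record[10_000_000] / per_record[10_000_000_000]

for records, unit_price in per_record.items():
    print(f"{records:>14,} records: ${unit_price:.5f} per record per year")
```

On these figures the per-record price falls by roughly 17 times between the two tiers while the total bill rises by about 58 times, which is why the line item is easy to miss in a pilot and hard to miss in production.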

LLM Extraction: Sourcing Track 2 Without Triple Explosion

Part 6 introduced the Track 2 problem: extracting triples from unstructured text against a fixed ontology, with three discipline points (fixed ontology in prompt, dedup and ER between extract and assert, SHACL gate before write). The 2026 LLM-extraction landscape has matured fast, and the cost positioning is the most consequential change since the 2024 GraphRAG release.
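The three discipline points can be sketched as a gate between extractor output and the write path. Everything here is a stand-in: the ontology dictionary, the lookup table playing the ER service, and the domain/range check playing the SHACL gate are hypothetical simplifications, and the "LLM output" is hard-coded.

```python
# Fixed ontology: the only predicates the extractor may emit (hypothetical names),
# each with its expected (domain, range) classes.
ONTOLOGY = {
    "ex:guarantees": ("ex:LegalEntity", "ex:Loan"),
    "ex:controls": ("ex:LegalEntity", "ex:LegalEntity"),
}

# Stand-in for the ER service: surface form -> (canonical IRI, class).
RESOLVED = {
    "Acme Holdings": ("ex:entity/001", "ex:LegalEntity"),
    "ACME Holdings Ltd.": ("ex:entity/001", "ex:LegalEntity"),
    "Acme Trading": ("ex:entity/002", "ex:LegalEntity"),
}


def assert_candidates(candidates):
    """Resolve, deduplicate, and gate candidate triples; return (accepted, rejected)."""
    accepted, rejected = set(), []
    for subj, pred, obj in candidates:
        if pred not in ONTOLOGY:  # point 1: the ontology is fixed
            rejected.append((subj, pred, obj, "predicate not in ontology"))
            continue
        if subj not in RESOLVED or obj not in RESOLVED:  # point 2: ER before assert
            rejected.append((subj, pred, obj, "unresolved entity"))
            continue
        (s_iri, s_cls), (o_iri, o_cls) = RESOLVED[subj], RESOLVED[obj]
        if (s_cls, o_cls) != ONTOLOGY[pred]:  # point 3: shape-style gate before write
            rejected.append((subj, pred, obj, "domain/range violation"))
            continue
        accepted.add((s_iri, pred, o_iri))  # set membership deduplicates
    return accepted, rejected


# Pretend extractor output: the same company appears under two surface forms.
llm_output = [
    ("Acme Holdings", "ex:controls", "Acme Trading"),
    ("ACME Holdings Ltd.", "ex:controls", "Acme Trading"),
    ("Acme Holdings", "ex:isFriendsWith", "Acme Trading"),
]
accepted, rejected = assert_candidates(llm_output)
```

Three candidate triples go in and one asserted triple comes out; without the resolution step the two surface forms would have produced two.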

Microsoft GraphRAG is the academic reference. The Edge et al. 2024 paper introduced community-detection-driven hierarchical summarization, and the open-source implementation produces strong query-focused summarization on small-to-medium corpora. The reckoning was the cost: at scale, the indexing pass on a large corpus runs into the tens of thousands of dollars, and the run-time cost is sensitive to community granularity.

LightRAG is the cost-conscious alternative. LightRAG strips the community-detection step and uses dual-mode retrieval (graph plus vector); independent reporting puts it at comparable quality to GraphRAG on most workloads at roughly 1/100th the token cost. Latency is also lower against a standard RAG baseline (~80ms vs ~120ms, roughly a one-third reduction). For most enterprise Track 2 corpora, LightRAG is the more defensible choice on TCO; GraphRAG retains the edge on relational sensemaking workloads, where the community structure is the value.

iText2KG is the ontology-first extraction library. The framing is closest to the discipline points from Part 6: fixed ontology in the prompt, deterministic mapping between LLM output and triples, and explicit dedup. iText2KG is appropriate when the ontology is stable and the corpus is moderate in size; the trade-off is that the library does less of the “discover the schema from the text” work than GraphRAG does.

Graphiti (Zep) is the agent-memory specialist. Graphiti accumulates knowledge from agent interactions in real time, maintains time-aware bi-temporal facts, and exposes a Cypher API to the operational graph. On the LongMemEval benchmark, Zep reported 63.8 percent for Zep against 49.0 percent for Mem0 on GPT-4o (vendor-reported, corroborated by third-party comparison), making Graphiti the leading 2026 choice for episodic agent memory specifically.
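"Bi-temporal" means each fact carries two timelines: when it was true in the world and when the system learned it. The sketch below is a generic append-only illustration of that idea, not Graphiti's data model or API; the field names and the client example are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date          # when the fact became true in the world
    valid_to: Optional[date]  # None = still true as far as this row knows
    recorded_on: date         # when the system learned this version


def as_of(facts, valid_on, known_by):
    """Values true on `valid_on`, using only rows recorded by `known_by`."""
    latest = {}
    for f in sorted(facts, key=lambda f: f.recorded_on):
        if f.recorded_on <= known_by:
            # A later row for the same fact supersedes the earlier one.
            latest[(f.subject, f.predicate, f.obj, f.valid_from)] = f
    return [
        f.obj for f in latest.values()
        if f.valid_from <= valid_on and (f.valid_to is None or valid_on < f.valid_to)
    ]


facts = [
    Fact("client:42", "prefers_channel", "phone", date(2025, 1, 1), None, date(2025, 1, 5)),
    # Learned on 10 March: the phone preference actually ended on 1 March...
    Fact("client:42", "prefers_channel", "phone", date(2025, 1, 1), date(2026, 3, 1), date(2026, 3, 10)),
    # ...and was replaced by email from that date.
    Fact("client:42", "prefers_channel", "email", date(2026, 3, 1), None, date(2026, 3, 10)),
]

believed = as_of(facts, valid_on=date(2026, 3, 5), known_by=date(2026, 3, 5))
corrected = as_of(facts, valid_on=date(2026, 3, 5), known_by=date(2026, 3, 10))
```

The same question about 5 March gets two answers depending on what had been recorded, which is what lets an agent's past decisions be audited against what it knew at the time.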

FalkorDB shipped GraphRAG SDK 1.0 in early 2026, ranking first on GraphRAG-Bench across all four task types in the company’s announcement. FalkorDB’s positioning is fast graph plus vector retrieval at low operational cost; the platform is a credible 2026 alternative to the Microsoft and LightRAG frames.

LangChain’s GraphTransformer remains the most widely deployed glue layer. It is not a research-grade extraction stack, and it is not a benchmark winner, but it is the path of least resistance for teams already on LangChain that want a Track 2 prototype against an ontology in one or two weeks.

The picking question for Track 2 is the corpus shape, the cost ceiling, and the integration target. Lakeside picked LightRAG-style dual-mode retrieval for the credit-memo and KYC corpora (cost-conscious, ontology-grounded) and Graphiti for the agent’s episodic memory layer (real-time accumulation, bi-temporal). The bank ran a six-week internal evaluation against five extraction stacks before locking the picks; the result is in the Part 11a Track 2 description.

Governance Metadata Stores: Lineage And Catalog As Graph

Part 10 and Part 11b made the case that lineage, catalog, glossary, quality, and policy register can collapse onto one substrate when each is modeled as a node-and-edge view of the same governance graph. The 2026 vendor landscape is split between platforms that are knowledge graphs internally (DataHub, Atlan) and platforms that are catalog or lineage tools that expose graph APIs (Collibra, OpenMetadata, Apache Atlas).

Marquez is the OpenLineage reference implementation. Marquez consumes OL events from Airflow, Spark, dbt, Flink, and Dagster, stores them in a relational backend, and exposes a graph API for lineage queries. Marquez’s strength is being downstream of the OL emitters that already exist in most modern stacks; it is not a catalog or a glossary. The 2026 release added an observability dashboard (24-hour and 7-day stats, sources and datasets and jobs views).
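To show what "consuming OL events" yields, the sketch below flattens one trimmed OpenLineage-style run event into lineage edges. The event uses the core run, job, inputs, and outputs fields, but the values are hypothetical and the event is deliberately incomplete (a spec-valid event also carries a `schemaURL` and usually facets). The edge labels are illustrative, and none of this is Marquez's API.

```python
# A trimmed run event following the OpenLineage core model; values are made up.
event = {
    "eventType": "COMPLETE",
    "eventTime": "2026-05-01T02:15:00Z",
    "producer": "https://example.com/lakeside/pipelines",
    "run": {"runId": "0190f3c2-7a1e-7c3b-9d41-5f2a6b8c9d10"},
    "job": {"namespace": "lakeside.airflow", "name": "load_counterparty_exposures"},
    "inputs": [{"namespace": "snowflake://lakeside", "name": "raw.trades"}],
    "outputs": [{"namespace": "snowflake://lakeside", "name": "risk.exposures"}],
}


def lineage_edges(ev):
    """Flatten one event into (source, relation, target) edges for a lineage graph."""
    job = f'{ev["job"]["namespace"]}/{ev["job"]["name"]}'
    edges = []
    for ds in ev.get("inputs", []):
        edges.append((f'{ds["namespace"]}/{ds["name"]}', "consumed_by", job))
    for ds in ev.get("outputs", []):
        edges.append((job, "produced", f'{ds["namespace"]}/{ds["name"]}'))
    return edges


edges = lineage_edges(event)
```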

DataHub (originally LinkedIn, now Acryl Data) is a metadata-graph-from-the-start platform. The architecture has always modeled datasets, pipelines, dashboards, and lineage as a graph; the 2026 product surface adds AI-powered discovery and an MCP server for agentic queries. DataHub’s strength is the graph model and the open-source ecosystem; the trade-off is the operational burden of running it self-managed (Acryl Data offers a managed option).

Atlan leads the 2026 commercial-and-cloud catalog category. Atlan’s active-metadata architecture treats catalog, lineage, glossary, quality, and policy as one substrate; the 2026 release added native MCP server support and a GraphQL API that AI agents can query directly. Atlan’s strength is the UX and the connector breadth (100+ connectors); the trade-off is cost.

Collibra is the legacy data-governance leader and, in 2026 procurement, the platform most often slated for replacement by Atlan. Collibra’s stewardship and policy workflows are mature, but the platform is configuration-heavy and not graph-native. Many programs are migrating off Collibra to Atlan or DataHub as the catalog and lineage become AI-and-agent surfaces rather than manual stewardship surfaces.

OpenMetadata and Apache Atlas are the open-source alternatives. OpenMetadata is the more active project; Apache Atlas is older, Hadoop-era, but still in production at many banks.

The picking question for governance metadata is whether the bank wants the catalog, lineage, glossary, quality, and policy register on one substrate, and whether the platform can read OpenLineage emission rather than replace it. Lakeside picked Marquez for the OL ingestion bridge (the bank’s pipelines were already emitting OL events into Marquez before the KG program started) and Atlan for the catalog and policy UX, with the OL events flowing from Marquez into the governance KG via the PROV-O bridge from Part 11b.

The Triple-Vs-Property-Vs-Hybrid Decision Tree

The most-asked procurement question is which storage paradigm to pick. The decision is not a personality test on the architecture team. It is a function of three variables: the regulatory and audit posture, the multi-hop traversal pattern, and the team’s existing skills. The decision tree below is the one Lakeside used.

| Question | Answer | Storage paradigm |
| --- | --- | --- |
| Are you in a regulated industry (banking, insurance, pharma, healthcare, government) where ontology, inference, and named-graph provenance are audit requirements? | Yes | Triple store (RDF) plus a property-graph view if traversal needs warrant; do not start with a property-graph-only stack |
| Same as above | No | Property graph store; consider triple store later if a governance graph emerges |
| Is your dominant query pattern multi-hop traversal where the depth is unbounded (fraud rings, transaction flow, supply chain, social network analysis)? | Yes | Property graph store (Neo4j, TigerGraph) for the traversal layer; consider RDF for governance metadata only |
| Same as above | No | Triple store can handle most enterprise patterns; SPARQL with property paths is sufficient for 1-to-3-hop queries |
| Do you need W3C-standard inference (OWL EL/QL/RL or SHACL inference) at write time or query time? | Yes | Triple store (Stardog, GraphDB) is the right choice; property graph stores do not natively support OWL or SHACL |
| Same as above | No | Either paradigm works; pick on team skills and traversal pattern |
| Do you have a substantial unstructured corpus (credit memos, KYC files, advisor notes, clinical notes) feeding Track 2 extraction? | Yes | Hybrid (triple store for canonical facts; property-graph view for traversal; LLM extraction discipline regardless) |
| Same as above | No | Pick on the dominant query pattern and inference need |
| Does your team have existing SPARQL skills, a semantic-web background, or RDF tooling investment? | Yes | Triple store is the lower-friction path; do not retrain on Cypher unless traversal patterns warrant |
| Same as above | No | Property graph store is the lower-friction path for new teams; Cypher and the GQL standard are easier to learn than SPARQL |

The decision tree should be run end-to-end before vendor demos start. A team that lands at “triple store with a property-graph view” should not be sitting through a property-graph-only vendor’s pitch as the first option. A team that lands at “property graph for traversal, RDF for governance” should not be sitting through a single-store-everything pitch.

A diagram showing the triple-vs-property-vs-hybrid decision tree as a top-down flowchart. Top node in slate labeled "Are you in a regulated industry where ontology and inference are audit requirements?" with two branches: "yes" leading down-right and "no" leading down-left. The "yes" branch leads to a deep-teal node labeled "Triple store (RDF) is the canonical substrate; property-graph view if traversal warrants" which branches to a sub-node labeled "Multi-hop traversal pattern dominant?" with "yes" leading to a deep-blue node "Hybrid: RDF + property-graph view (Lakeside pattern)" and "no" leading to a slate node "Triple store only (Stardog or GraphDB or AllegroGraph or Virtuoso)". The "no" branch from the top leads to an amber node labeled "Property graph (Neo4j or TigerGraph or Memgraph or ArangoDB) for the operational graph" which branches to a sub-node labeled "Governance graph emerging?" with "yes" leading to a deep-blue node "Add RDF substrate for governance metadata only" and "no" leading to a slate node "Property graph only; revisit when governance graph emerges". To the right of the flowchart, a sidebar labeled "Reading the tree" with three notes: (1) "Do not skip the regulated-industry question; OWL and SHACL inference are not equivalent to property-graph constraints", (2) "The 'hybrid' branch is the most common 2026 enterprise outcome at scale", (3) "Team-skill questions enter as a tiebreaker, not as the lead criterion". A small grey legend at the bottom reads "Lakeside ran this tree before the first vendor demo; the answer was hybrid." Caption: "the storage-paradigm decision belongs at the start of the procurement, not at the vendor-demo conclusion."
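The flowchart reduces to a few lines of branching. The sketch below encodes only the four leaf outcomes described in the diagram; the function and parameter names are made up for illustration, and team skills are left out because the diagram treats them as a tiebreaker rather than a branch.

```python
def storage_paradigm(regulated: bool,
                     traversal_dominant: bool = False,
                     governance_graph_emerging: bool = False) -> str:
    """Walk the flowchart: regulated-industry question first, then one follow-up per branch."""
    if regulated:
        # Regulated branch: RDF is the canonical substrate either way.
        if traversal_dominant:
            return "hybrid: RDF plus property-graph view"
        return "triple store only"
    # Unregulated branch: property graph for the operational graph.
    if governance_graph_emerging:
        return "property graph plus RDF substrate for governance metadata"
    return "property graph only"


# Lakeside's answers: regulated, with traversal-heavy patterns.
lakeside = storage_paradigm(regulated=True, traversal_dominant=True)
print(lakeside)
```

Writing the answers down before the first demo makes it obvious which vendor pitches are out of scope.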

What Changed 2025 To 2026: A Vendor Timeline

The 2026 vendor landscape is a different shape than the 2025 one. Six events reshaped the assumptions in last year’s procurement plans, and any RFP issued before late 2025 should be revisited against this list.

| Quarter | Event | Why it matters |
| --- | --- | --- |
| 2024 Q4 | Ontotext plus Semantic Web Company merge to form Graphwise | Combines GraphDB with PoolParty; one of the largest semantic-tech consolidations; affects pricing, roadmap, and SI partner lists |
| 2024-2025 | Stardog Voicebox launches; LightRAG paper released | LLM-plus-KG conversational layer becomes table stakes; the “GraphRAG cost reckoning” begins as LightRAG demonstrates comparable quality at roughly 1 percent of the token cost |
| 2025 Q3 | Senzing v4 SDK ships with continuous real-time ER and self-correction | Real-time-on-serving-path ER becomes a procurable category, not a custom build; affects every program with sub-200ms agent retrieval latency |
| 2025 Q3 | TigerGraph takes a strategic investment and goes private under Cuadrilla Capital (July 15, 2025) | Reaffirms TigerGraph’s place in the analytics-edge segment; validates large-scale graph analytics as an investable category |
| 2026 Feb | AWS releases Sample GenAI Agents on Neptune; Neptune Analytics doubles down on RDF plus openCypher plus HNSW | Neptune becomes a credible single-store hybrid for AWS-only workloads; reshapes the procurement question for cloud-native programs |
| 2026 Mar | SAP announces acquisition of Reltio | Largest 2026 MDM-and-ER consolidation; for non-SAP shops on Reltio, raises a roadmap question that procurement teams should be asking now |
| 2026 Apr | Senzing launches Kiro Power: ER as an MCP server for agentic IDE | ER becomes summonable by agents in development workflows; precedent for other layer roles being exposed as MCP powers |
| 2026 Jun | Neo4j announces acquisition of GraphAware (intelligence-analysis platform) | Continues Neo4j’s analytics consolidation; extends the platform’s reach into graph analysis tooling |

What this looks like in practice. The 2026 consolidation pattern is the same one that played out in cloud DBMS five years earlier: the larger platforms acquire the analytics edge and the specialized capabilities, the open-source projects remain the cost-conscious base, and the standards (W3C RDF, ISO/IEC 39075 GQL for property graphs, OpenLineage for governance) absorb the integration surface. The right procurement posture is to pick layer by layer, prefer standards-anchored vendors, and explicitly model the M&A risk on a three-year horizon.

A diagram showing the 2025-to-2026 KG vendor consolidation timeline as a horizontal arrow from 2024 Q4 on the left to 2026 Q2 on the right, with milestone markers spaced along the arrow. Each milestone is a small vertical pin attached to the arrow with a callout box. Pin 1 in slate "2024 Q4: Ontotext+SWC=Graphwise" with subtext "GraphDB plus PoolParty consolidation". Pin 2 in slate "2024-2025: Stardog Voicebox; LightRAG paper" with subtext "LLM+KG conversational layer; GraphRAG cost reckoning begins". Pin 3 in deep teal "2025 Q3: Senzing v4 SDK; TigerGraph goes private (Cuadrilla)" with subtext "Real-time ER as a procurable category; analytics-edge investment". Pin 4 in deep teal "2026 Feb: Neptune Analytics GenAI; Sample GenAI Agents" with subtext "Cloud-native hybrid stack". Pin 5 in amber "2026 Mar: SAP announces Reltio acquisition; Neo4j to acquire GraphAware (Jun 2026)" with subtext "Largest 2026 MDM consolidation; roadmap risk for non-SAP shops". Pin 6 in amber "2026 Apr: Senzing Kiro Power MCP" with subtext "ER summonable by agents in dev workflows". Below the timeline, a horizontal band labeled "What this means for your RFP" with three sub-bands: "Standards-anchored vendors aged best (W3C RDF, openCypher, OpenLineage)", "Bundled platforms reshuffle ownership; price and lock-in change", "MCP server adapters become a third procurement criterion alongside price and capability". Caption: "six events reshaped the 2026 KG vendor map; any RFP issued before late 2025 should be revisited against this list."

The Lakeside Stack: Six Layers With Rationale

Lakeside’s stack pick is the worked example for the rest of this appendix. The picks are listed at the layer level, with the rationale that survived the second-round procurement after the original “knowledge graph platform” RFP was killed.

| Layer | Lakeside pick | Rationale |
| --- | --- | --- |
| Storage (canonical) | Stardog (triple store plus virtual graph) | OWL plus SHACL inference at scale; virtual-graph push-down to Snowflake reduces materialization; semantic AI platform positioning aligns with the agent layer; the bank’s existing FIBO-based ontology imports cleanly |
| Storage (traversal) | Neo4j (property-graph view, federated from Stardog) | Multi-hop counterparty-group traversal (the long tail; 1,400-entity counterparty trees) is materially faster on Neo4j than via SPARQL property paths; Neo4j’s native vector index is unused at the storage layer (vectors live with the agent) |
| Entity resolution | Senzing v4 (real-time API on serving edge); Reltio (retail customer master, with SAP-roadmap risk on watch list) | Senzing’s sub-200ms latency on the serving path is what the agent’s tier-aware retrieval depends on; Reltio is the legacy retail master, and SAP’s acquisition raises a roadmap question for it |
| LLM extraction (Track 2) | LightRAG-style dual-mode retrieval for credit memos and KYC; Graphiti for episodic agent memory | Cost reckoning made Microsoft GraphRAG indefensible at Lakeside’s corpus size; LightRAG holds comparable quality at affordable TCO; Graphiti’s bi-temporal facts and LongMemEval scores justify the agent-memory pick |
| Virtualization | Stardog Virtual Graph (commercial); Ontop in dev for non-Stardog sources | Stardog VG covers the production warehouse path; Ontop covers internal experimentation against Postgres and Spark; both speak R2RML so mappings are portable |
| Governance metadata | Marquez (OL bridge); Atlan (catalog plus policy UX) | Marquez is the existing OL emission target; Atlan is the catalog UX layer; the OL events flow Marquez→PROV-O→governance KG via the bridge from Part 11b |

The picks above are 2026-current. Two of them have explicit M&A watch flags (Reltio under SAP; the LLM-extraction stack under continued vendor turnover); the procurement runbook revisits the picks at every quarterly architecture review. Other firms’ picks will differ on cloud posture, regulatory shape, and team skills; the layer map is the artifact, not the brand list.
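If the layer map is the artifact, it helps to keep it as data rather than as a slide, so the quarterly review can query it. The sketch below is one possible shape, under stated assumptions: the `LayerPick` structure and its field names are hypothetical, and the `standards` entries are read off the rationale column above rather than taken from any vendor specification.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LayerPick:
    """One row of the layer map (structure is an illustrative assumption)."""
    layer: str
    vendors: tuple[str, ...]
    standards: tuple[str, ...] = ()   # standards anchors named in the rationale
    ma_watch: Optional[str] = None    # reason the pick is on the M&A watch list


LAKESIDE_2026 = [
    LayerPick("Storage (canonical)", ("Stardog",), ("RDF", "OWL", "SHACL")),
    LayerPick("Storage (traversal)", ("Neo4j",)),
    LayerPick("Entity resolution", ("Senzing v4", "Reltio"),
              ma_watch="Reltio under SAP"),
    LayerPick("LLM extraction (Track 2)", ("LightRAG-style", "Graphiti"),
              ma_watch="continued vendor turnover"),
    LayerPick("Virtualization", ("Stardog Virtual Graph", "Ontop"), ("R2RML",)),
    LayerPick("Governance metadata", ("Marquez", "Atlan"),
              ("OpenLineage", "PROV-O")),
]


def watch_list(picks: list[LayerPick]) -> list[tuple[str, str]]:
    """Return (layer, reason) pairs for every pick carrying an M&A watch flag."""
    return [(p.layer, p.ma_watch) for p in picks if p.ma_watch]


if __name__ == "__main__":
    for layer, reason in watch_list(LAKESIDE_2026):
        print(f"{layer}: {reason}")
```

Swapping a vendor then means editing one row, while the layers and their standards anchors stay put, which is the property the paragraph above argues for.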

A diagram showing Lakeside's full stack pick as six horizontal layers stacked vertically, with each layer labeled by role and vendor. Bottom layer in slate labeled "Storage (canonical): Stardog | rationale: OWL+SHACL inference at scale; virtual-graph push-down". Second layer in slate labeled "Storage (traversal): Neo4j | rationale: multi-hop traversal on long-tail counterparty groups". Third layer in deep teal labeled "Entity resolution: Senzing v4 (serving edge) + Reltio (retail master, SAP-roadmap risk) | rationale: sub-200ms latency for tier-aware agent retrieval". Fourth layer in deep teal labeled "LLM extraction (Track 2): LightRAG-style + Graphiti (agent memory) | rationale: GraphRAG cost reckoning; LongMemEval scores". Fifth layer in deep blue labeled "Virtualization: Stardog VG (prod) + Ontop (dev) | rationale: R2RML portability; production push-down". Top layer in amber labeled "Governance metadata: Marquez (OL bridge) + Atlan (catalog UX) | rationale: existing OL emission; PROV-O bridge from Part 11b". To the right of the stack, a vertical bracket labeled "Two M&A watch flags: Reltio under SAP; LLM-extraction stack under continued turnover. Procurement runbook revisits at every quarterly architecture review." Below the layers, a small grey legend reads "picks are 2026-current; the layer map is the artifact; the vendor names are 2026 instances." Caption: "Lakeside's six-layer pick with rationale; other firms will pick differently on cloud posture, regulatory shape, and team skills."

Six Failure Modes In Vendor Selection

The patterns below recur across the procurement post-mortems this series has aggregated. Each is a diagnostic for whether the second-round Lakeside conversation needs to happen at your firm.

  1. Picking the storage paradigm at the demo, not at the decision tree. The triple-vs-property-vs-hybrid decision should run before any vendor demo. Teams that skip the tree end up buying the paradigm of the most polished demo, then retrofitting the ontology to the engine.
  2. Confusing the category for the procurement decision. “Knowledge graph platform” is a category. The procurement decision is at the layer (storage, ER, extraction, virtualization, governance metadata). A single-platform purchase under the assumption that the platform owns all seven layers is the most expensive procurement mistake; the platform usually owns two or three and integrates with the rest.
  3. Ignoring the M&A risk. The 2024 Ontotext-SWC merger, the 2026 SAP-Reltio acquisition, and the ongoing TigerGraph and Neo4j analytics-edge consolidations are the three you can name. The pattern is structural; expect the next one to land mid-procurement on a platform you are evaluating. The right posture is standards-anchored picks (RDF, openCypher, OpenLineage) and explicit M&A modeling on the three-year horizon.
  4. Underbudgeting entity resolution. The most-skipped layer is the one where most KG failures originate. A storage purchase without a parallel ER procurement is a fancy join across unresolved records; the failures surface as duplicate counterparties, beneficial-ownership chains that almost-but-not-quite link, and AML investigations that miss connections the storage layer technically held.
  5. Buying the LLM extraction layer before the ontology is locked. Track 2 extraction at scale produces triple explosion when the ontology is unstable. The Microsoft GraphRAG cost reckoning is in part a symptom of teams running extraction against an evolving schema, then re-running it after every ontology change. Fix the ontology to the under-100-class ceiling per Part 4 before procuring the extraction stack.
  6. Treating the governance metadata store as a downstream sink. A 2026 production KG has the governance metadata store on the read path for compliance queries and on the write path for the OpenLineage bridge. Treating Marquez or Atlan or DataHub as a tail-end “where do we put the metadata after we are done” question produces a substrate that cannot answer the BCBS 239 Principle 3 attribute-level question from Part 11b without a three-week archeology pass.
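Failure mode 6 turns on the OpenLineage bridge sitting on the write path, so it is worth seeing how small the core of such a bridge can be. The sketch below maps one OpenLineage run event to PROV-O statements using plain tuples. It is an assumption-laden illustration, not the Part 11b bridge: the `urn:example:lineage:` IRI base, the function names, and the choice of which event types count as an end time are all hypothetical, and a production bridge would emit typed literals through an RDF library.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "urn:example:lineage:"  # hypothetical IRI base for minted identifiers

Triple = Tuple[str, str, str]

# Assumed mapping from OpenLineage event types to PROV-O time properties.
TIME_PREDICATE = {
    "START": "startedAtTime",
    "COMPLETE": "endedAtTime",
    "FAIL": "endedAtTime",
    "ABORT": "endedAtTime",
}


def dataset_iri(dataset: Dict[str, str]) -> str:
    """Mint an IRI from the dataset's namespace and name (percent-encoded)."""
    return (f"{BASE}dataset/{quote(dataset['namespace'], safe='')}"
            f"/{quote(dataset['name'], safe='')}")


def run_event_to_prov(event: dict) -> List[Triple]:
    """Translate one OpenLineage run event into PROV-O triples."""
    run = f"{BASE}run/{event['run']['runId']}"
    triples: List[Triple] = [(run, RDF_TYPE, PROV + "Activity")]

    predicate = TIME_PREDICATE.get(event.get("eventType", ""))
    if predicate:
        # Kept as a plain string here; a real bridge would type it xsd:dateTime.
        triples.append((run, PROV + predicate, event["eventTime"]))

    for ds in event.get("inputs", []):
        iri = dataset_iri(ds)
        triples += [(iri, RDF_TYPE, PROV + "Entity"),
                    (run, PROV + "used", iri)]
    for ds in event.get("outputs", []):
        iri = dataset_iri(ds)
        triples += [(iri, RDF_TYPE, PROV + "Entity"),
                    (iri, PROV + "wasGeneratedBy", run)]
    return triples
```

Because every output dataset carries a `prov:wasGeneratedBy` link back to the run that produced it, a compliance query can walk from a reported figure to its producing runs and their inputs without reconstructing the lineage after the fact.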

Eight-Question Vendor-Selection Diagnostic

The diagnostic below is the one Lakeside ran on every vendor in the second-round RFP. A vendor that answers yes on six or more is in serious consideration; a vendor that answers yes on all eight is in Lakeside posture.

| Diagnostic question | Yes if… | No means… |
| --- | --- | --- |
| Does the platform speak a W3C or de facto standard natively (RDF, openCypher, GQL ISO/IEC 39075, OpenLineage, R2RML, SHACL)? | Standards are first-class; the vendor’s documentation references the spec versions explicitly | Proprietary lock-in; switching cost is a multi-year project; M&A risk lands directly on the integration |
| Does the platform read your existing OpenLineage emission, or does it expect to be the lineage authority? | The platform consumes OL events from your Airflow, Spark, dbt, Flink, and Dagster, and joins them to its own metadata | The platform replaces what is already working; the OL bridge becomes a custom build; governance graph integration is non-trivial |
| Is entity resolution exposed as a real-time API at sub-200ms latency, or only as batch? | The platform exposes ER as a streaming API; the SDK has self-correction; the tier-aware agent retrieval pattern is supported | Agent-layer use cases are out of scope; ER will be a separate procurement |
| Is the SHACL gate (or property-graph equivalent) configurable to your shapes, not the vendor’s? | The platform loads your SHACL shapes graph; the gate runs at write time; quarantine-on-failure is the default | The platform’s pre-built shape catalog is the only option; your governance discipline does not transfer |
| Can the platform reproduce a 2025 query against a 2025 ontology version after the 2026 release lands? | Named-graph version chains are first-class; consumer pinning by Version IRI is supported; alias-based rollback works | Reproducibility breaks at every quarterly ontology release; the BCBS 239 examiner question becomes an archeology pass |
| What is the dollar-per-billion-triple cost trajectory over a three-year horizon? | The vendor publishes per-tier pricing; the trajectory is visible; volume discount terms are explicit | Cost surprises at year two; the procurement budget runs out before the agent layer ships |
| What is the M&A risk on the platform and on the surrounding ecosystem (ER vendor, extraction stack, governance metadata store)? | The vendor’s strategic position is stable; the ecosystem is not in active consolidation; the procurement runbook revisits at every quarterly architecture review | The vendor is the next acquisition target; mid-procurement consolidation forces re-evaluation |
| Does the vendor’s roadmap converge with your three-year stack target, or diverge? | The roadmap commitments align with the layer map; the vendor’s positioning is stable | The platform is moving away from your stack target; the procurement decision has a built-in re-platforming risk |

A firm that runs this diagnostic on every shortlisted vendor before the demo conclusion has the procurement posture that the Lakeside CDO arrived at after killing the first RFP. A firm that runs it after the demo conclusion is buying a vendor’s framing.
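The scoring rule is simple enough to encode, which keeps shortlists comparable across evaluators. A minimal sketch follows; the question keys are shorthand labels invented here for the eight rows of the table, and the thresholds are the ones stated above (six or more for serious consideration, all eight for Lakeside posture).

```python
from typing import Dict

# Shorthand keys for the eight diagnostic rows (labels are illustrative).
QUESTIONS = (
    "standards_native",
    "consumes_existing_openlineage",
    "realtime_er_sub_200ms",
    "customer_defined_shacl_gate",
    "ontology_version_reproducibility",
    "published_cost_trajectory",
    "stable_ma_position",
    "roadmap_converges",
)


def classify_vendor(answers: Dict[str, bool]) -> str:
    """Apply the six-of-eight and eight-of-eight thresholds to a vendor."""
    missing = [q for q in QUESTIONS if q not in answers]
    if missing:
        # An unanswered question is not a "no"; force the evaluator to decide.
        raise ValueError(f"unanswered diagnostic questions: {missing}")

    yes_count = sum(1 for q in QUESTIONS if answers[q])
    if yes_count == len(QUESTIONS):
        return "Lakeside posture"
    if yes_count >= 6:
        return "serious consideration"
    return "not in serious consideration"


if __name__ == "__main__":
    # Hypothetical vendor: strong on standards, weak on pricing transparency.
    sample = {q: True for q in QUESTIONS}
    sample["published_cost_trajectory"] = False
    print(classify_vendor(sample))
```

Raising on a missing answer is a deliberate choice in this sketch: a blank left during a demo should not silently count against or for the vendor.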

Do Next: Vendor Selection And Layer Map

The Do Next table closes the appendix. Each row maps to a procurement action; the priority is keyed to whether your firm is pre-RFP, mid-RFP, or post-vendor-selection.

| Priority | Action | Why It Matters |
| --- | --- | --- |
| Now (this quarter) | Run the seven-layer stack audit on your current KG (or your KG plan). For each layer, identify the standards anchor, the candidate vendors, the integration shape, and the M&A risk. | The category is “KG platform”; the procurement decision is at the layer. Procurement runbooks that skip this step land at the original Lakeside RFP outcome |
| Now (this quarter) | Run the triple-vs-property-vs-hybrid decision tree before any vendor demo. Pin the answer in writing; revisit only if the regulatory or query pattern changes. | The storage paradigm decision belongs at the start of the procurement, not at the demo conclusion |
| Now (this quarter) | If you are running Reltio, model the SAP acquisition roadmap risk in your procurement runbook. Identify the trigger conditions for re-evaluating the ER layer. | Mid-procurement consolidation reshapes assumptions; the right posture is to model the risk now, not after the deal closes |
| Next (next two quarters) | Issue layer-level RFPs (storage, ER, extraction, virtualization, governance metadata) rather than a single platform RFP. Run each as a separate evaluation against your ontology, your OpenLineage emission, and your read patterns. | The first-round Lakeside RFP would have spent $2.4M on a fancy browser; the second-round layer-level RFPs produced the stack the trilogy describes |
| Next (next two quarters) | Run the eight-question vendor diagnostic on every shortlisted vendor. A vendor that answers yes on fewer than six is not in serious consideration; a vendor that answers yes on all eight is in Lakeside posture. | The diagnostic separates a vendor pitch from a procurement decision |
| Soon (next year) | Stand up the governance metadata bridge (Marquez or DataHub or Atlan as the OL consumer; PROV-O bridge into the operational graph) before procuring the agent-layer extraction stack. | Track 2 without a governance bridge produces unprovenanced facts at scale; the Apex Capital incident from Part 9, an illustrative composite, replays |
| Soon (next year) | Establish a quarterly architecture review that revisits the layer-level picks against the latest M&A and standards activity. Maintain an explicit M&A watch list. | The 2026 consolidation pattern is structural; expect at least one consequential event per quarter; the right posture is scheduled review, not reactive scramble |

Coming Next: Appendix B

Appendix A names the layers, the vendors, and the picks. Appendix B decomposes each layer into the dollar lines that the Part 11c cost-and-benefit roll-up summarized: infrastructure, license, headcount, and ramp components per layer; the build-versus-buy framework; team composition for foundation versus operational versus governance versus agent layers; and the dollar trajectory over a three-year horizon for a mid-size bank. Appendix C closes the practical-program guidance with the politics question: handling pushback, building sponsorship, and answering the “just use a database” argument that every KG program faces in year one. Part 12 wraps the series with what building a knowledge graph actually teaches you. Read this appendix as the layer map; read Appendix B as the cost map; read Appendix C as the political map; the Lakeside trilogy remains the worked example all three refer back to.

Sources & References

  1. Stardog Voicebox: LLM plus Knowledge Graph for Enterprise Data Conversations (2024)
  2. Neo4j AuraDB and Pricing (2026)
  3. Neo4j AuraDB Pricing Tracker (Modern Data Tools) (2026)
  4. Neo4j to Acquire GraphAware (intelligence-analysis platform) (2026)
  5. TigerGraph Secures Strategic Investment (Cuadrilla Capital take-private) (2025)
  6. AWS Neptune Analytics: Vector Indexing in Neptune Analytics (2026)
  7. AWS Database Blog: Triple your knowledge graph speed with RDF linked data and openCypher using Amazon Neptune Analytics (2026)
  8. Senzing v4 SDK and Real-Time Entity Resolution (2026)
  9. Senzing Pricing (per Data Source Record, annual tiers) (2026)
  10. Senzing Launches Kiro Power for Agentic Entity Resolution (2026)
  11. SAP to Acquire Reltio: AI-Ready Master Data Management (2026)
  12. Reltio Blog: A New Chapter for Reltio (2026)
  13. Ontop 5.5.0 Documentation: Virtual Knowledge Graphs over R2RML and OBDA (2026)
  14. AllegroGraph: Graph plus Vector plus Document for Enterprise KG (2026)
  15. Edge et al. (Microsoft Research): From Local to Global, A Graph RAG Approach (2024)
  16. Microsoft GraphRAG Documentation (2025)
  17. LightRAG: Graph Reasoning at Roughly 1/100th the Token Cost (RagdollAI analysis) (2026)
  18. Zep / Graphiti: Real-Time Knowledge Graphs for AI Agents (2025)
  19. LongMemEval Benchmark (canonical repository) (2025)
  20. Atlan: Zep vs Mem0 LongMemEval Comparison (2026)
  21. Marquez Project: Open Source OpenLineage Reference Implementation (2026)
  22. Atlan: What Is a Metadata Knowledge Graph (2026)
  23. Graphwise (Ontotext plus Semantic Web Company merger announcement) (2024)
