Building a Production Knowledge Graph at Lakeside Trust Bank: Foundation and the Operational Layer
The capstone of the Knowledge Graph Practitioner's Guide. A mid-size US bank takes the foundations from Parts 3 to 8 and turns them into a working production graph. This first piece covers the Monday-morning question that no spreadsheet-and-five-systems architecture can answer in time, the deliberate-versus-accidental KG choice, the modular FIBO-anchored ontology Lakeside imports, the 8-stage pipeline that flows Track 1 (R2RML on the warehouse) and Track 2 (LLM-extracted credit memos) into one graph, and the operational use case (customer 360, beneficial ownership, real-time transaction risk). Part 11a of the Knowledge Graph Practitioner's Guide.
The Monday Morning Question That Took Three Weeks
A relationship banker at Lakeside Trust Bank, the mid-size US bank introduced in Part 4 and seeded across the foundations articles, opened her week in March 2026 with a routine question. One of her commercial clients was a mid-cap industrial firm in the Midwest. The firm’s controlling family, whom we will call the Müller family for the rest of this article, owned the operating company directly and had additional positions through a private wealth account, an offshore investment vehicle, two trusts (one revocable, one irrevocable), and a holding company that had recently bought a minority interest in a competitor. Her question was simple: “What is our total exposure to the Müller family across the bank, by line of business, and which positions have changed in the last quarter?”
She asked the question of the customer 360 dashboard first. The dashboard returned the operating company’s commercial loan portfolio. It did not know about the private wealth account, because private wealth’s master data was on a separate system. It did not know about the offshore vehicle, because beneficial ownership was tracked in a third place (the Treasury Operations spreadsheet that the AML team had been maintaining since 2019). It did not know about the trusts, because the trust accounting system fed into the data warehouse only weekly, and the latest weekly load had not yet reconciled the new irrevocable trust the family established in February. It did not know about the minority stake in the competitor, because the holding-company structure had been documented in a credit memo PDF on SharePoint and had not yet been entered into any system of record. The dashboard returned a confident number for the operating company. It said nothing about the rest.
The relationship banker called her counterparts at private wealth, at AML, and at trust operations. Each of them produced a partial answer from the system they owned. None of them produced a number that reconciled to anyone else’s number. The trust system used a different identifier for the family than the AML spreadsheet. The private wealth system used a third identifier. The credit memo on SharePoint named the family but did not link to any structured identifier at all. By Friday, the relationship banker had a forty-tab Excel workbook, a guess that the total exposure was somewhere between $42M and $58M depending on how the offshore vehicle was treated, and an uneasy feeling that the irrevocable trust was missing from the calculation entirely. The first follow-up call with the client was on Monday. She rescheduled it.
The full reconciliation took three weeks. The final number came in at $51.3M with the irrevocable trust included. The relationship banker had spent more time on the reconciliation than she had spent on the underlying client conversation. The pattern was familiar from the Northwind Bancshares anecdote in Part 10, but this was an operational question, not a regulatory one. The regulatory version is “the examiner asked a question that took three weeks.” The operational version is “the relationship banker asked a question that took three weeks.” Both fail the same way and for the same reason. The bank had five systems that each owned a partial view of the same family, no shared identifier across them, and no single graph that connected them. The accidental architecture had reached its operational ceiling.
Lakeside’s CDO had been watching this pattern accrete for two years. The bank had purchased an MDM hub, a catalog, a lineage tool, and a CDE inventory in roughly that order, each well-justified at the time. A small generative-AI pilot for relationship bankers had been added in late 2025. None of the five investments shared identifiers with any of the others. The Müller family incident was the moment when the deliberate-versus-accidental choice became urgent. This article is the story of what Lakeside built next, beginning with the foundation and the operational layer. The governance layer is in Part 11b. The agent layer is in Part 11c.
Why The Deliberate Path Was Cheaper Than Continuing The Accidental One
Lakeside’s CDO took a one-page memo to the executive committee. The bank had committed roughly $7M across MDM, catalog, lineage, CDE inventory, and an AI pilot over two years. Each project had delivered against its narrow scope; none of them, alone or together, could have answered the Müller question on Monday. A conservative internal estimate put the recurring annual cost of the accidental architecture at $9M to $12M of senior banker time spent reconciling identifiers across systems, before counting the regulatory ROI from BCBS 239 and EU AI Act conformance covered in the Part 10 cross-walk.
The deliberate path was a knowledge graph that became the substrate every existing investment wrote into and read from. The MDM hub stays. The catalog stays. The lineage tool stays. The CDE inventory stays. The AI pilot stays. What changes is that all five stop minting their own canonical identifiers and start writing into and reading from a shared graph using IRIs that one identity discipline mints (the Part 5 IRI rule, applied across the bank). The shared graph is not a sixth system. It is the substrate the other five accidentally needed and never built.
The argument that won the committee was not that the graph would be fast. It would not. The argument was that every additional non-graph investment (the next AI pilot, the next regulatory lineage workstream, the next MDM domain expansion) would widen the gap rather than close it, and reconciling four to six fragmented stores after another year of metadata growth would cost strictly more than building the substrate now and routing future investments through it. The accidental path had a compounding interest rate the bank could no longer afford.
The committee approved a two-year program with a hard scoping rule: the graph carries three use cases at launch (the operational case in this article, the governance case in Part 11b, the agent case in Part 11c) and no others. New use cases enter only after the launch three are in production. The scoping rule is the answer to the Part 2 boil-the-ocean-ontology failure mode.
The agent-era restatement: the deliberate-versus-accidental choice is an economic one before it is a technical one. The accidental architecture compounds: every new non-graph investment mints another canonical identifier and widens the reconciliation gap, so the cost of unifying four to six fragmented stores after the fact grows faster than the cost of building one shared substrate now. The deliberate path wins not because the graph is fast (it is not) but because routing future investments through one identity discipline, one ontology, and one provenance contract turns a compounding liability into a fixed cost.
Lakeside’s Profile, As An Architecture Forcing Function
Lakeside Trust Bank is a $75B-asset US regional bank spanning commercial, retail, and private wealth. It is headquartered in Chicago, with branches across the Upper Midwest and a small EU subsidiary that serves US clients that have European subsidiaries. The bank has roughly 10,000 employees, 1.2M retail customers, and 22,000 commercial counterparties, and it operates under three regulatory lenses: BCBS 239 as supervised by the Federal Reserve, GDPR through the EU subsidiary, and the EU AI Act for the relationship-banker agent. For the agent’s high-risk obligations, Lakeside is preparing on the timeline in the EU’s provisional May 2026 Digital Omnibus agreement, which, as of mid-2026 and pending formal adoption, would defer the deadline for stand-alone Annex III systems to 2 December 2027. Pre-KG, Lakeside’s data stack was Snowflake plus an evolving Iceberg-on-S3 lakehouse, dbt, Apache Spark, OpenLineage emitting from both pipeline engines, and an MDM platform covering retail customers only (see Appendix A for the specific tools; commercial customers were master-mismanaged in Excel until 2024). The CDE inventory had ~280 elements, following the CDE meta-model from the existing series, at the standard 1:20 ratio (~5,600 implementing fields).
| Lakeside dimension | Value | Implication for the KG architecture |
|---|---|---|
| Total assets | $75B | Mid-size; not Top-25 US bank scale; can build with a small dedicated team rather than a federation |
| Retail customers | 1.2M | Real-time ER required on the serving path (Part 5 lock); cannot be nightly batch |
| Commercial counterparties | 22,000 | Beneficial ownership traversal at this scale is a graph problem; not a join problem |
| Lines of business | Commercial + Retail + Private Wealth | Three customer 360 contexts that must reconcile to one identity |
| Geographic footprint | US + EU subsidiary | GDPR scope; the customer IRI scheme must respect EU data residency |
| Pipeline engines | Spark + dbt | Both emit OpenLineage natively; Track 1 ingest is mostly already instrumented |
| Unstructured corpus | Credit memos, KYC files, advisor notes (~250GB) | Track 2 LLM extraction is required; pure structured ingestion will not capture the Müller-style questions |
| Existing MDM hub | retail master-data (MDM) platform, retail-only | The hub stays; commercial MDM moves into the KG; retail records in MDM federate via owl:sameAs |
| Existing OpenLineage | Spark + dbt instrumented | Bridge to PROV-O is the lift, not the instrumentation |
| Existing CDE inventory | ~280 CDEs, ~5,600 fields at 1:20 | Each CDE becomes a typed node with cde:hasImplementation edges (Part 10 pattern) |
Every architectural choice in the rest of this article is forced by some row in this table. Real-time ER is forced by 1.2M retail customers and a serving path under 200ms. Track 2 LLM extraction is forced by the 250GB of unstructured corpus the Müller incident proved is load-bearing. The thin in-house ontology module is forced by Lakeside’s commercial-banking concepts that FIBO does not cover at sufficient resolution. Profiles drive architectures.
The Ontology Lakeside Imports
The first technical decision was the ontology. The temptation, hard to resist for any organization with strong domain experts, is to write a custom ontology that captures the bank’s specific way of thinking. Lakeside resisted, following the Part 4 modular pattern: import as much as possible from established vocabularies, and write as little as possible in-house. The result is nine imported modules and one thin in-house module.
| Module | Source | What it provides | Why Lakeside imports it |
|---|---|---|---|
| FIBO BE | EDM Council (2026 Q1 production release: FIBO as a whole ships roughly 2,446 classes across 194 ontology files; Business Entities is one of seven domains) | LegalEntity, ControllingInterest, BeneficialOwner, BusinessRelationship | The vocabulary for beneficial ownership traversal; FinCEN BOI compliance; cross-bank interoperability |
| FIBO LOAN | EDM Council | LoanContract, CreditExposure, Collateral, Guarantor, Covenant | Commercial loan portfolio modeling; credit-exposure aggregation; the operating-company exposure in the Müller question |
| FIBO SEC | EDM Council | SecurityHolding, Issuer, Underwriter, MarketIdentifier | Private wealth holdings; trading book exposure; the family’s offshore-vehicle holdings |
| FIBO FBC | EDM Council | Deposit, DepositAccount, FinancialInstitution, hasDeposit | Retail and commercial deposit relationships; the deposit leg of the Müller exposure aggregation |
| W3C Time | W3C Recommendation 2017 (Candidate Recommendation Draft update 2022) | Instant, Interval, before, after, during | Bitemporal modeling per Part 8; valid time and transaction time on every edge that needs it |
| W3C PROV-O | W3C Recommendation 2013 | Activity, Entity, Agent, wasGeneratedBy, wasDerivedFrom, wasAttributedTo | The seven-field provenance contract from Part 7 and Part 8; OpenLineage bridge ground truth |
| W3C SKOS | W3C Recommendation 2009 | Concept, prefLabel, broader, narrower, exactMatch | The business glossary; cross-mappings between FIBO terms and Lakeside-internal terms |
| W3C DCAT 3 | W3C Recommendation 2024 | Catalog, Dataset, Distribution, DataService | The catalog vocabulary; dataset descriptions for the lineage layer in Part 11b |
| W3C SHACL | W3C Recommendation 2017 | NodeShape, PropertyShape, sh:targetClass, sh:minCount | The validation layer; SHACL gates per Part 6 and Part 7 |
| lksb: (in-house) | Lakeside-defined, ~80 classes | InternalProductHierarchy, RelationshipBankerAssignment, InternalRiskTier, AdvisorRecord | Internal concepts FIBO does not cover at sufficient resolution; deliberately small |
The thin lksb: module is the discipline that prevents the boil-the-ocean failure. Eighty classes total. Every class either subclasses a FIBO class (so FIBO consumers can read Lakeside data without learning the in-house vocabulary) or models a strictly internal concept (a relationship-banker assignment, an internal risk tier that does not map cleanly to a regulatory tier). When a Lakeside ontologist proposes a new in-house class, the first review question is: does this exist in FIBO already? If yes, import it. If no, can FIBO be extended? Only if neither is true does a class enter lksb:. The discipline kept the in-house module under 100 classes after eighteen months. In the composite failure pattern this series draws on, a bank without the discipline lets the in-house module sprawl to over a thousand classes in roughly the same period and then spends a quarter consolidating them; the figure is illustrative of the boil-the-ocean trajectory, not a measurement of a named institution.
The 8-Stage Pipeline At Lakeside
The construction architecture follows the 8-stage pipeline (Ingest, Map, Resolve, Mint, Assert, Reason, Validate, Serve) introduced in Part 5, elaborated for source construction in Part 6, and used as the spine by the rest of the series. At Lakeside, the pipeline runs both Track 1 (structured warehouse via R2RML) and Track 2 (unstructured documents via LLM extraction with three discipline points). The two tracks converge at the resolve and validate stages, then fan out at the serve stage in three directions: a SPARQL endpoint for operational queries, named-graph snapshots for governance reporting, and the agent retrieval surface for the relationship-banker agent (Part 11c).
| Stage | Track 1 source (structured) | Track 2 source (unstructured) | Lakeside specific |
|---|---|---|---|
| 1. Ingest | Snowflake tables (commercial loan master, retail customer master, trust accounting), Iceberg tables (transaction stream), via OpenLineage-emitting Spark and dbt jobs | SharePoint credit memos, KYC PDFs, advisor notes, AML investigation files (~250GB) | Track 1 fully instrumented; Track 2 ingestion built fresh in Q1 2026 |
| 2. Map | R2RML mappings from relational rows to RDF triples (Part 6 pattern; lk: and lkv: prefixes already locked) | LLM extraction with fixed FIBO BE plus FIBO LOAN ontology in the prompt; iText2KG-style schema-bound extraction | Track 2 prompt engineering is the largest single quality lever; three discipline points enforced (fixed ontology, dedup before assert, SHACL gate) |
| 3. Resolve | Real-time entity resolution on the serving path via the entity-resolution engine (Part 5 lock); deterministic blocking on tax ID plus probabilistic matching on name and address | Same ER pipeline; extracted entities resolved against the same identity index as Track 1 entities | Real-time ER serves both tracks against the same identity; not two separate ER systems |
| 4. Mint | Stable IRIs under https://lakeside.com/kg/{type}/{stable-id} per Part 5 IRI discipline | Same IRI scheme; minted only after resolve completes | One mint authority for the whole bank; no system mints its own canonical IRIs after migration |
| 5. Assert | Triples written into named graphs partitioned by source system and release window (Part 8 versioning) | Triples written into a designated extraction named graph with reified provenance per fact (the seven-field contract introduced in Part 7 and completed in Part 8, including validatedAgainstShapes) | Named graph naming convention: https://lakeside.com/graph/{source}/{release} |
| 6. Reason | Materialized OWL RL inference for transitive controlsInterestIn, subClassOf propagation, and inverse-of pairs (Part 5 OWL profile selection) | Same reasoner; extracted facts participate in inference once they pass the SHACL gate | OWL 2 RL profile chosen for predictable polynomial-time inference; not OWL DL |
| 7. Validate | SHACL shapes applied at write time per Part 7 quality framing; ten core shapes plus per-product shapes | Same shapes; an extracted triple that fails SHACL is quarantined into a bronze named graph for human review | A failing extraction is never silently dropped; quarantine is the discipline |
| 8. Serve | SPARQL endpoint on the canonical RDF triple store plus a property-graph view on the property-graph traversal store for traversal-heavy queries (the hybrid paradigm decision from Part 3); virtualized views over Snowflake hot data; materialized views for heavy aggregations | Same endpoint; the agent retrieval surface in Part 11c reads from the same graph | Hybrid materialization plus virtualization is the realistic enterprise architecture; not pure materialization |
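Stages 4 and 5 of the table rest on two string templates, one for entity IRIs and one for named graphs. The sketch below shows how a single mint authority might apply them. It is an illustration under stated assumptions, not Lakeside's implementation: the function names `mint_iri` and `named_graph_iri`, the example type and identifier values, and the choice to percent-encode each path segment are all hypothetical.

```python
from urllib.parse import quote

# Base paths taken from the templates in the pipeline table.
KG_BASE = "https://lakeside.com/kg"
GRAPH_BASE = "https://lakeside.com/graph"


def mint_iri(entity_type: str, stable_id: str) -> str:
    """Build an entity IRI of the form {KG_BASE}/{type}/{stable-id}.

    Percent-encoding each segment is an assumption; it keeps stray
    slashes or spaces in an identifier from changing the IRI's shape.
    """
    return f"{KG_BASE}/{quote(entity_type, safe='')}/{quote(stable_id, safe='')}"


def named_graph_iri(source: str, release: str) -> str:
    """Build a named-graph IRI of the form {GRAPH_BASE}/{source}/{release}."""
    return f"{GRAPH_BASE}/{quote(source, safe='')}/{quote(release, safe='')}"


# Hypothetical example values, for illustration only.
entity = mint_iri("legal-entity", "LE-000123")
graph = named_graph_iri("snowflake-loans", "2026-Q1")
print(entity)
print(graph)
```

The point of routing every system through functions like these is that the IRI is derived only from the resolved stable identifier, never from a source system's local key.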
The two-track convergence is what makes the Müller question answerable. Track 1 carries the operating company’s commercial loans and the private wealth holdings (both in the warehouse). Track 2 carries the credit memo describing the holding company’s minority stake (on SharePoint) and the AML file describing the offshore vehicle’s beneficial owners (in PDFs). At stage 3 (resolve), every entity from both tracks resolves against the same identity index. The Müller family node ends up with edges from every source that mentions them, whether the source was a Snowflake table or a SharePoint PDF. At stage 7 (validate), every fact carries the seven-field provenance contract from Parts 7 and 8: where, what process, who, when, trust level, source hash, validatedAgainstShapes. At stage 8 (serve), the operational query “total exposure to the Müller family” traverses one graph rather than five.
What this looks like in practice. The discipline that distinguishes Lakeside’s pipeline from the average enterprise KG attempt is not the choice of triple store or the choice of LLM. It is the convergence at stage 3 (resolve). Most failed KG projects run Track 1 and Track 2 as separate systems that produce separate graphs; the Müller question is unanswerable in that architecture because Track 1 has commercial loans without the offshore vehicle and Track 2 has the offshore vehicle without the loans. If your pipeline architecture diagram shows two parallel graphs, the architecture has not yet earned its name.
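The stage 3 convergence can be sketched with a toy identity index that both tracks call. This is a simplified illustration, not the entity-resolution engine the article refers to: the class and field names are hypothetical, the example company name and tax ID are invented, and exact matching on a normalized name stands in for the probabilistic name-and-address matching a production engine would use.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    track: str                    # "track1" (warehouse row) or "track2" (LLM extraction)
    source: str                   # system or document the mention came from
    name: str
    tax_id: Optional[str] = None  # often missing in extracted documents


def normalize(name: str) -> str:
    """Lowercase and collapse punctuation so trivial variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


class IdentityIndex:
    """One index shared by both tracks (a stand-in for the real ER engine)."""

    def __init__(self) -> None:
        self._by_tax_id: dict[str, str] = {}
        self._by_name: dict[str, str] = {}
        self._next = 1
        self.sources: dict[str, set[str]] = {}

    def resolve(self, m: Mention) -> str:
        key = normalize(m.name)
        # Deterministic match on tax ID first, then fall back to the name key.
        stable_id = None
        if m.tax_id and m.tax_id in self._by_tax_id:
            stable_id = self._by_tax_id[m.tax_id]
        elif key in self._by_name:
            stable_id = self._by_name[key]
        if stable_id is None:
            stable_id = f"LE-{self._next:06d}"
            self._next += 1
        if m.tax_id:
            self._by_tax_id.setdefault(m.tax_id, stable_id)
        self._by_name.setdefault(key, stable_id)
        self.sources.setdefault(stable_id, set()).add(m.source)
        return stable_id


index = IdentityIndex()
from_warehouse = index.resolve(
    Mention("track1", "snowflake", "Mueller Holding Co.", tax_id="12-3456789"))
from_memo = index.resolve(
    Mention("track2", "sharepoint-credit-memo", "MUELLER HOLDING CO"))
print(from_warehouse == from_memo, sorted(index.sources[from_warehouse]))
```

Because both mentions land on one stable identifier, the later mint and assert stages attach warehouse facts and extracted facts to the same node, which is the property two parallel graphs cannot provide.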
Pipeline cost at Lakeside scale is dominated by two items. The first is Track 2 LLM extraction: roughly 3M document pages per quarter, at a per-page cost that, as of 2026, runs roughly $0.01 to $0.05 for schema-bound extraction on a mid-tier model. The second is the hybrid serving tier that pairs the canonical RDF triple store with the property-graph traversal store; the dual paradigm from Part 3 is justified by the operational latency targets on one side and the SPARQL semantics that governance needs on the other. Appendix B covers the cost modeling in detail.
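The extraction figures above imply a cost range that is easy to work out. The sketch below multiplies the stated page volume by the stated per-page bounds; the annual figure assumes the quarterly volume holds steady for four quarters, which is an assumption of this example rather than a number from the article or Appendix B.

```python
PAGES_PER_QUARTER = 3_000_000   # page volume stated in the article
COST_CENTS_LOW = 1              # $0.01 per page
COST_CENTS_HIGH = 5             # $0.05 per page


def extraction_cost_usd(pages: int, cents_per_page: int) -> float:
    """Cost in dollars; integer cents avoid floating-point drift."""
    return pages * cents_per_page / 100


quarterly_low = extraction_cost_usd(PAGES_PER_QUARTER, COST_CENTS_LOW)
quarterly_high = extraction_cost_usd(PAGES_PER_QUARTER, COST_CENTS_HIGH)

# Assumption: volume is steady, so annual cost is four times quarterly.
annual_low, annual_high = 4 * quarterly_low, 4 * quarterly_high

print(f"Quarterly: ${quarterly_low:,.0f} to ${quarterly_high:,.0f}")
print(f"Annual:    ${annual_low:,.0f} to ${annual_high:,.0f}")
```

Under those assumptions the extraction line item falls between $30,000 and $150,000 per quarter, or $120,000 to $600,000 per year, before reprocessing, retries, or human review of quarantined extractions.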
The Operational Use Case: Customer 360, Beneficial Ownership, Real-Time Transaction Risk
The operational use case at Lakeside is a single graph that answers three classes of question that the bank’s relationship bankers, AML investigators, credit officers, and treasury staff ask every day. The questions look different from the outside (a customer 360 dashboard query, an AML beneficial-ownership investigation, a real-time transaction risk score) but they reduce to the same shape on the inside: a typed graph traversal from a starting entity through one to four hops of typed relationships, scoped by time and trust tier, with quality and provenance metadata returned alongside the answer.
Customer 360: The Müller Family Reconciled
The Müller-family reconciliation that took three weeks pre-KG becomes a single SPARQL query post-KG. The query pattern starts from the family identifier (a fibo-be:LegalEntity with an lksb:familyGroup annotation), traverses fibo-be:hasBeneficialOwnership, fibo-be:hasControllingInterest, and fibo-be:hasIndirectControl to every legal entity the family controls, then follows fibo-loan:hasCreditExposure, fibo-sec:hasSecurityHolding, and fibo-fbc:hasDeposit to every position those entities have at the bank. It keeps positions whose generating activity ended on or after 1 January 2026 and aggregates the exposure by line of business. Because every position carries a prov:wasGeneratedBy link, per-position provenance is available from the same pattern.
The pattern is short and worth showing.
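```sparql
# PREFIX declarations for lksb:, fibo-be:, fibo-loan:, fibo-sec:, fibo-fbc:,
# prov:, and xsd: are omitted here for brevity.
SELECT ?lob (SUM(?amount) AS ?totalExposure)
WHERE {
  # Start from the family group node.
  ?family lksb:familyGroup "Müller" .
  # One or more ownership or control hops to every controlled entity.
  ?family (fibo-be:hasBeneficialOwnership|
           fibo-be:hasControllingInterest|
           fibo-be:hasIndirectControl)+ ?entity .
  # Every position those entities hold at the bank.
  ?entity (fibo-loan:hasCreditExposure|
           fibo-sec:hasSecurityHolding|
           fibo-fbc:hasDeposit) ?position .
  ?position lksb:lineOfBusiness ?lob ;
            lksb:exposureAmount ?amount ;
            prov:wasGeneratedBy ?activity .
  ?activity prov:endedAtTime ?ts .
  # xsd:dateTime literals need a time component (UTC assumed here).
  FILTER (?ts >= "2026-01-01T00:00:00Z"^^xsd:dateTime)
}
GROUP BY ?lob
```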
The query returns five rows: commercial loans ($28.4M), private wealth holdings ($14.7M), trading book ($3.2M), trust assets ($4.1M), and offshore vehicle exposures ($0.9M), summing to $51.3M. The same number that took three weeks to reconcile by hand is computed in 180ms against the operational SPARQL endpoint.
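For readers who think more easily in code than in property paths, the traverse-then-aggregate shape of the query can be sketched over a toy in-memory graph. The five amounts are the line-of-business totals quoted above, expressed in thousands of dollars. Everything else is hypothetical: the entity names, which entity holds which position, and the ownership edges are invented for illustration, since the article does not spell out the shape of the real graph.

```python
from collections import defaultdict

# Hypothetical ownership edges: owner -> directly controlled entities.
CONTROLS = {
    "muller-family": ["operating-co", "holding-co", "irrevocable-trust", "wealth-vehicle"],
    "holding-co": ["offshore-vehicle"],
}

# Hypothetical position placement; amounts (in $ thousands) are the article's totals.
POSITIONS = {
    "operating-co": [("commercial loans", 28_400)],
    "wealth-vehicle": [("private wealth holdings", 14_700)],
    "holding-co": [("trading book", 3_200)],
    "irrevocable-trust": [("trust assets", 4_100)],
    "offshore-vehicle": [("offshore vehicle exposures", 900)],
}


def controlled_entities(start: str, controls: dict) -> set:
    """One-or-more-hop closure, mirroring the SPARQL `+` path.

    The start node itself is not included unless a cycle leads back to it.
    """
    seen = set()
    stack = list(controls.get(start, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(controls.get(node, []))
    return seen


def exposure_by_lob(start: str, controls: dict, positions: dict) -> dict:
    """Sum position amounts per line of business across all controlled entities."""
    totals = defaultdict(int)
    for entity in controlled_entities(start, controls):
        for lob, amount in positions.get(entity, []):
            totals[lob] += amount
    return dict(totals)


totals = exposure_by_lob("muller-family", CONTROLS, POSITIONS)
print(sum(totals.values()))  # total exposure in $ thousands
```

The closure step is the part that is painful in SQL and free in a graph store; the aggregation is ordinary grouping once the set of controlled entities is known.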
The four-part lens from Part 1 decomposes the query.
| Lens | What the query depends on | Where it came from in the architecture |
|---|---|---|
| Entities | fibo-be:LegalEntity for the family and each controlled entity; fibo-loan:CreditExposure and fibo-sec:SecurityHolding for each position | FIBO BE plus FIBO LOAN plus FIBO SEC import (Part 4); the in-house lksb:familyGroup annotation tags a family across legal entities |
| Typed relationships | fibo-be:hasBeneficialOwnership, fibo-be:hasControllingInterest, fibo-be:hasIndirectControl, fibo-loan:hasCreditExposure, fibo-sec:hasSecurityHolding, fibo-fbc:hasDeposit | FIBO relationship vocabulary; the SPARQL one-or-more property path operator (+) gives transitive traversal across the ownership chain |
| Identity | One IRI per family, one IRI per legal entity, one IRI per position; no two IRIs for the same real-world thing across the bank’s systems | Stage 3 resolve plus stage 4 mint of the pipeline; the IRI discipline from Part 5 |
| Inference | Indirect control (the family controls the holding company, which controls the operating company, so the family transitively controls the operating company) materialized at stage 6 reason | OWL 2 RL inference rules; transitive property assertions on fibo-be:hasIndirectControl |
The Müller graph is the smallest possible illustration of the operational use case. The same pattern extends to every commercial counterparty at Lakeside (22,000) and every retail customer (1.2M). At the 99th percentile, a customer 360 query at Lakeside touches under 200 nodes and under 500 edges. SPARQL meets the latency budget at the 95th percentile by itself; at the long tail, the property-graph traversal store takes over for the traversal-heavy edge cases (the bank’s largest commercial counterparty group has roughly 1,400 controlled entities and 6,200 positions).
Beneficial Ownership Beyond The Müller Family
Beneficial ownership at scale is the second operational pattern. The FinCEN Beneficial Ownership Information Reporting Rule under the Corporate Transparency Act, as originally issued, required US-formed legal entities to report their beneficial owners (any individual with substantial control or 25% or greater ownership); FinCEN’s March 2025 interim final rule subsequently narrowed the reporting obligation to foreign reporting companies. Lakeside’s commercial onboarding workflow verifies reported BO against what the bank can independently observe; AML investigators traverse BO chains during sanctions screening, suspicious activity reviews, and PEP checks. Pre-KG, this work was done by hand against the same fragmented systems that defeated the Müller question. Post-KG, the same graph carries the BO traversal as a natural query.
The graph pattern is fibo-be:LegalEntity nodes connected by fibo-be:hasControllingInterest and fibo-be:hasBeneficialOwnership edges, with fibo-be:percentageOwnership and fibo-be:asOfDate properties on each edge (the bitemporal annotation from Part 8). A BO query starts from a legal entity and traverses up to natural-person owners with cumulative ownership above a threshold. SPARQL’s property paths and OWL 2 RL transitive inference make this a one-screen query. Without a graph, the same question is a recursive SQL CTE that nobody writes correctly the first time and that nobody can audit when it returns the wrong answer.
The 25-percent threshold maps to a FILTER clause; the substantial-control threshold (broader, softer) maps to a different traversal that picks up directors, officers, and voting-agreement holders. Both checks reuse the same identity discipline (one IRI per natural person, one IRI per legal entity) and return provenance for every contributing edge.
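The cumulative-ownership check can be sketched as follows. SPARQL property paths reach the owners but do not carry per-edge values along a path, so the multiplication is shown here in Python. The sketch assumes that indirect ownership is the product of the stakes along each chain, summed across chains, and that the ownership graph is acyclic; both are simplifying assumptions of this example, and the entity names and percentages are invented.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical data: entity -> list of (owner, stake held in that entity).
OWNED_BY = {
    "opco": [("holdco", Fraction(60, 100)), ("person_b", Fraction(40, 100))],
    "holdco": [("person_a", Fraction(70, 100)), ("person_c", Fraction(30, 100))],
}
NATURAL_PERSONS = {"person_a", "person_b", "person_c"}


def effective_ownership(entity: str, owned_by: dict, natural_persons: set) -> dict:
    """Return {person: cumulative stake} for the given entity.

    Stakes are multiplied along each chain and summed across chains.
    Assumes an acyclic ownership graph (no cross-holdings).
    """
    totals = defaultdict(Fraction)

    def walk(node: str, share: Fraction) -> None:
        for owner, stake in owned_by.get(node, []):
            through = share * stake
            if owner in natural_persons:
                totals[owner] += through
            else:
                walk(owner, through)

    walk(entity, Fraction(1))
    return dict(totals)


def beneficial_owners(entity, owned_by, natural_persons,
                      threshold=Fraction(25, 100)) -> dict:
    """Keep natural persons at or above the ownership threshold."""
    stakes = effective_ownership(entity, owned_by, natural_persons)
    return {person: s for person, s in stakes.items() if s >= threshold}


owners = beneficial_owners("opco", OWNED_BY, NATURAL_PERSONS)
print({person: float(stake) for person, stake in owners.items()})
```

In this toy data, person_a reaches 42% through the holding company and person_b holds 40% directly, so both clear the threshold, while person_c's 18% does not. The substantial-control prong is a different traversal and is not modeled here.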
Real-Time Transaction Risk On The Serving Path
The third operational pattern is real-time transaction risk on the serving path. Every wire transfer, ACH transaction, and trade booking at Lakeside passes through a risk service with under 200ms to return a score before the transaction proceeds. The service must consider the counterparties (sender and receiver), their beneficial ownership chains, any sanctions or PEP designations on any controlling party, the historical transaction pattern between the parties, and any open AML investigations. Pre-KG, the service hit four to seven APIs and joined the results in service code at 250ms to 800ms latency, intermittently blowing the budget under load. Post-KG, the service issues one parameterized SPARQL query against the operational graph and gets a structured answer back in under 100ms.
The query traverses each counterparty’s ownership chain to a configurable depth (typically four hops), checks each node along the way against lksb:sanctionsList and lksb:pepList edges, checks the historical transaction pattern, and returns a score plus contributing factors. The latency budget is met because the inference is already materialized at stage 6: when the service queries ?counterparty fibo-be:hasIndirectControl ?owner, the answer is precomputed at write time, not at query time. The hybrid serving from Part 3 covers the long tail: deep traversals on the property-graph traversal store, SPARQL semantics for cross-graph reasoning and trust-tier filtering on the canonical RDF triple store. The risk service is the operational consumer where the dual paradigm pays for itself most clearly.
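The write-time versus query-time distinction can be made concrete with a small sketch. The closure of owners is computed once, when the graph changes, and the risk check becomes a set lookup. This is an illustration only: the function names, the toy ownership data, and the flat sets standing in for the lksb:sanctionsList and lksb:pepList edges are hypothetical, and a real reasoner would materialize the triples inside the store rather than in a Python dictionary.

```python
def materialize_indirect_control(direct_owners: dict, max_depth: int = 4) -> dict:
    """Precompute, per entity, every owner reachable within max_depth hops.

    Intended to run at write time so the serving path never recurses.
    """
    closure = {}
    for entity in direct_owners:
        owners = set()
        frontier = {entity}
        for _ in range(max_depth):
            # Expand one hop; drop owners already seen to stay cycle-safe.
            frontier = {o for n in frontier for o in direct_owners.get(n, ())} - owners
            if not frontier:
                break
            owners |= frontier
        closure[entity] = owners
    return closure


def flagged_parties(counterparty: str, closure: dict,
                    sanctioned: set, peps: set) -> dict:
    """Query-time check: set intersections against the precomputed closure."""
    parties = {counterparty} | closure.get(counterparty, set())
    return {
        "sanctions": sorted(parties & sanctioned),
        "pep": sorted(parties & peps),
    }


# Hypothetical data: entity -> direct owners.
DIRECT = {
    "acme": ["hold1"],
    "hold1": ["hold2"],
    "hold2": ["person_x"],
    "person_x": [],
}
CLOSURE = materialize_indirect_control(DIRECT)  # done when the graph changes
print(flagged_parties("acme", CLOSURE, sanctioned={"person_x"}, peps=set()))
```

The trade-off is the usual one for materialization: writes get more expensive and the closure must be refreshed when ownership edges change, in exchange for a serving path whose cost does not depend on chain depth.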
Diagnostic: Where Is Your Bank On The Foundational Layers
The diagnostic for whether your firm is ready for the operational layer of a knowledge graph is not “do we have the technology.” The technology is available, mature, and affordable. The diagnostic is whether the foundational layers are in place. The eight rows below are the questions Lakeside’s CDO asked the executive committee before approving the program. If your firm has fewer than five rows answered “yes,” the operational use case will not pay back the investment in twelve months, and the deliberate-versus-accidental argument has to be made on a longer time horizon.
| Diagnostic question | Yes if… | No means… |
|---|---|---|
| Is there one IRI per real-world entity across the firm? | Every system writes and reads a single canonical identifier for each customer, counterparty, product | Identity reconciliation is the primary cost driver; fix this before the graph |
| Is there one ontology (mostly imported, thin in-house) for the bank’s concepts? | FIBO or industry equivalent imports cover 80%+ of concepts; in-house module is small and disciplined | Ontology fragmentation will eat the program; start with FIBO if you are a bank |
| Is OpenLineage emitting from the pipeline engines? | Spark, dbt, Airflow, Flink emit OpenLineage natively today | The PROV-O bridge depends on lineage events; instrument first |
| Is real-time ER feasible on the serving path? | The bank can resolve a counterparty against the identity index in under 50ms | Operational use cases will not meet latency; nightly batch ER pushes you to governance-only KG |
| Is unstructured corpus material to the operational questions? | Credit memos, KYC files, advisor notes carry information that structured systems do not | Track 2 LLM extraction is required; pure Track 1 will not answer Müller-style questions |
| Are CDEs identified at business-concept granularity (~200-400)? | The bank has a CDE inventory at the right grain following the meta-model | Without CDEs, the governance layer (Part 11b) cannot anchor; build the inventory first |
| Is there an executive sponsor with a two-year horizon? | A CDO, CRO, or COO has approved a two-year program with a hard scoping rule | Without sponsorship, the boil-the-ocean failure mode (Part 2) is the default outcome |
| Is the in-house ontology module under 100 classes? | The bank has the discipline to import most concepts and write very few | Ontology bloat is the leading indicator of program failure; cap it before you start |
Lakeside answered yes on all eight questions before launching. That is the precondition the article assumes. Firms that answer yes on five or six can still proceed but should expect a longer payback period and should sequence the foundational gaps before scaling the graph.
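The scoring rule is simple enough to run as a checklist. The sketch below encodes the thresholds stated above; the short question keys and the wording of the three outcomes are this example's own, and treating a score of seven the same as five or six is an assumption, since the article does not address that case.

```python
QUESTIONS = [
    "one_iri_per_entity",
    "one_mostly_imported_ontology",
    "openlineage_emitting",
    "real_time_er_feasible",
    "unstructured_corpus_material",
    "cdes_at_business_grain",
    "two_year_executive_sponsor",
    "in_house_module_under_100",
]


def readiness(answers: dict) -> str:
    """Map the count of 'yes' answers to the article's three outcomes."""
    yes = sum(1 for q in QUESTIONS if answers.get(q, False))
    if yes < 5:
        return "sequence the foundational gaps first; no twelve-month payback"
    if yes < len(QUESTIONS):
        # Assumption: a score of seven is treated like five or six.
        return "proceed, but expect a longer payback period"
    return "all eight in place; the precondition this article assumes"


# Example: a firm missing real-time ER and a CDE inventory.
example = {q: True for q in QUESTIONS}
example["real_time_er_feasible"] = False
example["cdes_at_business_grain"] = False
print(readiness(example))
```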
What Comes Next: Part 11b And Part 11c
Part 11a established the foundation: the Monday-morning question, the deliberate-versus-accidental choice, the modular ontology Lakeside imports, the 8-stage pipeline that converges Track 1 and Track 2 at one identity, and the operational use case (customer 360, beneficial ownership, real-time transaction risk). The same graph carries two more use cases.
Part 11b shows how the same graph answers regulators. OpenLineage events from Lakeside’s pipelines feed in as PROV-O activities (Part 7, Part 10); the ~280 CDEs become typed nodes with cde:hasImplementation edges to ~5,600 fields (CDE meta-model); one SPARQL endpoint answers BCBS 239 Principle 3, ECB RDARR attribute-level lineage, GDPR Article 30 ROPA, and EU AI Act Article 10 training-data provenance.
Part 11c shows how the relationship-banker agent uses the graph. The CoALA four-layer memory model from Part 9 maps to named graphs (semantic plus episodic memory) plus the agent’s working context plus a skill subgraph (procedural memory). The trust-tier-aware retrieval pattern enforces three policies in production: portfolio decisions strict-tier-floor (gold only), client-meeting prep tier-segregated, advisor-facing summaries tier-explicit-citation. The same trust-tiered substrate answers both the agent and the regulator.
Part 11c closes with what Lakeside got wrong on the way, the contract and change-management discipline that keeps the graph operable across quarterly FIBO releases, a cost-modeling preview, and a Do Next table that spans all three pieces.
Do Next
The actions below are scoped to the foundation and operational layer this article covers. They sequence in tiers: prove the foundation is in place before you build the pipeline, and run the pipeline before you wire the operational consumers.
| Priority | Action | Why it matters |
|---|---|---|
| Now (foundation) | Run the eight-row diagnostic against your firm; count the “yes” answers honestly | Fewer than five “yes” answers means the operational use case will not pay back in twelve months; sequence the gaps first |
| Now (foundation) | Pick one industry-standard ontology (FIBO if you are a bank) and import it; cap the in-house module at under 100 classes with a review-board gate | Ontology fragmentation is the leading indicator of program failure; the import-first discipline is what keeps the in-house module from sprawling to over a thousand classes |
| Now (foundation) | Establish one IRI mint authority and one identity discipline across all existing systems (MDM, catalog, lineage, CDE inventory, AI pilot) | Without one IRI per real-world entity, every later investment widens the reconciliation gap rather than closing it |
| Next (pipeline) | Stand up the 8-stage pipeline introduced in Part 5 with both tracks converging at stage 3 (resolve), not as two parallel graphs | The two-track convergence is what makes Müller-style questions answerable; parallel graphs cannot answer them |
| Next (pipeline) | Enforce the three Track 2 discipline points: fixed ontology in the prompt, dedup and ER before assert, SHACL gate before write, with failures sent to quarantine rather than dropped | LLM extraction is the largest single quality lever; an ungated extraction path silently corrupts the graph |
| Later (operational) | Wire the operational consumers (customer 360, beneficial ownership, real-time transaction risk) to one SPARQL endpoint, with the property-graph view for the traversal-heavy long tail | The operational payoff is one query against one identity, one ontology, one provenance contract, instead of a three-week manual reconciliation |
| Later (operational) | Materialize transitive control inference at stage 6 (reason) so the real-time risk service meets its latency budget without query-time recursion | The serving-path latency budget is achievable only because the inference is precomputed at write time |
Sources & References
- FIBO: Financial Industry Business Ontology (2025)
- FIBO Business Entities (BE) Module (2025)
- FIBO Loans (LOAN) Module (2025)
- FIBO Securities (SEC) Module (2025)
- W3C R2RML: RDB to RDF Mapping Language (2012)
- W3C PROV-O: The PROV Ontology (2013)
- W3C SHACL: Shapes Constraint Language (2017)
- W3C Time Ontology in OWL (2022)
- OpenLineage: An Open Standard for lineage metadata collection (2024)
- FinCEN Beneficial Ownership Information Reporting Rule (Corporate Transparency Act) (2024)
- Hogan et al.: Knowledge Graphs (ACM Computing Surveys 2021; Synthesis Lectures, Morgan and Claypool 2022) (2022)
- Real-Time Entity Resolution for Operational Use Cases (2025)
- EDM Council FIBO Releases (2026 Q1 Production Release) (2026)
- Gibson Dunn: EU AI Act Omnibus Agreement Postpones High-Risk Deadlines (2026)
- Parsli: The Real Cost of LLM OCR Document Extraction (2026)