Industry Teardowns June 19, 2026 · 21 min read

LinkedIn's Economic Graph Teardown: A Knowledge Graph You've Used Today

A teardown of LinkedIn's Economic Graph as a knowledge graph: 1.2 billion members, 69 million companies, 41 thousand skills, all stitched together by typed relationships and machine-inferred connections. The first article in The Knowledge Graph Practitioner's Guide. We use a system you already know to introduce the foundational vocabulary you will need for the rest of the series.

By Vikas Pratap Singh
#knowledge-graph #linkedin #economic-graph #ontology #graph-database #data-architecture


I Opened LinkedIn This Morning

I had not logged in for a week. The first thing the homepage offered me was a connection suggestion: a former colleague from a project I worked on five years ago, two jobs ago, in a city I no longer live in. We had exchanged maybe three emails in our entire working relationship and never connected on LinkedIn. There she was at the top of “People you may know.”

How did the system know we worked together? I had never tagged her. She had never tagged me. The project was on neither of our profiles. The company we both worked at had thousands of employees; merely having that company in common would not have been enough.

The system knew because it had quietly assembled, from millions of small signals, a model of our working relationship: shared time at the same company, overlapping skills, similar career trajectory, mutual connections, comparable seniority. It did not just store facts about us. It inferred a fact neither of us had told it.

That inference is what makes LinkedIn’s Economic Graph a knowledge graph and not a database. And if you have ever clicked on a person in “People you may know,” accepted a skill suggestion, or seen a job recommendation that made you wonder if the algorithm could read your mind, you have used a knowledge graph today.

What struck me was not just that the inference was right. It was that I had no way to see why. I have spent a career telling clients that inference without provenance is a liability, and here was a graph quietly inferring facts about me, usefully, with none of the lineage I would demand at work. That tension, inference that is valuable but unexplained, runs through every knowledge graph in this series.

This article is the first in a 16-part guide to knowledge graphs from first principles. The rest of the series will not be specific to LinkedIn. We start here because almost every reader has used the Economic Graph, often daily, for years. By the end of this teardown you will recognize four moving parts that show up in every knowledge graph: entities, typed relationships, identity, and inference. That vocabulary will carry you through the next fifteen articles.

For practitioners: when a vendor pitches a “knowledge graph product,” the test is not whether they have a graph database under the hood. The test is whether all four moving parts are present and integrated. Knowledge graph projects fail far more often on identity and inference than on storage.

Scale

LinkedIn’s Economic Graph, as of April 2026, contains 1.2 billion members, 69 million companies, 41 thousand skills, and 140 thousand schools, spanning more than 200 countries. The same system in 2016, when LinkedIn first published its detailed knowledge graph paper, held 450 million members, 9 million companies, and 35 thousand skills.

Growth of roughly 2.7x in the largest entity type in a decade is a useful number to keep in mind for the rest of the series. Real knowledge graphs do not stay still. The schema, the entity counts, the relationship types, and the inference rules all change over time. We will return to this in Part 8 when we discuss operating a KG.

| Entity type | 2016 (KG paper) | 2026 (Economic Graph) | Multiple |
| --- | --- | --- | --- |
| Members | 450 million | 1.2 billion | 2.7x |
| Companies | 9 million | 69 million | 7.7x |
| Skills | 35 thousand | 41 thousand | 1.2x |
| Schools | 28 thousand | 140 thousand | 5.0x |

Sources: Building the LinkedIn Knowledge Graph (2016) and economicgraph.linkedin.com (April 2026).

The skill count grew the least, which is itself an insight. Skills are an ontology that LinkedIn has invested heavily in cleaning, deduplicating, and stabilizing. We will see why this matters when we cover ontologies in Part 4.

The Entities

A knowledge graph represents real or abstract things in the world as entities. Every entity has a unique identity, a type, and a set of properties.

LinkedIn’s Economic Graph holds at least eleven distinct entity types, according to LinkedIn’s own published documentation:

| Entity type | What it represents | Examples |
| --- | --- | --- |
| Member | A real person with a professional profile | You, me, your manager, the former colleague the system surfaced |
| Company | A legal or operational organization | LinkedIn, your employer, a startup, a non-profit |
| Job | A specific open or historical job listing | “Staff Data Engineer at Acme, posted March 2026” |
| Skill | A capability that members possess and jobs require | “Python,” “Kubernetes,” “Data Governance,” “Crisis Management” |
| School | A higher-education institution | Stanford, IIT Delhi, Northwestern |
| Title | A normalized job title | “Senior Software Engineer,” “Head of Data” |
| Industry | An economic sector | “Banking,” “Higher Education,” “Renewable Energy” |
| Location | A geographic place | “Greater Chicago Area,” “Bengaluru,” “Remote (United States)” |
| Field of study | An academic discipline | “Computer Science,” “Public Policy” |
| Degree | An educational credential | “Bachelor’s,” “MBA,” “PhD” |
| Certificate | A professional credential | “AWS Solutions Architect,” “PMP” |

Source: Building the LinkedIn Knowledge Graph (He, Chen, Agarwal, 2016). The 2016 paper documents 24 thousand titles in 19 languages, 1.5 thousand fields of study, 600+ degrees, and 500+ certificates. Title Case is preserved within entity names.

Each entity has properties. A Member has a name, a headline, a current title, a network size. A Company has a name, an industry, a headcount band, a founding year. A Skill has a name and a one-line definition. The properties make an entity self-describing.

What this looks like in practice: a hypothetical Member entity for “Vikas Pratap Singh” might carry properties { name: "Vikas Pratap Singh", currentTitle: "Principal Data Architect", connectionCount: 5400, profileLastUpdated: "2026-04-15" }. Each property has a value type (string, integer, date) and may have a confidence score. We will return to confidence scores in the inference section below.
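To make the idea of an entity as an identity, a type, and an open-ended property bag concrete, here is a minimal Python sketch. It is not LinkedIn's data model: the class names, the `member:1001` ID format, the `inferredSeniority` property, and the 0.8 confidence value are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any


@dataclass(frozen=True)
class PropertyValue:
    """A property value plus a trust score (1.0 = human-verified)."""
    value: Any
    confidence: float = 1.0


@dataclass
class Entity:
    """A node with a stable ID, a type, and an open-ended property bag."""
    entity_id: str
    entity_type: str
    properties: dict[str, PropertyValue] = field(default_factory=dict)

    def set_property(self, key: str, value: Any, confidence: float = 1.0) -> None:
        # A new key needs no schema migration: the property bag is just a dict.
        self.properties[key] = PropertyValue(value, confidence)

    def trusted(self, threshold: float) -> dict[str, Any]:
        # Keep only the properties whose confidence meets the threshold.
        return {key: prop.value for key, prop in self.properties.items()
                if prop.confidence >= threshold}


member = Entity("member:1001", "Member")  # hypothetical ID format
member.set_property("name", "Vikas Pratap Singh")
member.set_property("currentTitle", "Principal Data Architect")
member.set_property("connectionCount", 5400)
member.set_property("profileLastUpdated", date(2026, 4, 15))
# A made-up inferred property, stored with a lower confidence score.
member.set_property("inferredSeniority", "Principal", confidence=0.8)

print(member.trusted(threshold=0.9))
```

Calling `trusted(threshold=0.9)` returns the four stated properties and drops the inferred one, which is the kind of filtering a confidence score makes possible.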

The first thing to notice is that LinkedIn’s entities are not tables. The set of properties on a Member is not fixed; new attributes can be added without a schema migration. The set of types is not fixed; new types (Certificates, for instance) were introduced years after the original member-job-skill triad. This is one reason LinkedIn calls their store a graph and not a database. We will define the precise difference in Part 3.

The Typed Relationships

What turns a collection of entities into a graph is the relationships between them. A relationship is a directed, typed edge from one entity to another. The type tells you what the relationship means.

The smallest possible knowledge graph is one entity, one relationship, and another entity. Three pieces. That triplet is the atomic unit of the entire Economic Graph, 1.2 billion members and all. It looks like this:

A single fact in a knowledge graph: a Member entity (Alice), a typed relationship (worksAt), and a Company entity (Acme), shown in both RDF Turtle syntax and property graph (Cypher) syntax.

The two notations on the bottom of the diagram are the same fact in two different paradigms (RDF and property graph). We will compare them rigorously in Part 3. For now, notice that both paradigms agree on the structure: subject entity, typed relationship, object entity. That triplet is the atom.
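For readers who prefer code to diagrams, the same Alice-worksAt-Acme fact can be rendered in both notations from one small structure. This is a sketch under stated assumptions: the `ex:` prefix is a placeholder namespace that a real Turtle document would have to declare, and the Cypher output is only a pattern fragment, not a full statement.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str        # e.g. "Alice"
    subject_type: str   # e.g. "Member"
    predicate: str      # the typed relationship, e.g. "worksAt"
    obj: str            # e.g. "Acme"
    obj_type: str       # e.g. "Company"


def to_turtle(t: Triple, prefix: str = "ex") -> str:
    # RDF Turtle: subject, predicate, object, terminated by " ."
    return f"{prefix}:{t.subject} {prefix}:{t.predicate} {prefix}:{t.obj} ."


def to_cypher(t: Triple) -> str:
    # Property graph (Cypher): labeled nodes joined by a typed, directed edge.
    return (f"(:{t.subject_type} {{name: '{t.subject}'}})"
            f"-[:{t.predicate}]->"
            f"(:{t.obj_type} {{name: '{t.obj}'}})")


fact = Triple("Alice", "Member", "worksAt", "Acme", "Company")
print(to_turtle(fact))  # ex:Alice ex:worksAt ex:Acme .
print(to_cypher(fact))  # (:Member {name: 'Alice'})-[:worksAt]->(:Company {name: 'Acme'})
```

Both functions read the same three core fields, which is the point: the paradigms differ in syntax and in where types and properties live, not in the subject-relationship-object shape.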

LinkedIn’s Economic Graph contains hundreds of relationship types, most of them inferred. These are some of the foundational ones:

| Relationship | Source entity | Target entity | Meaning |
| --- | --- | --- | --- |
| worksAt | Member | Company | Current employment |
| workedAt | Member | Company | Historical employment, with date range |
| hasSkill | Member | Skill | The member claims or has been inferred to have this skill |
| attendedSchool | Member | School | Education history |
| knows | Member | Member | First-degree connection |
| requiresSkill | Job | Skill | The job posting requires this skill |
| postedBy | Job | Company | The company that listed this job |
| locatedIn | Member or Company or Job | Location | Geographic anchor |
| inIndustry | Company | Industry | The company’s industry |
| subClassOf | Skill | Skill | Skill hierarchy (“Python” is a kind of “Programming Language”) |
| similarTo | Skill | Skill | Inferred semantic similarity |
| relatedTitle | Title | Title | Career mobility (“Senior Engineer” leads to “Staff Engineer”) |

Visually, even a tiny slice of the Economic Graph weaves these together quickly:

A small fragment of LinkedIn's Economic Graph showing five entity types (Member, Company, Skill, School, Job) and seven typed relationships (worksAt, workedAt, knows, hasSkill, attendedSchool, postedBy, requiresSkill).

The relationship is the unit of meaning in a knowledge graph. A row in a relational database tells you what an entity is. An edge in a knowledge graph tells you what an entity does, knows, owns, came from, will become.

Two pieces of LinkedIn’s relationship design are worth pausing on, because they are not obvious and they generalize.

First, relationships are typed and directional. Alice worksAt Acme is a different fact from Acme employs Alice, even though they describe the same situation. LinkedIn stores them differently: a Member node has outgoing worksAt edges; a Company node has incoming. This matters because traversal queries (find me all employees of Acme; find me all companies Alice has worked at) follow different paths.
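The two traversal directions can be sketched with a toy edge store that indexes every edge from both ends. This illustrates the idea only and is not LinkedIn's storage layout; the class, method names, and IDs are hypothetical.

```python
from collections import defaultdict


class EdgeStore:
    """Typed, directed edges indexed from both the source and the target."""

    def __init__(self) -> None:
        self._out = defaultdict(set)  # (source, label) -> targets
        self._in = defaultdict(set)   # (target, label) -> sources

    def add(self, source: str, label: str, target: str) -> None:
        # One fact, two index entries: one per traversal direction.
        self._out[(source, label)].add(target)
        self._in[(target, label)].add(source)

    def outgoing(self, source: str, label: str) -> list[str]:
        # "Where does Alice work?" follows the edge forward.
        return sorted(self._out.get((source, label), ()))

    def incoming(self, target: str, label: str) -> list[str]:
        # "Who works at Acme?" follows the same edge type backward.
        return sorted(self._in.get((target, label), ()))


store = EdgeStore()
store.add("member:alice", "worksAt", "company:acme")
store.add("member:bob", "worksAt", "company:acme")
store.add("member:alice", "workedAt", "company:globex")

print(store.outgoing("member:alice", "worksAt"))  # ['company:acme']
print(store.incoming("company:acme", "worksAt"))  # ['member:alice', 'member:bob']
```

Because the label is part of the index key, `worksAt` and `workedAt` never mix: asking for Alice's current employer does not return Globex.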

Second, the schema is open-ended. According to LinkedIn’s published architecture overview, the graph “supports tens of terabytes of graph data and half a million QPS” with the property that “the schema (as a collection of edge labels) can be extended in constant time in the live graph.” Adding a new relationship type does not require taking the system offline. This is rare in relational databases and is one of the operational reasons graph stores are used for KGs.

What this looks like in practice: when LinkedIn introduced “Open to Work” in 2020, it was a new relationship type (“memberIs” -> “OpenToWork”), not a new column on the Member table. The new edge type appeared in the live graph and started accumulating instances immediately. No downtime, no migration.
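A toy model shows why treating the schema as a collection of edge labels makes extension cheap. The sketch below is an assumption-laden illustration, not LinkedIn's implementation: `LiveGraph` and its methods are hypothetical, and the label and node names simply reuse the example above.

```python
class LiveGraph:
    """A graph whose schema is nothing more than a set of edge labels."""

    def __init__(self, edge_labels):
        self.edge_labels = set(edge_labels)
        self.edges = []  # (source, label, target) tuples

    def extend_schema(self, label: str) -> None:
        # Set insertion is O(1) on average, and no existing edge is touched.
        self.edge_labels.add(label)

    def add_edge(self, source: str, label: str, target: str) -> None:
        if label not in self.edge_labels:
            raise ValueError(f"unknown edge label: {label}")
        self.edges.append((source, label, target))


graph = LiveGraph({"worksAt", "hasSkill"})
graph.add_edge("member:alice", "worksAt", "company:acme")

# Before the schema is extended, the new edge type is rejected.
try:
    graph.add_edge("member:alice", "memberIs", "OpenToWork")
except ValueError as error:
    print(error)

# After a constant-time extension, instances accumulate immediately.
graph.extend_schema("memberIs")
graph.add_edge("member:alice", "memberIs", "OpenToWork")
print(len(graph.edges))  # 2
```

Nothing about the existing `worksAt` edge changes when `memberIs` is introduced, which is the contrast with adding a column to a populated table.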

Identity: Why a Member is a Member, Not a Row

The third moving part is identity. Every entity in the Economic Graph has a stable, globally unique identifier. For Members it is a numeric member ID. For Companies, a numeric company ID. For Skills, an internal skill URN.

The reason this matters is hard to grok until you see it break. In a relational schema, a “row” exists inside a table; if you delete the table, the row is gone. In a knowledge graph, an entity exists as a node with a globally unique identity. Other parts of the graph reference that identity. The identity outlives any one table, any one application, any one schema.

Consider what happens when you change jobs on LinkedIn. Your Member ID does not change. Your worksAt edge changes target: it used to point to Company A, now it points to Company B. Your old worksAt becomes a workedAt (with date range). Your skill graph stays intact. Your connection graph stays intact. Your endorsements, the comments you posted, the articles you wrote, the talks you gave, your endorsement of others, all stay intact, because they reference your identity, not your row in a job-history table.
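The job-change scenario can be sketched as a pure function over a list of edges. The edge tuples, ID formats, and the `since`/`from`/`to` property names are assumptions for illustration; the only thing the sketch is meant to show is that the member ID never changes while one edge is retyped and one is added.

```python
from datetime import date

# (source, label, target, properties); all IDs are hypothetical.
edges = [
    ("member:1001", "worksAt", "company:A", {"since": date(2021, 3, 1)}),
    ("member:1001", "hasSkill", "skill:python", {}),
    ("member:1001", "knows", "member:2002", {}),
]


def change_job(edges, member_id, new_company, on):
    """Return a new edge list reflecting a job change on the given date."""
    updated = []
    for source, label, target, props in edges:
        if source == member_id and label == "worksAt":
            # Current employment becomes history, with a date range.
            updated.append((source, "workedAt", target,
                            {"from": props["since"], "to": on}))
        else:
            # Skills and connections reference the identity, so they are untouched.
            updated.append((source, label, target, props))
    updated.append((member_id, "worksAt", new_company, {"since": on}))
    return updated


after = change_job(edges, "member:1001", "company:B", date(2026, 4, 15))
for edge in after:
    print(edge)
```

Every edge in `after` still starts from `member:1001`; only the employment edges differ from the original list.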

This is the reason knowledge graphs are good at representing situations where entities persist while their relationships churn. Customers, products, employees, regulations, beneficial owners, and counterparties all behave this way in real organizations. We will see this exact pattern again in Part 11 when we design Lakeside Trust Bank’s customer 360 KG.

For practitioners: identity is the hardest part of knowledge graph construction in your organization. Two records about the same customer in two source systems are not naturally the same entity in the KG. Making them the same entity is called entity resolution, and it is the problem MDM has been wrestling with for two decades. We cover entity resolution in Part 5. If your organization already has an MDM golden record process, you are further along on KG construction than you think; see MDM in a Data Mesh World.
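As a deliberately naive illustration of what entity resolution has to do, the sketch below merges records from two source systems into shared entity IDs using a single matching rule. The rule (same normalized email means same customer), the system names, and the record fields are all assumptions; production entity resolution uses far richer matching than this.

```python
def match_key(record: dict) -> str:
    # Naive rule, assumed for illustration: same normalized email = same customer.
    return record["email"].strip().lower()


def resolve(records):
    """Map each (system, local_id) pair to a shared entity ID."""
    entity_ids = {}  # match key -> entity ID
    resolved = {}    # (system, local_id) -> entity ID
    for rec in records:
        key = match_key(rec)
        # Reuse the ID if this key was seen before, otherwise mint a new one.
        entity_id = entity_ids.setdefault(key, f"customer:{len(entity_ids) + 1}")
        resolved[(rec["system"], rec["local_id"])] = entity_id
    return resolved


records = [
    {"system": "crm", "local_id": "17", "name": "Jane Doe",
     "email": "jane.doe@example.com"},
    {"system": "core_banking", "local_id": "A-9", "name": "DOE, JANE",
     "email": " Jane.Doe@Example.com "},
    {"system": "crm", "local_id": "18", "name": "John Roe",
     "email": "john.roe@example.com"},
]

for source_key, entity_id in resolve(records).items():
    print(source_key, "->", entity_id)
```

The two Jane Doe records end up on the same entity ID despite different name formats and email casing, while John Roe gets his own. Everything hard about entity resolution lives in what a single-rule `match_key` cannot handle: missing emails, shared mailboxes, and typos.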

Inference: Where Knowledge Beats Storage

This is where LinkedIn earns the word “knowledge.”

A graph that just stores what users explicitly told it would only know what users explicitly told it. The Economic Graph holds far more than that. It infers connections, skills, similarities, and trajectories that no member ever entered. The inference is what made my former colleague show up in “People you may know” five years and two jobs after we last talked.

LinkedIn has published several inference systems on top of the Economic Graph. Three are particularly instructive.

Inference 1: Skill completion via a graph neural network

In December 2021 LinkedIn published a paper on Entity-BERT, a graph neural network that completes member knowledge graphs. The example they give: “if the member knows machine learning and works at Google, we can infer that the member is skilled in Tensorflow, even if their current profile does not say so.”

The model “uses a multi-layer bidirectional transformer for aggregation” and “computes the interaction (attention) between every pair of entities to update a node’s representation,” repeating this 6 to 24 times. Training is self-supervised: the model masks 10 percent of entities per profile and learns to predict the masked attributes from the surrounding context. The result is a knowledge graph that knows things about you that you did not say.
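The masking step of that self-supervised setup can be sketched without any model at all. The function below only prepares a training example: it hides a fraction of a profile's entities and keeps the hidden values as prediction targets. The `[MASK]` token, the round-up rule, the at-least-one-mask rule, and the sample profile are assumptions for illustration, not details taken from LinkedIn's paper.

```python
import math
import random

MASK = "[MASK]"  # placeholder token, assumed for illustration


def mask_profile(entities, mask_fraction=0.10, rng=None):
    """Hide a fraction of a profile's entities and return (inputs, targets)."""
    if not entities:
        return [], {}
    rng = rng or random.Random()
    # Assumption: round up, and always mask at least one entity.
    n_masked = max(1, math.ceil(len(entities) * mask_fraction))
    masked_positions = set(rng.sample(range(len(entities)), n_masked))
    inputs = [MASK if i in masked_positions else entity
              for i, entity in enumerate(entities)]
    # The model would be trained to predict these from the unmasked context.
    targets = {i: entities[i] for i in masked_positions}
    return inputs, targets


profile = [
    "company:google", "title:ml-engineer", "skill:machine-learning",
    "skill:python", "skill:tensorflow", "skill:sql", "school:stanford",
    "degree:ms", "field:computer-science", "location:bay-area",
]
inputs, targets = mask_profile(profile, rng=random.Random(7))
print(inputs)
print(targets)
```

If the masked entity happens to be `skill:tensorflow`, the training signal is exactly the inference from the quoted example: predict the missing skill from the employer and the other skills that remain visible.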

Inference 2: People you may know

The PYMK system has been publicly described many times by LinkedIn. It is, at heart, triangle closing on the Member-Member graph: if Alice knows Bob and Bob knows Carol, predict that Alice knows Carol. The first version of PYMK was a logistic regression model with hundreds of features (shared employer, shared school, age proximity, geographic distance, mutual connection count). Modern versions use multiple graph-based candidate generators, an XGBoost ranker, and neural network rankers that estimate invitation-acceptance probability.

Triangle closing in PYMK: Alice knows Bob and Bob knows Carol, both stated. The graph predicts Alice knows Carol with a confidence score derived from features like mutual connections, shared employer, shared school, and geographic proximity.
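Triangle closing is easy to sketch as candidate generation: walk two hops out from a member and count how many mutual connections lead to each non-connection. This is only the simplest graph-based generator, with mutual-connection count standing in for a score; it is not LinkedIn's ranker, and the names and helper functions are hypothetical.

```python
from collections import Counter


def connect(knows: dict[str, set[str]], a: str, b: str) -> None:
    # "knows" is symmetric here: a first-degree connection goes both ways.
    knows.setdefault(a, set()).add(b)
    knows.setdefault(b, set()).add(a)


def pymk_candidates(knows: dict[str, set[str]], member: str) -> list[tuple[str, int]]:
    """Rank friends-of-friends by the number of mutual connections."""
    direct = knows.get(member, set())
    mutual = Counter()
    for friend in direct:
        for friend_of_friend in knows.get(friend, set()):
            # Skip the member and anyone already connected to them.
            if friend_of_friend != member and friend_of_friend not in direct:
                mutual[friend_of_friend] += 1
    # Most mutual connections first; ties broken alphabetically.
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))


knows: dict[str, set[str]] = {}
for a, b in [("Alice", "Bob"), ("Bob", "Carol"), ("Alice", "Dave"),
             ("Dave", "Carol"), ("Dave", "Erin")]:
    connect(knows, a, b)

print(pymk_candidates(knows, "Alice"))  # [('Carol', 2), ('Erin', 1)]
```

Carol closes two triangles (through Bob and through Dave), so she outranks Erin, who closes one. A production system would feed such candidates, along with features like shared employer and school, into a learned ranker.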

The scale claim from LinkedIn is striking: PYMK is responsible for building “more than 50 percent of LinkedIn’s professional graph”. More than half of the member-to-member connections in the graph were suggested by inference, not sought out unprompted by a user. The figure is about connections formed, not every relationship type the system stores.

Inference 3: Skill extraction from text

LinkedIn’s Skills Graph extraction pipeline reads job postings, member profiles, and other free-text inputs and maps mentioned phrases onto canonical Skill entities. The system uses a multi-task learning framework that “leverages signals from multiple contexts” and section-aware weighting (a skill mentioned in the “qualifications” section of a job post is weighted more heavily than one in the “company description” section).

This is inference of a different kind. It is taking unstructured text, locating skill mentions inside it, resolving them to a canonical Skill node, and creating a typed edge to the source Member or Job. Inference 1 was about completing missing edges from the entities around them; inference 2 was about predicting new Member-to-Member edges; inference 3 is about extracting edges from text. We will return to extraction in Part 6.
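The locate, resolve, and link steps can be sketched with a plain alias dictionary and a regular expression. To be clear about what is swapped in: LinkedIn describes a multi-task learned model, while this sketch uses simple dictionary lookup. The alias table, the section weights, the 0.5 default, and the `job:42` ID are all made-up values for illustration.

```python
import re

# Hypothetical alias table: surface phrase -> canonical Skill entity.
SKILL_ALIASES = {
    "python": "skill:python",
    "k8s": "skill:kubernetes",
    "kubernetes": "skill:kubernetes",
    "data governance": "skill:data-governance",
}
# Made-up section weights: qualifications count more than company boilerplate.
SECTION_WEIGHTS = {"qualifications": 1.0, "company description": 0.3}


def extract_skills(job_id: str, sections: dict[str, str]):
    """Return (job, 'requiresSkill', skill, weight) edges found in the text."""
    scores: dict[str, float] = {}
    for section, text in sections.items():
        weight = SECTION_WEIGHTS.get(section, 0.5)  # assumed default weight
        for alias, skill_id in SKILL_ALIASES.items():
            # Whole-phrase, case-insensitive match of the alias.
            if re.search(rf"\b{re.escape(alias)}\b", text, flags=re.IGNORECASE):
                # Keep the strongest evidence seen for each canonical skill.
                scores[skill_id] = max(scores.get(skill_id, 0.0), weight)
    return [(job_id, "requiresSkill", skill_id, score)
            for skill_id, score in sorted(scores.items())]


posting = {
    "qualifications": "5+ years of Python; experience running K8s clusters.",
    "company description": "We care about data governance and Python.",
}
for edge in extract_skills("job:42", posting):
    print(edge)
```

“K8s” resolves to the same canonical node as “Kubernetes,” and Python keeps the weight from the qualifications section even though it also appears in the company description. The data governance mention survives, but with a lower weight.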

The agent-era restatement: every modern AI agent that retrieves “relevant context” for a question is doing something analogous to inference 3 plus inference 1, less well, with vector similarity instead of typed relationships, and with no inference 2 at all. The reason GraphRAG is becoming a serious topic in 2026 (covered in Part 9) is that organizations are realizing their AI agents need the kind of structured knowledge LinkedIn has been building for fifteen years.

What Makes This a Knowledge Graph and Not Just a Graph

A reader who comes from databases might object: this is just a graph database with some machine learning on top. Why call it knowledge?

The four-part lens answers it. A knowledge graph requires all four:

| Component | What it is | What LinkedIn has |
| --- | --- | --- |
| Entities | Things in the world, with stable global identity and properties | Members, Companies, Jobs, Skills, etc. each with their own ID space |
| Typed relationships | Directed, named, schema-extensible edges with semantics | worksAt, hasSkill, attendedSchool, etc., extensible at runtime |
| Identity | Globally unique, stable identifiers that outlive any one source or schema | Member IDs, Company IDs, Skill URNs |
| Inference | Derived facts that no one explicitly stated | Entity-BERT skill completion, PYMK triangle closing, skill extraction |

A graph database that holds nodes and edges but has no schema, no global identity, and no inference is just a fast adjacency-list store. A knowledge graph is a graph database plus the three things that make stored facts add up to something more than themselves.

For practitioners: the term “knowledge graph” gets attached to many products that have only one or two of these four. A vector store with relationship metadata is not a knowledge graph; it is a vector store with relationship metadata. A property graph database with no ontology is a graph database, not a knowledge graph. We will give a precise definition in Part 3 and use this four-part lens to evaluate the major paradigms in Appendix A.

What LinkedIn Did Not Build

It is also useful to notice what the Economic Graph is not.

It is not built on RDF. LinkedIn uses a custom property graph database, not a triple store. The choice was justified by query performance for their specific workloads (PYMK, skill matching, search) and the open-ended schema requirement. We will explore the RDF-vs-property-graph paradigm choice in Part 3 and Appendix A.

It is not built on a public ontology. LinkedIn defines their own entity types, their own taxonomy of skills, their own normalization of titles. There is no FIBO-equivalent for the labor market that they could have adopted. They built their own ontology, governed it themselves, and curate it as a product. This is the most expensive single choice an organization can make about their KG, and we will examine alternatives in Part 4.

It is not perfectly accurate. The 2016 LinkedIn paper itself acknowledges that “not all explicit relationships are trustworthy,” giving a specific example where a design firm “incorrectly received 96 member mappings” before the Data Quality system caught the misclassification. Every entity attribute carries a confidence score: 1.0 if a human verified it, lower if it was inferred. Production knowledge graphs are noisy. We will cover provenance and quality in Part 7.

It is not the only knowledge graph LinkedIn has. There is a separate Knowledge Graph project layered on top of the Economic Graph for product applications (search ranking, ad targeting, feed) and a separate skills graph for the skill-extraction pipeline. Real organizations rarely have one knowledge graph; they have many that share entities and identity but specialize in different domains. We will see this pattern again in Part 11 when Lakeside Trust Bank’s KG splits into a customer KG, a product KG, and a regulatory KG that share identity but optimize separately.

The Vocabulary You Just Learned

Stop and check: if you read this article cold, you now know what these words mean and roughly how they show up in a real system.

| Term | Working definition | Where it shows up in this article |
| --- | --- | --- |
| Entity | A thing in the world represented as a node, with stable identity and properties | Member, Company, Job, Skill, etc. |
| Property | A key-value attribute on an entity | name, currentTitle, connectionCount |
| Relationship (edge) | A directed, typed connection between two entities | worksAt, hasSkill, attendedSchool |
| Schema | The set of allowed entity types and relationship types | LinkedIn’s open-ended extensible schema |
| Identity | A globally unique, stable identifier for an entity | Member ID, Company ID, Skill URN |
| Inference | Deriving new facts from existing facts | Entity-BERT skill completion, PYMK triangle closing |
| Confidence score | A numeric trust value attached to a fact | 1.0 for human-verified, lower for inferred |
| Entity resolution | Determining whether two records refer to the same real entity | Disambiguating duplicate Company entries |

We will refine each of these in later articles, but you have the working set. Specifically:

  • Part 3 will define knowledge graph rigorously and contrast it with a graph database, a knowledge base, and a semantic layer.
  • Part 4 will cover ontology, taxonomy, and schema design (the choice LinkedIn faced when designing their skill taxonomy).
  • Part 5 will cover identity, IRIs, entity resolution, and inference in depth.
  • Part 6 will cover how a graph gets built from raw data sources (extraction, mapping, resolution).
  • Part 9 will cover how AI agents use knowledge graphs for retrieval, which is the reason this whole topic has become operationally urgent in 2026.

Why This Series Exists

If knowledge graphs were a passing fad, you could skip the next fifteen articles. They are not. Three forces converge in 2026 to make a working KG capability a practical requirement, not an academic curiosity.

The first force is AI agents. Vector retrieval over a chunk store gets you most of the value of single-shot question answering, but it falls over when an agent needs to reason across multiple hops. “Find me companies that have failed audits twice in the last three years and are owned by entities whose other holdings have failed audits” is a four-hop query that a vector store cannot answer because vectors do not know what “owned by” means. Knowledge graphs do.
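To see why the ownership question is a traversal and not a similarity search, here is a sketch of it over two tiny typed relations. The data, the IDs, and the reading of “twice in the last three years” as “at least two failures on or after a cutoff date” are assumptions for illustration.

```python
from datetime import date

# Hypothetical typed facts: company -> failed audit dates, company -> owner.
failed_audits = {
    "company:A": [date(2024, 5, 1), date(2025, 9, 1)],
    "company:B": [date(2025, 2, 1)],
    "company:C": [],
    "company:D": [date(2024, 1, 10), date(2025, 11, 3)],
}
owned_by = {
    "company:A": "owner:X", "company:B": "owner:X",
    "company:C": "owner:Y", "company:D": "owner:Y",
}


def risky_companies(failed_audits, owned_by, since):
    # Invert ownedBy so each owner's holdings can be walked.
    holdings = {}
    for company, owner in owned_by.items():
        holdings.setdefault(owner, set()).add(company)

    result = []
    for company, owner in owned_by.items():
        # Company -> its own failed audits since the cutoff.
        recent = [d for d in failed_audits.get(company, []) if d >= since]
        if len(recent) < 2:
            continue
        # Company -> owner -> other holdings -> their failed audits.
        siblings = holdings[owner] - {company}
        if any(failed_audits.get(sibling) for sibling in siblings):
            result.append(company)
    return sorted(result)


print(risky_companies(failed_audits, owned_by, since=date(2023, 6, 19)))
# ['company:A']
```

Company D has two recent failures but its owner's only other holding is clean, so it is excluded. That distinction depends entirely on following the `owned_by` relationship in both directions, which is the part a similarity score over text chunks has no way to express.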

The second force is governance fatigue. Every organization with a serious data platform has, by 2026, accumulated Data Lineage tracking, a Data Catalog, a semantic layer, an MDM golden record process, and a CDE program. These are five fragments of the same underlying thing: a graph of entities, relationships, and constraints. Building each as a separate system is expensive and brittle. Putting them on a shared knowledge-graph backbone is increasingly the architecturally cleaner answer. We will cover this in Part 10.

The third force is regulatory pressure. Bank regulators want lineage from a customer transaction to the policy that governed it. Privacy regulators want a provenance trail from a model output to the training data that produced it. Both are graph queries on a knowledge graph. The organizations that have one will answer audit questions in minutes. The ones that do not will commission another consulting engagement.

LinkedIn started building the Economic Graph in 2011. It has taken them fifteen years to get to 1.2 billion members with reliable inference. Your organization does not need a planet-scale KG; it needs one that fits its actual problem space. But the principles are the same. The next fifteen articles will work through them, from first principles, in a tool-agnostic way, ending with a reference implementation for a mid-size US bank that any reader can adapt.

Do Next

| Priority | Action | Why it matters |
| --- | --- | --- |
| This week | Open LinkedIn and pick three “People you may know” suggestions. Read the implicit signals: shared school, shared employer, mutual connections, geography. You are reading the inference layer of a knowledge graph in production. | Ground the abstract concept in a concrete system you already use. The next 15 articles will assume you have done this. |
| This week | Inventory the “knowledge-graph-shaped” projects in your organization that probably exist but are not labeled as KGs: Data Lineage, MDM (Master Data Management), semantic layer, customer 360. | These are likely fragments of the same underlying thing. Recognizing them is the first step to consolidating them. |
| This month | Read the 2016 Building the LinkedIn Knowledge Graph post in full and take note of the explicit construction steps (taxonomy, relationship inference, embedding). The vocabulary will recur in Parts 4, 5, and 6. | Reading one production case study end-to-end is worth ten vendor pitches. |
| This month | Sketch your own organization’s “Member-Company-Skill” equivalent. What are the three or four entity types your business cares most about? What are the relationships between them? What inference would unlock the most value? | This is the seed of your own KG design exercise, which we will return to in Part 11. |
| This quarter | Read Part 2 of this series (Why Most Enterprise Knowledge Graph Projects Die in Year Two) before you propose a KG initiative inside your organization. | The failure modes are predictable. Knowing them up front is the cheapest insurance you can buy. |

Part 2 of this series, “Why Most Enterprise Knowledge Graph Projects Die in Year Two,” covers the failure patterns that kill most KG initiatives before they ship value. Read it next.

Sources & References

  1. Building the LinkedIn Knowledge Graph (He, Chen, Agarwal) (2016)
  2. LinkedIn Economic Graph: A digital representation of the global economy (2026)
  3. Completing a Member Knowledge Graph with Graph Neural Networks (Yang, Chen, Li) (2021)
  4. Building a Large-Scale Recommendation System: People You May Know (2022)
  5. Extracting skills from content to fuel the LinkedIn Skills Graph (2023)
  6. From the Economic Graph to Economic Insights: Building the Infrastructure (2018)
  7. LinkedIn Data Infrastructure: Graph (2016)
  8. LinkedIn: A New World of Work: Global Labor Market Rotates, Not Retreats (January 2026)
  9. Knowledge Graphs (Hogan et al., ACM Computing Surveys) (2021)
  10. W3C RDF 1.1 Concepts and Abstract Syntax (2014)
  11. Tim Berners-Lee, Linked Data Design Issues (2006)
