Industry Teardowns June 19, 2026 · 12 min read

How Netflix Governs Data at Scale: Lessons from Their Data Mesh Journey

A detailed analysis of how Netflix actually governs data, from Metacat to DataHub, from centralized pipelines to domain ownership, and the Unified Data Architecture (UDA) that ties it together.

By Vikas Pratap Singh
#netflix #data-mesh #data-governance #data-platform #federated-governance #metadata-management

500 Data Engineers, Trillions of Events, and the Governance Problem Nobody Talks About

Netflix processes more than 2 trillion events per day through hundreds of thousands of ETL pipelines, populating millions of downstream data tables. Circa 2023, Netflix’s platform leadership described roughly 2,500 engineers plus an additional ~500-person data team (The New Stack, November 2023), so the data organization is several hundred people in addition to, not a subset of, the core engineering org. Netflix operates across 190+ countries with data flowing through content recommendation engines, A/B testing frameworks, content valuation models, and real-time streaming telemetry.

At this scale, Data Governance is not a policy document. It is an engineering system.

What makes Netflix’s journey worth studying is not that they got everything right from the start. They did not. Their Metadata Management platform has been through at least two major evolutions. Their “Data Mesh” (their term for a specific internal data movement platform) does not map cleanly to Zhamak Dehghani’s four-principle framework. And their recent Unified Data Architecture reveals that even Netflix still wrestles with domain modeling consistency across teams.

The interesting story is how they iterated, what they built, and, critically, what you can and cannot take from it.

The Evolution: From Metacat to DataHub to UDA

Netflix’s Data Governance infrastructure did not arrive as a grand architecture. It evolved through three distinct phases, each driven by concrete operational pain.

Phase 1: Metacat (2018), Federated Metadata Access

Netflix’s data warehouse spans Amazon S3 (via Hive), Druid, Elasticsearch, Redshift, Snowflake, and MySQL, with compute engines including Spark, Presto, Pig, and Hive. When no single system could provide unified metadata access across this landscape, Netflix built Metacat, an open-source federated metadata API that abstracted away storage-layer differences and provided a single interface for data discovery.

Metacat solved the immediate problem: engineers could discover what data existed and where it lived. It captured schema changes, provided a notification system for data mutations, and created a traceable record of modifications across the warehouse layer.
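To make the federation idea concrete, here is a minimal sketch of a single lookup interface sitting in front of several storage systems. It is not Metacat's actual API: the `Connector` protocol, `StaticConnector`, `FederatedCatalog`, and the table names are all hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TableInfo:
    store: str                 # e.g. "hive", "redshift"
    name: str                  # qualified table name inside that store
    columns: tuple[str, ...]


class Connector(Protocol):
    """Hypothetical per-store adapter; not Metacat's real interface."""

    def list_tables(self) -> list[TableInfo]: ...


class StaticConnector:
    """Stand-in connector backed by an in-memory dict, for illustration only."""

    def __init__(self, store: str, tables: dict[str, list[str]]) -> None:
        self.store = store
        self.tables = tables

    def list_tables(self) -> list[TableInfo]:
        return [TableInfo(self.store, n, tuple(c)) for n, c in self.tables.items()]


class FederatedCatalog:
    """One discovery interface over many storage systems."""

    def __init__(self) -> None:
        self._connectors: dict[str, Connector] = {}

    def register(self, store: str, connector: Connector) -> None:
        self._connectors[store] = connector

    def search(self, column: str) -> list[str]:
        """Return 'store/table' for every table that exposes the column."""
        hits = []
        for connector in self._connectors.values():
            for table in connector.list_tables():
                if column in table.columns:
                    hits.append(f"{table.store}/{table.name}")
        return sorted(hits)


catalog = FederatedCatalog()
catalog.register("hive", StaticConnector("hive", {"prod.playback": ["account_id", "title_id"]}))
catalog.register("redshift", StaticConnector("redshift", {"finance.billing": ["account_id", "amount"]}))
print(catalog.search("account_id"))  # ['hive/prod.playback', 'redshift/finance.billing']
```

The caller never needs to know which engine holds a table. That is the property the federation layer buys, and also why every new storage system implies a new connector that someone has to write.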

But Metacat had a boundary problem. It was purpose-built for the Big Data Warehouse layer. As Netflix expanded into online stores, real-time streaming pipelines, and ML feature stores, the gaps became operational pain. The connector development burden fell entirely on the central Data Platform team, the exact bottleneck pattern that data mesh thinking was designed to eliminate.

Phase 2: DataHub Adoption, Self-Serve Metadata

Netflix evaluated multiple metadata platforms before selecting LinkedIn’s open-source DataHub as the foundation for their next-generation catalog. The selection criteria reveal their governance philosophy: they wanted “a product that could be both a Data Catalog and a data platform,” not just a searchable inventory, but an extensible metadata infrastructure.

The migration from Metacat to DataHub addressed three specific governance failures:

  1. Connector ownership: DataHub’s extensibility model let source-system teams define their own asset types and ingest metadata directly, rather than requiring the central platform team to build and maintain every connector.

  2. Policy enforcement: Metacat lacked a governance policy engine. DataHub provided the infrastructure to enforce access controls, data classification, and ownership policies within the catalog itself.

  3. Cross-layer visibility: DataHub’s graph-based architecture could span the full stack (online stores, streaming pipelines, batch warehouse, and analytics) where Metacat could not.

The result: self-serve Metadata Management, stronger governance enforcement, and reduced dependence on a central team. This is perhaps the single most transferable lesson from Netflix: the shift from “central team builds everything” to “platform provides the tools, domain teams own their metadata.”
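The combination of self-serve ingestion and in-catalog policy enforcement can be sketched in a few lines. This is an illustration under stated assumptions, not DataHub's API: the `Catalog` and `Asset` classes, the URN strings, and the classification labels are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical classification vocabulary; a real deployment would define its own.
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential", "pii"}


@dataclass
class Asset:
    urn: str
    asset_type: str
    owner: str = ""
    classification: str = ""


@dataclass
class Catalog:
    asset_types: set[str] = field(default_factory=set)
    assets: dict[str, Asset] = field(default_factory=dict)

    def register_asset_type(self, asset_type: str) -> None:
        """Source-system teams declare their own asset types."""
        self.asset_types.add(asset_type)

    def violations(self, asset: Asset) -> list[str]:
        """Evaluate every catalog policy and list the ones the asset fails."""
        problems = []
        if asset.asset_type not in self.asset_types:
            problems.append(f"unknown asset type: {asset.asset_type}")
        if not asset.owner:
            problems.append("missing owner")
        if asset.classification not in ALLOWED_CLASSIFICATIONS:
            problems.append("missing or invalid classification")
        return problems

    def ingest(self, asset: Asset) -> list[str]:
        """Store the asset only if it passes every policy; return violations."""
        problems = self.violations(asset)
        if not problems:
            self.assets[asset.urn] = asset
        return problems


catalog = Catalog()
catalog.register_asset_type("kafka_topic")
ok = catalog.ingest(Asset("urn:demo:topic:playback", "kafka_topic", "streaming-team", "internal"))
bad = catalog.ingest(Asset("urn:demo:topic:orphan", "kafka_topic"))
print(ok)   # []
print(bad)  # ['missing owner', 'missing or invalid classification']
```

The central team owns `violations`; domain teams own everything they pass to `ingest`. That split is the shift described above, reduced to its smallest form.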

Phase 3: Unified Data Architecture and Upper (2025), Model Once, Represent Everywhere

The most recent evolution is the most architecturally ambitious. In June 2025, Netflix unveiled the Unified Data Architecture (UDA) built on a custom metamodel called Upper.

The problem UDA solves: as Netflix’s business expanded into advertising, gaming, and live events alongside streaming, core business concepts like “movie,” “actor,” or “subscriber” were modeled independently across dozens of systems. Each team created its own representation, leading to inconsistent definitions, duplicated logic, and integration friction. This is the exact problem that data mesh’s federated approach can exacerbate without strong governance.

Upper is a metamodel based on W3C standards (RDF for graph representation, SHACL for validation) that enforces a “model once, represent everywhere” principle. Domain experts model their business concepts in Upper, and the system automatically generates concrete technical artifacts (GraphQL schemas, Avro records, Apache Iceberg tables, SQL schemas, and Java types) for downstream consumption.
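A toy version of "model once, represent everywhere" helps show what the generation step involves. The sketch below is not Upper and uses no RDF or SHACL: the `Entity` and `Attribute` classes, the `TYPE_MAP` table, and the `Movie` example are hypothetical, and the generators cover only two logical types.

```python
from dataclasses import dataclass

# Hypothetical mapping from logical types to each target's type names.
TYPE_MAP = {
    "string": {"avro": "string", "sql": "VARCHAR", "graphql": "String"},
    "int": {"avro": "int", "sql": "INTEGER", "graphql": "Int"},
}


@dataclass(frozen=True)
class Attribute:
    name: str
    type: str              # key into TYPE_MAP
    required: bool = True


@dataclass(frozen=True)
class Entity:
    name: str
    attributes: tuple[Attribute, ...]


def to_avro(entity: Entity) -> dict:
    """Render an Avro record schema; optional attributes become null unions."""
    def avro_type(attr: Attribute):
        base = TYPE_MAP[attr.type]["avro"]
        return base if attr.required else ["null", base]
    return {
        "type": "record",
        "name": entity.name,
        "fields": [{"name": a.name, "type": avro_type(a)} for a in entity.attributes],
    }


def to_sql(entity: Entity) -> str:
    """Render a CREATE TABLE statement from the same definition."""
    cols = []
    for a in entity.attributes:
        suffix = " NOT NULL" if a.required else ""
        cols.append(f"{a.name} {TYPE_MAP[a.type]['sql']}{suffix}")
    return f"CREATE TABLE {entity.name.lower()} ({', '.join(cols)});"


def to_graphql(entity: Entity) -> str:
    """Render a GraphQL object type; '!' marks required attributes."""
    lines = []
    for a in entity.attributes:
        bang = "!" if a.required else ""
        lines.append(f"  {a.name}: {TYPE_MAP[a.type]['graphql']}{bang}")
    return "type " + entity.name + " {\n" + "\n".join(lines) + "\n}"


movie = Entity("Movie", (Attribute("title", "string"),
                         Attribute("release_year", "int", required=False)))
print(to_sql(movie))      # CREATE TABLE movie (title VARCHAR NOT NULL, release_year INTEGER);
print(to_graphql(movie))
print(to_avro(movie))
```

Because all three projections derive from one `Entity`, a change to the definition of "movie" propagates everywhere at once instead of drifting across independently maintained schemas.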

The architecture has four self-bootstrapping properties that reflect sophisticated governance thinking:

  • Self-describing: defines what a domain model is
  • Self-referencing: models itself as a domain
  • Self-governing: validates itself against its own rules
  • Federated: closed for modification, open for extension

Figure: Netflix data platform architecture. Domain teams (Content, Streaming, Ads, Studio) extend the Upper/UDA metamodel (RDF + SHACL), which generates GraphQL, Avro, Iceberg, and SQL projections. Avro and Iceberg stream through the Data Mesh platform (Apache Flink + Kafka), while GraphQL and SQL register with the governance layer (DataHub, Data Lineage, DataJunction), which serves governed analytics, ML, and the LORE chatbot.

Mapping Netflix Against Dehghani’s Four Data Mesh Principles

Zhamak Dehghani coined the term “data mesh” in 2019 and defined four principles: domain-oriented ownership, data as a product, self-serve data infrastructure, and federated computational governance. Netflix’s implementation partially aligns, partially diverges, and partially predates the framework entirely.

Principle 1: Domain-Oriented Ownership (Partial Alignment)

Netflix has reorganized data responsibilities around domain teams: content, streaming, advertising, studio production. These teams own their data assets and are responsible for quality and accessibility.

However, Netflix’s “Data Mesh” (the internal platform) is a centralized data movement and processing layer built on Apache Flink and Kafka, not a decentralized architecture where domains independently manage their own pipelines. The infrastructure is centralized; the accountability is distributed. This is a critical distinction that many organizations miss when copying the Netflix model.

Principle 2: Data as a Product (Strong Alignment)

This is where Netflix excels. The DataJunction platform centralizes metric definitions so that a metric owner registers a metric once and consumers across the organization apply that same definition at any dimensional grain. DataJunction handles dimensional joins automatically. If a dimension does not exist in the fact table, the owner only needs to declare the foreign key relationship.
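The "define once, query at any grain" pattern can be sketched as a small SQL generator. This is not DataJunction's API: the `Metric` class, the `build_query` function, and the table and column names are hypothetical, and the sketch handles only single-hop joins.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    expression: str        # aggregate over fact-table columns (alias "f")
    fact_table: str
    # fact-table column -> (dimension table, dimension key column)
    foreign_keys: dict[str, tuple[str, str]]


def build_query(metric: Metric, group_by: list[str]) -> str:
    """Render SQL for one metric at the requested grain.

    Each group_by item is either a fact-table column ("device_type") or a
    dimension attribute written as "table.attribute" ("country.region").
    """
    select_cols, joins = [], []
    for col in group_by:
        if "." not in col:
            select_cols.append(f"f.{col}")   # column already on the fact table
            continue
        dim_table = col.split(".", 1)[0]
        match = [(fk, key) for fk, (table, key) in metric.foreign_keys.items()
                 if table == dim_table]
        if not match:
            raise ValueError(f"no foreign key declared for dimension {dim_table}")
        fk, key = match[0]
        join = f"JOIN {dim_table} ON f.{fk} = {dim_table}.{key}"
        if join not in joins:                # join each dimension only once
            joins.append(join)
        select_cols.append(col)
    projection = ", ".join(select_cols + [f"{metric.expression} AS {metric.name}"])
    lines = [f"SELECT {projection}", f"FROM {metric.fact_table} f", *joins]
    if select_cols:
        lines.append("GROUP BY " + ", ".join(select_cols))
    return "\n".join(lines)


watch_hours = Metric(
    name="watch_hours",
    expression="SUM(f.seconds_watched) / 3600.0",
    fact_table="playback_events",
    foreign_keys={"country_id": ("country", "id")},
)
print(build_query(watch_hours, ["country.region"]))
print(build_query(watch_hours, ["device_type"]))
```

The metric owner declares the aggregate and the foreign key once. Consumers only choose a grain, so two dashboards cannot disagree about what `watch_hours` means.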

The LORE chatbot takes this further: it uses LLMs to provide natural language access to governed data, with human-readable reasoning that users can cross-verify. The combination of centralized metric definitions and LLM-powered access represents one of the most advanced “data as a product” implementations publicly documented.

Principle 3: Self-Serve Data Infrastructure (Strong Alignment)

Netflix’s platform team operates as an internal product team. The Data Lifecycle Manager passively monitors data and automatically handles deletion and tiering to cold storage. The Data Gateway platform declaratively manages thousands of shards across dozens of data abstractions. The DataHub adoption explicitly aimed to enable “self-serve data cataloging across teams.”

The platform investment is massive. This is not a Terraform module and a wiki page. It is a portfolio of purpose-built internal products maintained by hundreds of engineers.

Principle 4: Federated Computational Governance (The Most Interesting Part)

Upper and UDA represent Netflix’s answer to federated governance, and it is more sophisticated than most implementations. Rather than relying on committee-based governance (review boards, approval gates), Netflix encoded governance into the metamodel itself: Upper validates domain models against its own rules automatically.

Data Lineage provides automated enforcement for SLA compliance, cost attribution, and data retention. The lineage system does not just document dependencies. It forecasts job SLAs, drives automated cleanup of unused data, and attributes infrastructure costs to consuming teams.
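SLA forecasting from lineage can be approximated as a critical-path calculation: a job cannot finish before its slowest upstream has finished. The sketch below assumes that simplification, and the job names, runtimes, and SLA values are invented for illustration. Netflix's actual forecasting method is not described in this article.

```python
from graphlib import TopologicalSorter


def forecast_finish_minutes(upstreams: dict[str, list[str]],
                            runtime_minutes: dict[str, int]) -> dict[str, int]:
    """Forecast when each job finishes, in minutes after the schedule starts.

    A job starts once all of its upstream jobs have finished, so its finish
    time is the latest upstream finish plus its own expected runtime.
    """
    finish: dict[str, int] = {}
    # static_order() yields every job after all of its predecessors.
    for job in TopologicalSorter(upstreams).static_order():
        start = max((finish[u] for u in upstreams.get(job, ())), default=0)
        finish[job] = start + runtime_minutes[job]
    return finish


# Hypothetical lineage graph: job -> jobs it reads from.
upstreams = {
    "raw_events": [],
    "titles": [],
    "sessions": ["raw_events"],
    "daily_report": ["sessions", "titles"],
}
runtime_minutes = {"raw_events": 30, "titles": 10, "sessions": 45, "daily_report": 15}
sla_minutes = {"daily_report": 80}

finish = forecast_finish_minutes(upstreams, runtime_minutes)
at_risk = [job for job, sla in sla_minutes.items() if finish[job] > sla]
print(finish["daily_report"])  # 90
print(at_risk)                 # ['daily_report']
```

The forecast is only possible because the dependency graph is machine-readable. A lineage diagram that lives in a slide deck cannot answer this question.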

This is computational governance in practice: policies expressed as code, enforced by systems, not by meetings.

The Reality Check: What Netflix Has That You Do Not

Before your next architecture meeting includes a slide titled “Our Netflix-Inspired Data Mesh,” consider what makes their approach work:

Engineering talent density. Netflix’s ~500-person data team operates within a culture that famously pays top-of-market and maintains extremely high performance expectations. They can build and maintain custom platforms (Metacat, UDA/Upper, DataJunction, Data Lifecycle Manager) that most organizations would need to buy from vendors. A 2023 academic study on data mesh implementations across 15 organizations found that “resource constraints,” both financial and human, are among the top six barriers to successful implementation.

Platform investment scale. Netflix does not use off-the-shelf tools and glue them together. They build purpose-built internal products with dedicated engineering teams. The Data Gateway alone manages thousands of shards. The lineage system covers hundreds of thousands of pipelines. Replicating this requires sustained platform engineering investment measured in years and tens of millions of dollars.

Cultural alignment. Netflix’s “freedom and responsibility” culture means domain teams accept data ownership as a core responsibility, not an additional burden imposed by a governance office. The academic research on data mesh adoption explicitly identifies “responsibility shift resistance” (where domain teams view data product ownership as extra work) as a primary failure mode. Netflix’s culture largely eliminates this friction.

Scale that justifies complexity. Trillions of events daily across millions of tables justify building a custom metamodel. If your data estate is a hundred tables in Snowflake, the overhead of UDA-style governance exceeds the value it provides.

What You Can Actually Take From Netflix (With 50 Engineers, Not 500)

1. The Metacat-to-DataHub Transition Pattern

You probably should not build your own metadata platform. But Netflix’s migration path (starting with basic metadata federation, discovering its limits, then adopting an extensible platform that shifts connector ownership to domain teams) is the right sequence. Start with DataHub or OpenMetadata (not Metacat), and invest early in self-serve metadata ingestion.

2. Centralized Metrics, Decentralized Ownership

DataJunction’s pattern (one metric definition, consumed everywhere) is achievable with dbt’s Semantic Layer or Cube without custom development. The key insight is separating metric definition (centralized, governed) from metric consumption (decentralized, self-serve).

3. Lineage as an Operational Tool, Not Documentation

Netflix uses lineage for SLA forecasting, cost attribution, and automated data cleanup, not just pretty graphs. You can approximate this with Monte Carlo or similar observability tools layered on your existing lineage. The principle is: if lineage does not drive automated action, it is not earning its keep.
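As a small-scale illustration of lineage driving action, the sketch below splits each table's cost across the teams that read it and flags tables nobody reads as cleanup candidates. The equal-split rule, the function name, and the sample data are assumptions made for the example, not a description of Netflix's or any vendor's method.

```python
from collections import defaultdict


def attribute_costs(table_cost: dict[str, float],
                    readers: dict[str, set[str]]) -> tuple[dict[str, float], list[str]]:
    """Split each table's cost equally across consuming teams.

    Returns (cost per team, tables with no readers). Tables without any
    reader are not billed to anyone; they are surfaced for cleanup instead.
    """
    bill: dict[str, float] = defaultdict(float)
    unused: list[str] = []
    for table, cost in table_cost.items():
        teams = readers.get(table, set())
        if not teams:
            unused.append(table)
            continue
        for team in teams:
            bill[team] += cost / len(teams)
    return dict(bill), sorted(unused)


# Hypothetical monthly storage cost per table and the teams reading each one.
table_cost = {"sessions": 120.0, "titles": 30.0, "tmp_backfill_2021": 50.0}
readers = {"sessions": {"ads", "content"}, "titles": {"content"}}

bill, cleanup_candidates = attribute_costs(table_cost, readers)
print(bill["ads"], bill["content"])  # 60.0 90.0
print(cleanup_candidates)            # ['tmp_backfill_2021']
```

Both outputs are things a scheduler can act on: the bill feeds a chargeback report, and the candidate list feeds a retention job.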

4. Governance as Code, Not Committees

Upper’s self-governing property (the system validates models against rules automatically) is the high-end version of a pattern available at any scale. Use data contracts (Great Expectations, OpenMetadata’s native contracts, Soda), schema validation in CI/CD, and policy-as-code (Open Policy Agent) to encode governance into pipelines. Replace quarterly governance review meetings with automated enforcement.
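At its simplest, governance as code is a check that fails the build when a change breaks a published contract. The sketch below compares a proposed schema against a contract and reports breaking changes. The function name, the dict-based schema format, and the column names are hypothetical; tools such as Great Expectations or Soda provide far richer checks.

```python
def breaking_changes(contract: dict[str, str], proposed: dict[str, str]) -> list[str]:
    """List changes that would break consumers of the contracted schema.

    Removing a contracted column or changing its type is breaking; adding a
    new column is treated as safe in this simplified model.
    """
    problems = []
    for column, col_type in contract.items():
        if column not in proposed:
            problems.append(f"removed column: {column}")
        elif proposed[column] != col_type:
            problems.append(f"type changed: {column} {col_type} -> {proposed[column]}")
    return problems


# Hypothetical contract published by the owning team, and a proposed change.
contract = {"account_id": "bigint", "plan": "varchar"}
proposed = {"account_id": "varchar", "signup_date": "date"}

problems = breaking_changes(contract, proposed)
print(problems)
# ['type changed: account_id bigint -> varchar', 'removed column: plan']
```

In a CI job, a non-empty result would exit with a non-zero status and block the merge. That is the automated equivalent of a review board rejecting the change.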

5. Data Lifecycle Management

Netflix’s automated data lifecycle (passive monitoring, automated tiering, cost-driven retention) requires no custom platform. Cloud-native tools (Snowflake’s storage tiering, S3 lifecycle policies, Databricks Unity Catalog) provide the primitives. The Netflix lesson is that this should be automated and metadata-driven, not manual and policy-driven.
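"Metadata-driven" here means the decision comes from attributes recorded about each table, not from someone remembering to run a cleanup. A minimal sketch follows, assuming each table carries a last-read date and an owner-declared retention period. The field names and the 90-day cold-storage threshold are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TableStats:
    name: str
    last_read: date
    retention_days: int    # hypothetical metadata attribute set by the owner


def lifecycle_action(stats: TableStats, today: date, cold_after_days: int = 90) -> str:
    """Choose an action from metadata alone: keep, tier to cold storage, or delete."""
    idle_days = (today - stats.last_read).days
    if idle_days > stats.retention_days:
        return "delete"
    if idle_days > cold_after_days:
        return "move_to_cold_storage"
    return "keep"


today = date(2026, 6, 19)
tables = [
    TableStats("sessions", today - timedelta(days=10), retention_days=365),
    TableStats("titles_2024", today - timedelta(days=120), retention_days=365),
    TableStats("tmp_backfill", today - timedelta(days=400), retention_days=365),
]
for t in tables:
    print(t.name, lifecycle_action(t, today))
# sessions keep
# titles_2024 move_to_cold_storage
# tmp_backfill delete
```

The returned action would then be mapped onto whatever primitive the storage layer offers, such as an S3 lifecycle rule or a scheduled drop.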

Common Pitfalls of Copying FAANG Data Approaches

Copying the org chart instead of the principles. Netflix’s domain teams work because they have several hundred data professionals to distribute across domains. If you have 15, creating “domain data teams” of 2-3 people each means nobody has enough context or capability to operate independently. A Thoughtworks analysis of data mesh adoption through 2025 found that “lip-service domains” (existing IT teams rebranded as data domains without genuine ownership) were among the most common failure modes.

Over-investing in platform before proving demand. Netflix built custom tools because they outgrew every available vendor option. Building a custom data platform when your team could be productive with dbt + Snowflake + a commercial catalog is not ambition. It is waste. The academic research recommends a “quick wins approach”: small, inexpensive pilot projects that demonstrate value before committing to large-scale platform investment.

Treating data mesh as a technical architecture instead of an operating model. The Thoughtworks analysis identifies the core challenge: “changing organizational and individual behaviors, not technologies and architectures.” Netflix succeeded with federated governance because their culture already valued distributed ownership. If your organization’s default is to centralize control, no amount of Kafka topics and DataHub instances will create domain accountability.

Ignoring the talent gap. Netflix engineers are expected to operate at a level where they can independently design, build, and operate data products. If your data engineers are still learning SQL optimization, the gap between where you are and where Netflix operates is not bridged by adopting their tools. Invest in Data Literacy and engineering capability before investing in architecture.

Practical Takeaways

  1. Audit your metadata evolution stage. If you are still in “Metacat phase” (basic catalog, central team maintains everything), your next move is self-serve metadata ingestion, not UDA-style metamodeling.

  2. Implement one Netflix pattern at a time. Start with centralized metric definitions (dbt Semantic Layer or Cube) before tackling domain ownership or federated governance. Each layer builds on the previous one.

  3. Measure governance by automation, not documentation. Count automated policy checks, not pages in your governance handbook. Netflix’s approach works because governance is computed, not communicated.

  4. Right-size your ambition to your team. A 50-person data team can effectively adopt domain ownership, centralized metrics, and automated quality checks. It cannot build a custom metamodel, a bespoke data movement platform, and an LLM-powered data chatbot simultaneously. Sequence ruthlessly.

  5. Read the Netflix TechBlog directly. The primary sources are more nuanced than any summary. Netflix publishes detailed engineering posts on each component (Metacat, the Data Mesh platform, lineage, DataJunction, UDA/Upper), and they are worth reading for the implementation details that conference talks and case studies omit.

Do Next

| Priority | Action | Why it matters |
|---|---|---|
| Now | Locate your metadata maturity stage (central-team catalog vs. self-serve ingestion) and pick the single next step | Sequencing beats ambition: Netflix’s value came from iterating Metacat to DataHub, not from leaping to a custom metamodel |
| Now | Centralize metric definitions once with dbt Semantic Layer or Cube, consumed decentrally | Reproduces DataJunction’s highest-leverage pattern with no custom platform build |
| Next | Replace one recurring governance review meeting with automated enforcement (data contracts, schema validation in CI/CD, Open Policy Agent) | Encodes governance as code, the transferable core of Netflix’s federated computational governance |
| Next | Wire Data Lineage to drive an automated action (SLA alert, cost attribution, or retention cleanup), not just a graph | Lineage that does not trigger action does not earn its keep |
| Later | Right-size domain ownership to your headcount before reorganizing into domain teams | “Lip-service domains” of 2-3 people are a documented data mesh failure mode; copy the principle, not the org chart |
| Later | Build Data Literacy and engineering capability ahead of platform investment | The talent gap, not the tooling gap, is what most Netflix-inspired programs underestimate |

Sources & References

  1. Data Mesh: A Data Movement and Processing Platform @ Netflix
  2. Metacat: Making Big Data Discoverable and Meaningful at Netflix
  3. Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency
  4. Navigating the Netflix Data Deluge: The Imperative of Effective Data Management (2024)
  5. Part 1: A Survey of Analytics Engineering Work at Netflix
  6. Breaking Silos: Netflix Introduces Upper Metamodel to Bring Consistency across Content Engineering (2025)
  7. Netflix Scales Metadata Management with DataHub
  8. How Netflix is Collaborating with DataHub to Enhance its Extensibility
  9. Data Mesh Principles and Logical Architecture
  10. The State of Data Mesh in 2026: From Hype to Hard-Won Maturity (2026)
  11. Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations (2024)
  12. Revolutionizing Data Architecture: The Netflix Data Mesh Case Study
  13. Netflix Metacat: Origin, Architecture, Features & More
  14. DataJunction: A Metrics Platform
  15. Netflix Metacat GitHub Repository
  16. Developer Productivity Engineering at Netflix (2023)
  17. Netflix Q4 2025 Financial Earnings: Subscribers (2026)
