UDA: Netflix’s Unified Data Architecture for Seamless Data Integration

The Challenge of Data Integration at Netflix Scale

As Netflix expands its offerings across films, series, games, live events, and ads, managing core business concepts like ‘actor’ or ‘movie’ has become far more complex. These concepts exist in multiple systems – from the Enterprise GraphQL Gateway to asset management platforms and media computing systems – and each system models them differently, with little coordination.

This fragmentation creates several critical challenges:

  • Duplicated and inconsistent models across different systems
  • Inconsistent terminology, even within a single system
  • Data quality issues that are difficult to detect across microservices
  • Limited connectivity between related data across systems

Enter UDA: A Knowledge Graph Approach

To address these challenges, Netflix built UDA (Unified Data Architecture), a foundation for connected data in Content Engineering that enables teams to model domains once and represent them consistently across systems.

UDA functions as a knowledge graph built on RDF and SHACL foundations, allowing users and systems to:

  • Register and connect domain models to ensure consistent definitions of business concepts
  • Catalog and map domain models to data containers through graph representation
  • Transpile domain models into various schema definition languages while preserving semantics
  • Move data between systems automatically without losing its meaning
  • Discover and explore domain concepts via search and graph traversal
  • Programmatically introspect the knowledge graph using Java, GraphQL, or SPARQL

Upper: The Language of Domain Modeling

At UDA’s core is Upper, a language for formally describing domains and their concepts. Upper domain models are expressed as conceptual RDF organized into named graphs, making them queryable and versionable within the UDA knowledge graph.
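The idea of triples organized into named graphs can be illustrated with a minimal sketch in plain Python. This is not Upper syntax and not a real RDF library: the QuadStore class, the ex: prefix, and the graph names are all hypothetical, chosen only to show how grouping statements by graph keeps each model separately addressable while still allowing queries across models.

```python
from collections import defaultdict
from typing import Iterator, Optional

Triple = tuple[str, str, str]


class QuadStore:
    """Toy store of RDF-like triples grouped into named graphs (illustrative only)."""

    def __init__(self) -> None:
        self._graphs: dict[str, set[Triple]] = defaultdict(set)

    def add(self, graph: str, s: str, p: str, o: str) -> None:
        """Add one (subject, predicate, object) statement to a named graph."""
        self._graphs[graph].add((s, p, o))

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
        graph: Optional[str] = None,
    ) -> Iterator[tuple[str, Triple]]:
        """Yield (graph, triple) pairs; None acts as a wildcard."""
        names = [graph] if graph is not None else sorted(self._graphs)
        for name in names:
            for triple in sorted(self._graphs.get(name, ())):
                if all(want is None or want == got
                       for want, got in zip((s, p, o), triple)):
                    yield name, triple


# Hypothetical domain models, one named graph per model version.
store = QuadStore()
store.add("ex:graph/media/v1", "ex:Movie", "rdf:type", "ex:Class")
store.add("ex:graph/media/v1", "ex:Movie", "ex:hasAttribute", "ex:title")
store.add("ex:graph/talent/v1", "ex:Actor", "rdf:type", "ex:Class")
store.add("ex:graph/talent/v1", "ex:Actor", "ex:appearsIn", "ex:Movie")

# Query across all graphs: which concepts are declared as classes?
classes = [t[0] for _, t in store.match(p="rdf:type", o="ex:Class")]

# Which models refer to ex:Movie, and from which subject?
refs = [(g, t[0]) for g, t in store.match(o="ex:Movie")]
```

Because every statement carries its graph name, a model can be versioned by publishing a new graph (for example a hypothetical ex:graph/media/v2) without disturbing statements in the old one.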

Upper serves as the metamodel for all models in UDA – it’s self-referencing, self-describing, and self-validating. This bootstrap approach enables UDA to generate its own infrastructure and ensure consistent data semantics across schemas.

Mapping and Projecting Models to Real Systems

UDA connects abstract domain models to concrete data containers through mappings. These mappings are the arcs that link subgraphs of domain models to subgraphs of container representations, enabling:

  • Discovery of where domain concepts are physically stored
  • Semantic data integration across systems with different schema languages
  • Intent-based automation of data movement while preserving semantics
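As a rough sketch of how such mappings support discovery and integration, the Python below stores each mapping as a small record and answers two questions: where a concept is stored, and which fields in other containers mean the same thing. The Mapping record, the container identifiers, and the field names are assumptions made for illustration; they are not UDA's actual mapping format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One arc from a domain-model attribute to a field in a data container."""
    concept: str     # domain concept, e.g. "ex:Movie" (hypothetical name)
    attribute: str   # attribute of that concept
    container: str   # hypothetical container identifier
    field: str       # field or column inside the container


MAPPINGS = [
    Mapping("ex:Movie", "title", "graphql:Movie", "title"),
    Mapping("ex:Movie", "title", "iceberg:media.movie", "movie_title"),
    Mapping("ex:Actor", "name", "graphql:Actor", "fullName"),
]


def locate(concept: str, mappings=MAPPINGS) -> dict[str, list[str]]:
    """Return, per container, the fields that hold data for `concept`."""
    found: dict[str, list[str]] = {}
    for m in mappings:
        if m.concept == concept:
            found.setdefault(m.container, []).append(m.field)
    return found


def equivalents(container: str, field: str, mappings=MAPPINGS) -> list[tuple[str, str]]:
    """Fields in other containers mapped to the same concept attribute."""
    keys = {(m.concept, m.attribute) for m in mappings
            if m.container == container and m.field == field}
    return [(m.container, m.field) for m in mappings
            if (m.concept, m.attribute) in keys and m.container != container]


where_is_movie = locate("ex:Movie")
same_as_title = equivalents("graphql:Movie", "title")
```

Here two differently named fields count as equivalent only because both map to the same attribute of the same concept, which is the sense in which the integration is semantic rather than name-based.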

Projections then produce concrete data containers from these models: transpilation generates GraphQL or Avro schemas, and data containers such as Iceberg tables are populated automatically by leveraging Data Mesh.
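Transpilation can be pictured with a small Python sketch that renders one conceptual model into both GraphQL SDL and an Avro record schema. The Concept and Attribute classes, the type tables, and the example.uda namespace are hypothetical stand-ins; UDA's real transpilers work from Upper models and are not shown in the source.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    datatype: str          # one of "string", "int", "boolean" in this sketch
    required: bool = True


@dataclass(frozen=True)
class Concept:
    name: str
    attributes: tuple[Attribute, ...]


# Hypothetical datatype tables; a real transpiler would cover many more types.
GRAPHQL_TYPES = {"string": "String", "int": "Int", "boolean": "Boolean"}
AVRO_TYPES = {"string": "string", "int": "int", "boolean": "boolean"}


def to_graphql(concept: Concept) -> str:
    """Render the concept as a GraphQL SDL object type."""
    lines = [f"type {concept.name} {{"]
    for a in concept.attributes:
        suffix = "!" if a.required else ""   # "!" marks a non-null field
        lines.append(f"  {a.name}: {GRAPHQL_TYPES[a.datatype]}{suffix}")
    lines.append("}")
    return "\n".join(lines)


def to_avro(concept: Concept, namespace: str = "example.uda") -> dict:
    """Render the concept as an Avro record schema (as a Python dict)."""
    fields = []
    for a in concept.attributes:
        avro_type = AVRO_TYPES[a.datatype]
        if a.required:
            fields.append({"name": a.name, "type": avro_type})
        else:
            # Optional attributes become a nullable union with a null default.
            fields.append({"name": a.name, "type": ["null", avro_type], "default": None})
    return {"type": "record", "name": concept.name,
            "namespace": namespace, "fields": fields}


movie = Concept("Movie", (
    Attribute("title", "string"),
    Attribute("releaseYear", "int", required=False),
))

graphql_sdl = to_graphql(movie)
avro_schema = to_avro(movie)
avro_json = json.dumps(avro_schema, indent=2)
```

Both outputs derive from the same model, so a property such as optionality is stated once and rendered in each target's own idiom: a missing "!" in GraphQL, a nullable union in Avro.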

Real-World Applications at Netflix

UDA powers two key systems in production:

Primary Data Management (PDM) – A platform for managing authoritative reference data and taxonomies. PDM uses domain models to build user interfaces for business users and leverages UDA to project these models into Avro and GraphQL schemas.

Sphere – A self-service operational reporting tool that uses UDA to catalog business concepts across systems. Users can search for familiar terms like ‘actor’ or ‘movie,’ and Sphere generates SQL queries by walking the knowledge graph, eliminating the need for manual joins.
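The source does not describe how Sphere walks the graph, so the following Python is only a sketch of the general technique: treat concepts as nodes and relationships as edges carrying join conditions, find a path between the concepts the user picked with a breadth-first search, and emit one JOIN per edge. The table names, join keys, and the intermediate credit concept are all hypothetical.

```python
from collections import deque

# Hypothetical catalog: concept -> physical table.
TABLES = {
    "actor": "talent.actor",
    "credit": "media.credit",
    "movie": "media.movie",
}

# Hypothetical relationships between concepts, each with its join condition.
EDGES = {
    ("actor", "credit"): "actor.actor_id = credit.actor_id",
    ("credit", "movie"): "credit.movie_id = movie.movie_id",
}


def neighbors(node):
    """Yield (adjacent_concept, join_condition); edges are undirected."""
    for (a, b), condition in EDGES.items():
        if a == node:
            yield b, condition
        elif b == node:
            yield a, condition


def find_path(start, goal):
    """Breadth-first search returning [(next_concept, join_condition), ...]."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, condition in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(nxt, condition)]))
    raise ValueError(f"no path from {start!r} to {goal!r}")


def build_sql(start, goal, columns):
    """Build a SELECT that joins every table along the path from start to goal."""
    lines = [f"SELECT {', '.join(columns)}",
             f"FROM {TABLES[start]} AS {start}"]
    for concept, condition in find_path(start, goal):
        lines.append(f"JOIN {TABLES[concept]} AS {concept} ON {condition}")
    return "\n".join(lines)


sql = build_sql("actor", "movie", ["actor.name", "movie.title"])
```

The user names only ‘actor’ and ‘movie’; the intermediate credit table is discovered by the traversal, which is what removes the need to write joins by hand.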

Future Directions

UDA represents a fundamental shift in how Netflix approaches data modeling, making information more consistent, connected, and discoverable. Planned future work includes supporting additional projections such as Protobuf/gRPC, materializing the knowledge graph of instance data, and solving challenges in Graph Search.

For more detailed information on Netflix’s Unified Data Architecture, visit the Netflix Tech Blog.