Bringing Semantics to Closed-World Systems

Posted on 2026-03-17

Most of the world’s data originates in closed-world systems. Forms, logging systems, and sensors all generate data in predefined structures where the schema is known in advance. Yet when people enter the world of RDF, interested only in adding semantics to their data, they’re confronted with an all-or-nothing proposition: “Here’s a triple store, learn OWL, RDF/XML is bad but your code must be able to process it, accept that you live in an open world. Take it or leave it.”

But many use cases don’t require the full semantic web stack. This post explores how closed-world systems can benefit from RDF semantics without adopting it. We’ll cover when closed-world assumptions make sense, how to bridge closed and open-world systems, practical examples from sensors, UIs, and IoT, new SHACL features in development that enable these patterns, and what the RDF community needs to do to make this easier.

Open vs. Closed World: Beyond Schema

Before diving into practical applications, it’s important to clarify what “open-world” and “closed-world” mean in different contexts. In logic and knowledge representation, the closed-world assumption (CWA) traditionally means that if a fact is not known to be true, it is assumed to be false. Conversely, the open-world assumption (OWA) means that the absence of information does not imply falsity – something not stated could still be true.

This philosophical distinction manifests in two practical ways:

Inference and reasoning: Under OWA, just because a triple doesn’t exist doesn’t mean the statement is false. It might simply be unknown. Under CWA (like in SQL), if a row doesn’t exist, the information is considered false or non-existent.
Schema extensibility: Under OWA, systems must handle unexpected properties gracefully. Under CWA, only predefined properties are allowed, and unexpected properties are rejected. The key question here: does the absence of a property definition mean that the property doesn’t exist?
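
The schema-extensibility distinction can be sketched in a few lines of Python. This is a minimal illustration, not a SHACL implementation: the example.org property IRIs, the KNOWN_PROPERTIES set, and both function names are hypothetical.

```python
# Hypothetical property IRIs; any vocabulary would do.
TEMPERATURE = "http://example.org/temperature"
HUMIDITY = "http://example.org/humidity"
COMMENT = "http://example.org/comment"

# The properties the system was designed for.
KNOWN_PROPERTIES = {TEMPERATURE, HUMIDITY}


def accept_closed(record: dict) -> dict:
    """Closed schema: any property outside the known set is an error."""
    unexpected = set(record) - KNOWN_PROPERTIES
    if unexpected:
        raise ValueError(f"unexpected properties: {sorted(unexpected)}")
    return record


def accept_open(record: dict) -> dict:
    """Open schema: unknown properties are kept, they may simply be new to us."""
    return record


record = {TEMPERATURE: 23.5, COMMENT: "window was open"}

print(accept_open(record))  # kept as-is, including the unknown property
try:
    accept_closed(record)
except ValueError as error:
    print(error)  # the closed variant rejects the same record
```

The closed variant can be backed by a fixed struct or table because the set of properties never grows at runtime. The open variant needs a storage model that tolerates arbitrary keys.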

This blog post primarily focuses on schema extensibility, the question of whether your system accepts only predefined properties (closed) or handles arbitrary properties (open). While the reasoning aspect remains relevant, the most practical challenges in adopting RDF stem from schema flexibility requirements. Many systems need the predictability and performance of closed schemas without requiring full open-world reasoning capabilities. The tools and patterns described here address this specific use case.

Why We Need Semantics

Adding semantics to data provides two fundamental benefits.

First, semantics enable data interoperability through global uniqueness. When properties and types use standardized identifiers (like IRIs), different systems can understand and combine data without custom integration code. This eliminates ambiguity — a temperature measurement is universally recognizable, not just “temp” in one system and “temperature_c” in another.

Second, semantics create machine-understandable knowledge. Rather than requiring humans to interpret schemas and write integration logic, machines can directly understand relationships, hierarchies, and constraints. This is particularly valuable as AI systems increasingly need to work with diverse data sources.

Together, these benefits simplify data integration, enable more sophisticated AI applications, and make systems future-proof as data needs evolve.

The Semantic Web Spectrum

The choice between closed- and open-world is not binary – it’s a spectrum of approaches. Different layers of an architecture can have different assumptions. A closed-world data producer can feed into an open-world consumer without issue.

Hybrid architectures provide the best of both worlds: efficiency and simplicity at the data generation layer, combined with flexibility and reasoning at the integration layer. The key is using semantics (RDF vocabularies, shapes) even in closed-world contexts. This enables interoperability without forcing open-world complexity everywhere.

When to Use Closed-World Systems

Systems that only generate data without consuming it don’t require the open-world assumption. Examples include data from HTML forms, sensor data observations, and application logs. In these cases, the data structure is known at design time and the producer controls the schema completely.

Many constrained devices cannot afford the overhead of full triple processing. Similarly, systems where performance and predictability are critical, such as high-frequency data ingestion (thousands of events per second) or real-time processing requirements, benefit from the optimized storage and indexing that closed-world assumptions enable.

Selection Criteria

When deciding between closed and open-world approaches, consider these guidelines:

Choose closed-world when you control both data production and initial consumption, the schema is stable and well-defined, performance is critical, you’re working with constrained devices, or the system is a data source rather than an integrator.

Choose open-world when you’re integrating data from multiple unknown sources, schema evolution is frequent and unpredictable, reasoning and inference are required, or you need to handle unexpected properties gracefully.

Use hybrid approaches when you have closed-world producers feeding open-world consumers, different subsystems have different requirements, or you want efficiency at the edges with flexibility at the core.

Advantages of Closed-World Systems

The closed-world approach brings concrete advantages. Data structures and derived structures in code can be pre-defined or pregenerated, requiring less dynamic data processing. Existing solutions like SQL databases can be used and better optimized for the use case. You get type safety and validation at compile time, better IDE support and developer experience, and easier testing with known schemas.

New Tools in SHACL 1.2 (In Development)

The upcoming SHACL 1.2 specification will add crucial features for data modelling in closed-world contexts. The new closed shape mode (sh:closed with sh:ByTypes) allows a shape for a specific type or class to express that only properties defined in the shape are allowed. Violations occur when unexpected properties appear, enabling compile-time code generation with confidence.

The sh:codeIdentifier property maps RDF properties to code-level identifiers — SQL column names, JSON object keys, class property names. This bridges the semantic and implementation worlds while eliminating manual mapping configuration.
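
To make the idea concrete, here is a rough Python sketch of what a tool could do with such identifiers once a shape has been read. It assumes the shape is already available as plain Python data. The PropertyShape class, the example.org IRIs, and the identifiers are made up for illustration, and how sh:codeIdentifier is actually serialized and processed is defined by the SHACL 1.2 drafts, not by this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyShape:
    path: str             # property IRI (the sh:path)
    code_identifier: str  # identifier used in code (the sh:codeIdentifier value)


# Hypothetical shape content, written by hand instead of parsed from RDF.
OBSERVATION_PROPERTIES = [
    PropertyShape("http://example.org/temperature", "temperature"),
    PropertyShape("http://example.org/relativeHumidity", "humidity"),
]


def to_code_record(rdf_values: dict, properties: list) -> dict:
    """Rename IRI keys to code identifiers (JSON keys, column names, ...)."""
    by_path = {p.path: p.code_identifier for p in properties}
    return {by_path[path]: value for path, value in rdf_values.items()}


def to_rdf_values(code_record: dict, properties: list) -> dict:
    """Inverse direction: code identifiers back to property IRIs."""
    by_identifier = {p.code_identifier: p.path for p in properties}
    return {by_identifier[key]: value for key, value in code_record.items()}


rdf_values = {
    "http://example.org/temperature": 23.5,
    "http://example.org/relativeHumidity": 41.0,
}
row = to_code_record(rdf_values, OBSERVATION_PROPERTIES)
print(row)  # {'temperature': 23.5, 'humidity': 41.0}
```

Because the mapping lives in the shape, the same identifiers can drive SQL column names, JSON keys, and class members without a separate mapping file.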

SHACL Profiles define subsets of SHACL features for specific use cases, enabling tool vendors to support a “closed-world profile” versus “full SHACL”. This improves interoperability and provides simpler validation for constrained devices.

These features will make SHACL viable for the same use cases as GraphQL schemas or JSON Schema, but with the added benefit of semantic interoperability.

Bridging Closed- and Open-Worlds

Closed-world systems can still participate in the semantic web — they just don’t need to implement the full stack internally. Key integration patterns make this possible: REST APIs can use JSON-LD context to add semantics to JSON responses, virtual graph solutions can expose SQL data as SPARQL endpoints, ETL processes can transform closed-world data into open-world triple stores, and federated queries can combine closed-world sources with open-world reasoning.
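
The JSON-LD pattern can be approximated in a few lines of Python. This sketch only performs a term-to-IRI lookup on a flat object, which is a small subset of what a JSON-LD processor does. The CONTEXT mapping, the example.org IRIs, and the to_triples function are assumptions for illustration.

```python
import json

# Hypothetical context: maps the JSON keys of a closed-world API to IRIs.
CONTEXT = {
    "temperature": "http://example.org/temperature",
    "humidity": "http://example.org/humidity",
}


def to_triples(subject: str, payload: dict, context: dict) -> list:
    """Turn a flat JSON object into (subject, predicate, value) triples.

    Only a term-to-IRI lookup is done here; a real JSON-LD processor also
    handles nesting, datatypes, language tags, and more.
    """
    triples = []
    for key, value in payload.items():
        if key.startswith("@"):
            continue  # JSON-LD keywords such as @context carry no data here
        triples.append((subject, context[key], value))
    return triples


body = json.loads('{"temperature": 23.5, "humidity": 41.0}')
for triple in to_triples("http://example.org/sensor/1", body, CONTEXT):
    print(triple)
```

The producer keeps emitting plain JSON with short keys. Only the consumer at the boundary needs to know the context to recover globally unique property identifiers.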

The closed-world system remains efficient internally while semantic integration happens at the boundary. Other systems can consume the data as part of larger open-world systems. This is not “one or the other” — both worlds work very well together.

Examples

These examples demonstrate different aspects of closed-world systems with semantic integration:

Data Generation: Sensor Data

Problem: High-frequency sensor readings need efficient storage and querying, but also need to integrate with building management systems. This example is based on my real-world setup at home, with 30+ sensors.

Closed-World Solution: SHACL shapes define the structure of sensor data (temperature, humidity, CO2, timestamps). These shapes are translated to RML mappings automatically, and SQL tables are generated from the shapes. A REST API accepts RDF data matching the defined shape and translates it into SQL inserts, with data validation happening against the closed shape at ingestion time.

Semantic Integration: The Ontop virtual graph layer provides SPARQL access to the SQL data. Metadata, such as which room a sensor is located in, is stored in a separate triple store with open-world support. Federated SPARQL queries combine high-volume sensor readings (closed-world) with flexible metadata (open-world).

Benefits: Sensor data is stored efficiently in optimized SQL tables that can handle thousands of readings per second. Semantics in the data make it easier for AI systems to understand relationships without needing a full triple store for the high-volume data.
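
The ingestion side of this setup could look roughly like the following Python sketch, which uses the standard-library sqlite3 module as a stand-in for the real database. It is not the code from my setup: the OBSERVATION_SHAPE dictionary is a hand-written, hypothetical substitute for what would be derived from the SHACL shapes, and the table and column names are assumptions.

```python
import sqlite3

# Hypothetical closed shape: column name -> (SQL type, Python type).
# In the setup described above this would be derived from SHACL shapes.
OBSERVATION_SHAPE = {
    "sensor": ("TEXT", str),
    "observed_at": ("TEXT", str),
    "temperature": ("REAL", float),
    "humidity": ("REAL", float),
}


def create_table_sql(table: str, shape: dict) -> str:
    """Generate a CREATE TABLE statement from the shape."""
    columns = ", ".join(f"{name} {sql_type} NOT NULL"
                        for name, (sql_type, _) in shape.items())
    return f"CREATE TABLE {table} ({columns})"


def insert(connection, table: str, shape: dict, record: dict) -> None:
    """Validate a record against the closed shape, then insert it."""
    if set(record) != set(shape):
        raise ValueError("record does not match the closed shape")
    for name, (_, python_type) in shape.items():
        if not isinstance(record[name], python_type):
            raise ValueError(f"{name} must be {python_type.__name__}")
    # Table and column names come from the trusted shape, values are bound.
    placeholders = ", ".join("?" for _ in shape)
    connection.execute(
        f"INSERT INTO {table} ({', '.join(shape)}) VALUES ({placeholders})",
        [record[name] for name in shape],
    )


connection = sqlite3.connect(":memory:")
connection.execute(create_table_sql("observation", OBSERVATION_SHAPE))
insert(connection, "observation", OBSERVATION_SHAPE, {
    "sensor": "http://example.org/sensor/1",
    "observed_at": "2026-03-17T12:00:00Z",
    "temperature": 23.5,
    "humidity": 41.0,
})
print(connection.execute("SELECT temperature FROM observation").fetchall())
```

Since the shape is closed, the table layout is fully known when the table is created, and a record with an extra or missing property is rejected before it reaches the database.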

User Interaction: HTML Forms

Problem: UIs for open-world systems are complicated for both users and developers. Dynamic forms can be challenging to implement and maintain, and UI frameworks are not designed for direct triple manipulation: most expect objects or arrays, not triples. The open-world assumption means unpredictable properties, making form generation complex.

Closed-World Solution: When the data model is defined with closed SHACL shapes only, forms can be generated as part of the build process from the shapes. Validation rules are generated from shape constraints, JSON is used for data exchange instead of triples, and the backend validates against the same shapes.

Semantic Integration: A JSON-LD context maps JSON to RDF properties, the backend can still process as RDF when needed, and integration layers handle conversion to triples.

Benefits: This approach produces lightweight and efficient UI code with a better developer experience through type-safe form handling. Users get familiar, predictable forms, while semantics are preserved for downstream systems.
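
As a rough sketch of build-time form generation, the following Python script renders a static HTML form from a closed shape. The CONTACT_SHAPE dictionary and its fields are hypothetical and written by hand. A real generator would read the labels, datatypes, and cardinalities from the SHACL shape.

```python
from html import escape

# Hypothetical closed shape for a contact form:
# field name -> label, HTML input type, and whether a value is required.
CONTACT_SHAPE = {
    "name": {"label": "Name", "type": "text", "required": True},
    "email": {"label": "E-mail", "type": "email", "required": True},
    "age": {"label": "Age", "type": "number", "required": False},
}


def render_form(shape: dict) -> str:
    """Render a static HTML form with one input per property in the shape."""
    lines = ["<form>"]
    for name, field in shape.items():
        required = " required" if field["required"] else ""
        lines.append(
            f'  <label>{escape(field["label"])} '
            f'<input name="{escape(name)}" type="{escape(field["type"])}"{required}>'
            "</label>"
        )
    lines.append("  <button>Submit</button>")
    lines.append("</form>")
    return "\n".join(lines)


print(render_form(CONTACT_SHAPE))
```

Because the shape is closed, the generator never has to plan for fields it has not seen. The output is plain HTML that any UI framework can style or wrap.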

Data Generation: Constrained IoT Devices

Problem: Embedded devices have limited memory and processing power and cannot run a full RDF stack.

Closed-World Solution: Devices create and accept data only in predefined closed shapes. Data structures are generated in the build process from SHACL shapes, with code generation creating efficient C or Rust structs matching the shapes. Validation is compiled into firmware.

Semantic Integration: A JSON-LD context is embedded in device metadata, gateway services translate to full RDF when needed, and data automatically has correct semantic annotations.

Benefits: This approach achieves a minimal memory footprint on the device with no runtime RDF processing overhead. Data remains semantically interoperable while automated tooling reduces manual coding.

Example: Consider a temperature sensor with 64 KB of RAM that sends {"@context": "...", "temperature": 23.5, "humidity": "..."}. The gateway uses the context to interpret this as RDF, so no RDF library is needed on the device itself.
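
The build-time code generation step could be sketched as a small Python script that emits a C struct from a closed shape. The READING_SHAPE dictionary, the field names, and the chosen C types are assumptions for illustration. A real generator would derive them from the SHACL shape and its datatypes.

```python
# Hypothetical closed shape: field name -> C type, in declaration order.
READING_SHAPE = {
    "temperature": "float",
    "humidity": "float",
    "timestamp": "uint32_t",
}


def generate_c_struct(name: str, shape: dict) -> str:
    """Emit a C struct definition with one member per shape property."""
    members = "\n".join(f"    {c_type} {field};"
                        for field, c_type in shape.items())
    return (
        "#include <stdint.h>\n\n"
        f"typedef struct {{\n{members}\n}} {name};\n"
    )


source = generate_c_struct("reading_t", READING_SHAPE)
print(source)
```

The generated header is compiled into the firmware, so the device works with a fixed-size struct and never sees an IRI at runtime.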

Industry Case Study: Netflix UDA

Problem: Netflix’s business entities (movies, actors, series) were modeled inconsistently across hundreds of systems. Each team created their own schemas for the same concepts, leading to data inconsistency and duplication. Changes required manual updates across multiple systems, and there was no single source of truth.

Solution: Netflix’s blog post “Model Once, Represent Everywhere: UDA (Unified Data Architecture) at Netflix” describes how they used a metamodel called Upper to unify their data models. Upper is based on W3C standards: RDF for graph representation and SHACL for validation. It’s a bootstrapping metamodel that models itself and defines the language for describing domain entities.

Closed-World Implementation: Domain entities are modeled once in Upper and automatically projected to multiple target formats: GraphQL schemas for APIs, Avro schemas for event streaming, SQL schemas for databases, Java types for application code, and Apache Iceberg tables for the data warehouse. Code generation preserves semantics across all projections, and validation rules are enforced consistently everywhere.

Benefits Achieved: Netflix now has a single source of truth for business entities with automatic propagation of changes across the entire stack. This has reduced data inconsistency, improved developer experience with generated types, and enabled automated UI generation from taxonomies.

Relevance to This Post: Netflix built Upper because existing tools didn’t meet their needs. They chose RDF and SHACL as the foundation, validating the semantic approach. Their success demonstrates the demand for closed-world modeling with semantic foundations. Upper is essentially what the community needs: SHACL-based code generation. With the upcoming SHACL 1.2’s sh:codeIdentifier and improved closed shape support, much of Upper’s functionality could become standardized. If standard tooling existed, Netflix might not have needed to build a custom framework. Their architecture proves that closed-world plus semantics scales to enterprise complexity.

Summary

Closed-world systems are the reality for most data generation. Semantics should be added as early as possible, even in closed-world contexts. RDF and closed-world are not mutually exclusive – use closed-world internally, expose semantics at boundaries. The upcoming SHACL 1.2 will provide the tools needed to model closed-world systems with standard vocabularies.

Implications for the RDF Community

The community needs to be more welcoming to closed-world use cases. The “use a triple store, or you’re not doing it right” attitude pushes people away. The RDF model is still the best solution for adding semantics to data, but the full semantic web stack is overkill for many use cases. Making RDF work in closed-world contexts expands adoption, and this matters especially as AI systems increasingly need semantic data.

Missing Tooling and Next Steps

What’s needed to make this easier? First, shape-to-code generators that translate SHACL shapes to TypeScript, Python, Rust, or C structs, together with automated validation code generation and ORM-like libraries for shape-based data access. Second, libraries for constrained devices, to broaden the coverage of the RDF ecosystem. Third, shape-based static form generators: existing SHACL-based UI frameworks focus on dynamic forms.

A Call to Action

If someone doesn’t use RDF in a closed-world system, don’t just ask why not. Be open to discussing what is missing in the RDF ecosystem, and maybe even help build those tools. The underlying technology is coming together with the work on SHACL 1.2. As a member of the W3C Data Shapes Working Group, I try to look at the specification from different perspectives and contribute so that SHACL 1.2 covers closed-world use cases. Together, we need to meet developers where they are. That’s how we make semantic data the norm, not the exception.

bergis universe of software, hardware and ideas