SHACL in a nutshell

Note: This post is an adaptation of Section 1.1 of my PhD thesis. The purpose of publishing it as a standalone blog post is to further disseminate my writings. It is part of a short series of posts representing the Introduction of my thesis.

The main purpose of SHACL is to write a schema that describes the expected structure of an RDF graph. In SHACL terminology, this schema is called a shapes graph; it is a "graph" because SHACL syntax is itself written in RDF. The RDF graph whose structure is being described is referred to as the data graph. Given a shapes graph and a data graph, the main task is to check whether the data graph satisfies the requirements specified by the shapes graph. This task is called conformance checking, and it is performed by software called a (SHACL) validator. Validators are often expected not only to check for conformance, but also to generate a validation report specifying which nodes violate which parts of the shapes graph.

Shapes

A SHACL shapes graph consists of a set of shapes, which are structural constraints on nodes. Each shape has a name, which is a blank node or an IRI. When we evaluate a shape on a node, that node is called the focus node. There are two types of shapes: node shapes and property shapes. A node shape defines constraints directly on the focus node, while a property shape defines constraints on the value nodes of the focus node. The value nodes are the nodes reachable from the focus node through a property or path expression that is given as a parameter to the property shape (using the sh:path keyword in SHACL).

Throughout this post, we will use examples from the access control setting. For this section, we will simply define some shapes about users in this setting. Basic users can access and create resources, while power users can also add users to the system and approve them, which makes the approved users power users as well. Consider the following data graph:

:admin_user a :Admin ;
  :adds :user_a ;
  :adds :user_b ;
  :approves :user_b .

:user_b :adds :user_c ;
  :approves :user_c .

:user_a :accesses :resource1 .
:user_b :accesses :resource1 .
:user_c :accesses :resource1 ;
  :creates :resource2 .

The idea of the first few lines is that the :admin_user adds users :user_a and :user_b to the system. They also approve of :user_b, meaning that :user_b is now also trusted within the system.

Consider the following node shape that consists of one node test and two property shapes:

:basicUserShape a sh:NodeShape ;
  sh:nodeKind sh:IRI ;
  sh:property [
    sh:path :accesses ;
    sh:minCount 1
  ] ;
  sh:property [
    sh:path :creates ;
    sh:maxCount 0
  ] .

The shape name is :basicUserShape. It specifies three requirements for a given focus node. First, the focus node should be an IRI, as opposed to a blank node or a literal. The sh:property keyword indicates that the focus node must also adhere to a property shape. The first property shape is about the value nodes reachable with the :accesses property. Specifically, there must be at least one such value node. The second property shape is about the value nodes reachable with the :creates property. In this case, there may be at most 0. In natural language, this shape states: "The focus node is an IRI and :accesses at least one resource, but :creates none." So, in our example graph, only :user_a and :user_b satisfy :basicUserShape.
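To make the evaluation concrete, here is a minimal sketch in plain Python. It assumes a toy model in which the data graph is a set of (subject, predicate, object) triples with prefixes dropped; the helper names value_nodes and is_basic_user are hypothetical and not part of SHACL or of any validator.

```python
# Toy model (an assumption of this sketch): the data graph is a set of
# (subject, predicate, object) triples with prefixes dropped from the names.
DATA = {
    ("admin_user", "rdf:type", "Admin"),
    ("admin_user", "adds", "user_a"),
    ("admin_user", "adds", "user_b"),
    ("admin_user", "approves", "user_b"),
    ("user_b", "adds", "user_c"),
    ("user_b", "approves", "user_c"),
    ("user_a", "accesses", "resource1"),
    ("user_b", "accesses", "resource1"),
    ("user_c", "accesses", "resource1"),
    ("user_c", "creates", "resource2"),
}


def value_nodes(graph, focus, prop):
    """Return the objects reachable from `focus` through property `prop`."""
    return {o for (s, p, o) in graph if s == focus and p == prop}


def is_basic_user(graph, focus):
    """Hand-written check mirroring :basicUserShape on one focus node."""
    # sh:nodeKind sh:IRI holds trivially: every name in this toy model stands for an IRI.
    has_access = len(value_nodes(graph, focus, "accesses")) >= 1      # sh:minCount 1
    creates_nothing = len(value_nodes(graph, focus, "creates")) <= 0  # sh:maxCount 0
    return has_access and creates_nothing


USERS = ["admin_user", "user_a", "user_b", "user_c"]
basic_users = [u for u in USERS if is_basic_user(DATA, u)]
print(basic_users)  # ['user_a', 'user_b']
```

The two cardinality checks are evaluated independently on their own sets of value nodes, which is how the two property shapes in the Turtle above relate to each other.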

SHACL has many different features for writing constraints. These features are called constraint components. The previous example used a value type constraint component, indicated with the sh:nodeKind keyword, to check whether the focus node is an IRI, and cardinality constraint components, indicated with the sh:minCount and sh:maxCount keywords, to count the number of value nodes. For my investigations, I restrict myself to the core constraint components. Next follow examples of some of these components that illustrate the capabilities of SHACL.

Property Pair Constraint Components

In property shapes, we can compare the value nodes to another set of nodes (reachable by some other property) in some predefined ways. The most important are equality and disjointness checks between the two sets. Consider the following node shape:

:powerUserShape a sh:NodeShape ;
    sh:not [ a sh:PropertyShape ;
             sh:path :adds ;
             sh:disjoint :approves ] .

This shape asks of the focus node that the set of value nodes, i.e., the nodes reachable with the :adds property, is not disjoint from the set of nodes reachable with the :approves property. In natural language: "The focus node approves at least one user they also added." So, in our graph :admin_user and :user_b satisfy :powerUserShape.
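The negated disjointness test can be sketched in plain Python as follows. This assumes the same toy model of the data graph as a set of prefix-free triples; value_nodes and is_power_user are hypothetical helper names used only for illustration.

```python
# Toy model (an assumption of this sketch): triples with prefixes dropped.
DATA = {
    ("admin_user", "rdf:type", "Admin"),
    ("admin_user", "adds", "user_a"),
    ("admin_user", "adds", "user_b"),
    ("admin_user", "approves", "user_b"),
    ("user_b", "adds", "user_c"),
    ("user_b", "approves", "user_c"),
    ("user_a", "accesses", "resource1"),
    ("user_b", "accesses", "resource1"),
    ("user_c", "accesses", "resource1"),
    ("user_c", "creates", "resource2"),
}


def value_nodes(graph, focus, prop):
    """Return the objects reachable from `focus` through property `prop`."""
    return {o for (s, p, o) in graph if s == focus and p == prop}


def is_power_user(graph, focus):
    """Hand-written check mirroring :powerUserShape on one focus node."""
    added = value_nodes(graph, focus, "adds")
    approved = value_nodes(graph, focus, "approves")
    # sh:not around sh:disjoint: the two sets must share at least one node.
    return not added.isdisjoint(approved)


# Evaluate the shape on every node that occurs in the graph.
nodes = {s for (s, _, _) in DATA} | {o for (_, _, o) in DATA}
power_users = sorted(n for n in nodes if is_power_user(DATA, n))
print(power_users)  # ['admin_user', 'user_b']
```

Note that a node with no :adds and no :approves triples has two empty sets, which are disjoint, so the negation makes such a node fail the shape.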

Shape-based Constraint Components

We can refer to other shapes as well. Most notably, we can combine this with the counting from the cardinality constraint components to count only value nodes that conform to some other given shape. For example, consider the following property shape:

:authorizedUserShape a sh:PropertyShape ;
    sh:path [ sh:zeroOrMorePath [ sh:inversePath :approves ] ] ;
    sh:qualifiedValueShape :isAdminShape ;
    sh:qualifiedMinCount 1 .

:isAdminShape a sh:PropertyShape ;
  sh:path rdf:type ;
  sh:hasValue :Admin .

A first thing to note is that this shape makes use of a (complex) path expression. SHACL supports path expressions similar to those of SPARQL. This shape states that there must be at least one value node reachable, by following inverse :approves properties, that conforms to :isAdminShape. The keyword sh:qualifiedValueShape is used to refer to a shape, together with the sh:qualifiedMinCount to denote the desired cardinality. In our example graph, :admin_user, :user_b, and :user_c satisfy :authorizedUserShape. Note that it is different from writing:

:altAuthorizedUserShape a sh:PropertyShape ;
    sh:path [ sh:zeroOrMorePath [ sh:inversePath :approves ] ] ;
    sh:node :isAdminShape ;
    sh:minCount 1 .

where sh:node is used to refer to another shape. This seems to express the same shape. However, all constraints used in a property shape apply to all value nodes, i.e., there is an implicit universal quantifier. When a cardinality constraint is then used, it is separate from the other constraints and only refers to the number of value nodes. In words, :altAuthorizedUserShape states that all value nodes conform to :isAdminShape and that there is at least one value node. In our example graph, only :admin_user satisfies :altAuthorizedUserShape.
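The difference between the qualified (existential) count and the sh:node (universal) reading can be sketched in plain Python. This is a hand-written illustration under the assumption that the data graph is a set of prefix-free triples; the function names are hypothetical, and the path evaluation is a simple reachability search rather than a general path engine.

```python
# Toy model (an assumption of this sketch): triples with prefixes dropped.
DATA = {
    ("admin_user", "rdf:type", "Admin"),
    ("admin_user", "adds", "user_a"),
    ("admin_user", "adds", "user_b"),
    ("admin_user", "approves", "user_b"),
    ("user_b", "adds", "user_c"),
    ("user_b", "approves", "user_c"),
    ("user_a", "accesses", "resource1"),
    ("user_b", "accesses", "resource1"),
    ("user_c", "accesses", "resource1"),
    ("user_c", "creates", "resource2"),
}


def inverse_approves_star(graph, focus):
    """Value nodes: everything reachable by zero or more inverse :approves steps."""
    reached, frontier = {focus}, [focus]  # zero steps: the focus node itself
    while frontier:
        current = frontier.pop()
        for (s, p, o) in graph:
            if p == "approves" and o == current and s not in reached:
                reached.add(s)
                frontier.append(s)
    return reached


def is_admin(graph, node):
    """Mirrors :isAdminShape: rdf:type has the value :Admin."""
    return (node, "rdf:type", "Admin") in graph


def authorized(graph, focus):
    """Qualified count: at least one value node conforms to :isAdminShape."""
    vals = inverse_approves_star(graph, focus)
    return sum(1 for v in vals if is_admin(graph, v)) >= 1


def alt_authorized(graph, focus):
    """sh:node plus sh:minCount: all value nodes conform, and there is at least one."""
    vals = inverse_approves_star(graph, focus)
    return all(is_admin(graph, v) for v in vals) and len(vals) >= 1


USERS = ["admin_user", "user_a", "user_b", "user_c"]
print([u for u in USERS if authorized(DATA, u)])      # ['admin_user', 'user_b', 'user_c']
print([u for u in USERS if alt_authorized(DATA, u)])  # ['admin_user']
```

Because the path allows zero steps, the focus node is always among its own value nodes, which is why :user_b and :user_c fail the universal variant.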

Closed Constraint Component

Finally, another interesting feature is closedness. It states that only a select few properties are allowed for a focus node. These select few properties can be given explicitly by the sh:ignoredProperties keyword, or are implied by the structure of the shape: the set of non-blank nodes obtained by following the sh:property keyword, followed by the sh:path keyword. Consider :basicUserShape defined above. We can expand the shape by adding the following triples:

:basicUserShape sh:closed true ;
  sh:ignoredProperties ( rdf:type ) .

The shape is closed, meaning the only allowed properties are those given by sh:ignoredProperties, here rdf:type, together with the properties of its property shapes: :accesses and :creates. In natural language: "The focus node has no other properties than rdf:type, :accesses, or :creates; and must have an :accesses property and no :creates property." So in our example graph, only :user_a now satisfies :basicUserShape.
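A minimal sketch of the closedness check in plain Python, again assuming a toy model of the data graph as prefix-free triples. The helper names are hypothetical, and the set of allowed properties is written out by hand rather than derived from a shapes graph.

```python
# Toy model (an assumption of this sketch): triples with prefixes dropped.
DATA = {
    ("admin_user", "rdf:type", "Admin"),
    ("admin_user", "adds", "user_a"),
    ("admin_user", "adds", "user_b"),
    ("admin_user", "approves", "user_b"),
    ("user_b", "adds", "user_c"),
    ("user_b", "approves", "user_c"),
    ("user_a", "accesses", "resource1"),
    ("user_b", "accesses", "resource1"),
    ("user_c", "accesses", "resource1"),
    ("user_c", "creates", "resource2"),
}

# Properties of the property shapes plus the ignored properties.
ALLOWED = {"accesses", "creates"} | {"rdf:type"}


def value_nodes(graph, focus, prop):
    """Return the objects reachable from `focus` through property `prop`."""
    return {o for (s, p, o) in graph if s == focus and p == prop}


def is_closed_basic_user(graph, focus):
    """Mirrors the closed version of :basicUserShape on one focus node."""
    used = {p for (s, p, _) in graph if s == focus}
    closed_ok = used <= ALLOWED  # no property outside the allowed set
    has_access = len(value_nodes(graph, focus, "accesses")) >= 1
    creates_nothing = len(value_nodes(graph, focus, "creates")) == 0
    return closed_ok and has_access and creates_nothing


USERS = ["admin_user", "user_a", "user_b", "user_c"]
closed_basic_users = [u for u in USERS if is_closed_basic_user(DATA, u)]
print(closed_basic_users)  # ['user_a']
```

Here :user_b is rejected by the closedness test alone (it uses :adds and :approves), whereas :user_c passes closedness but still fails the cardinality constraint on :creates.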

Clearly, there are many intricacies in the semantics of SHACL that make it unsuited to study directly. To that end, Chapter 2 of my thesis is dedicated to formalizing SHACL in a way that makes it fit for the different investigations. There are many more constraint components than those demonstrated here, but the ones discussed are illustrative of the core SHACL features. The exact semantics of all core SHACL features is also detailed in that chapter.

Targeting

A set of shape definitions on its own does not allow us to validate a graph in SHACL. Every shape can have a target declaration associated with it. A target declaration is a query that determines all the focus nodes for a given shape. There are five types of target declarations allowed in SHACL. Most of them are parameterized with an IRI or blank node t:

  • Node targets. There is one focus node, which is the parameter t (regardless of whether t occurs in the graph).
  • Class-based targets. Targets all nodes that are of RDF class t.
  • Implicit class targets. Targets all nodes whose RDF class is the shape name.
  • Subjects-of targets. Targets all nodes that are the subject of a triple where the predicate is t.
  • Objects-of targets. Targets all nodes that are the object of a triple where the predicate is t.
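The five target types listed above can be sketched as small queries over the data graph. This is a plain-Python illustration under the assumption that the graph is a set of prefix-free triples; the function names are hypothetical, and the class-based targets here only look at direct rdf:type triples, ignoring any subclass relationships.

```python
# Toy model (an assumption of this sketch): triples with prefixes dropped.
DATA = {
    ("admin_user", "rdf:type", "Admin"),
    ("admin_user", "adds", "user_a"),
    ("admin_user", "adds", "user_b"),
    ("admin_user", "approves", "user_b"),
    ("user_b", "adds", "user_c"),
    ("user_b", "approves", "user_c"),
    ("user_a", "accesses", "resource1"),
    ("user_b", "accesses", "resource1"),
    ("user_c", "accesses", "resource1"),
    ("user_c", "creates", "resource2"),
}


def node_target(graph, t):
    # The parameter itself is the only focus node, even if it is absent from the graph.
    return {t}


def class_target(graph, t):
    # Simplification: only direct rdf:type triples are considered.
    return {s for (s, p, o) in graph if p == "rdf:type" and o == t}


def implicit_class_target(graph, shape_name):
    # Same query as a class target, with the shape name playing the role of the class.
    return class_target(graph, shape_name)


def subjects_of_target(graph, t):
    return {s for (s, p, _) in graph if p == t}


def objects_of_target(graph, t):
    return {o for (_, p, o) in graph if p == t}


print(sorted(node_target(DATA, "user_z")))           # ['user_z']
print(sorted(class_target(DATA, "Admin")))           # ['admin_user']
print(sorted(subjects_of_target(DATA, "approves")))  # ['admin_user', 'user_b']
print(sorted(objects_of_target(DATA, "approves")))   # ['user_b', 'user_c']
```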

We can add target declarations to the shapes from the previous section to make a complete shapes graph. Adding the Subjects-of target triple

:authorizedUserShape sh:targetSubjectsOf :approves .

to the shapes graph gives us a complete constraint on the graph. In this case, our example data graph conforms because all nodes that approve another node are indeed authorized users. However, if we add the triple :user_a :approves :user_d to the data graph, it would no longer conform, as :user_a does not satisfy :authorizedUserShape.
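Putting targeting and shape evaluation together, conformance checking for this one shape can be sketched in plain Python. The sketch assumes a toy model of the data graph as prefix-free triples and a deliberately simplified "report" that is just a sorted list of violating focus nodes; all function names are hypothetical and do not correspond to any validator's API.

```python
# Toy model (an assumption of this sketch): triples with prefixes dropped.
DATA = {
    ("admin_user", "rdf:type", "Admin"),
    ("admin_user", "adds", "user_a"),
    ("admin_user", "adds", "user_b"),
    ("admin_user", "approves", "user_b"),
    ("user_b", "adds", "user_c"),
    ("user_b", "approves", "user_c"),
    ("user_a", "accesses", "resource1"),
    ("user_b", "accesses", "resource1"),
    ("user_c", "accesses", "resource1"),
    ("user_c", "creates", "resource2"),
}


def inverse_approves_star(graph, focus):
    """Nodes reachable from `focus` by zero or more inverse :approves steps."""
    reached, frontier = {focus}, [focus]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in graph:
            if p == "approves" and o == current and s not in reached:
                reached.add(s)
                frontier.append(s)
    return reached


def authorized(graph, focus):
    """Mirrors :authorizedUserShape: some value node has rdf:type :Admin."""
    return any((v, "rdf:type", "Admin") in graph
               for v in inverse_approves_star(graph, focus))


def validate(graph):
    """Return (conforms, violating focus nodes) for the subjects-of :approves target."""
    focus_nodes = {s for (s, p, _) in graph if p == "approves"}
    violations = sorted(n for n in focus_nodes if not authorized(graph, n))
    return (len(violations) == 0, violations)


print(validate(DATA))  # (True, [])

# Adding the extra triple from the text makes :user_a a focus node that fails the shape.
extended = DATA | {("user_a", "approves", "user_d")}
print(validate(extended))  # (False, ['user_a'])
```

Only nodes selected by the target declaration are checked, so :user_a is irrelevant to conformance until it becomes the subject of an :approves triple.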