SHACL in a nutshell
Note: This post is an adaptation of Section 1.1 of my PhD thesis. The purpose of publishing it as a standalone blogpost is to further disseminate my writings. It is part of a short series of posts representing the Introduction of my thesis:
- Section 1.1: SHACL in a nutshell (this post)
- Section 1.2: Expressiveness
- Section 1.3: Recursion
- Section 1.4: Provenance
The main purpose of SHACL is to write a schema, in SHACL terminology called a shapes graph (so called because SHACL syntax is itself written in RDF), that describes the expected structure of an RDF graph. The RDF graph whose structure is being described is referred to as the data graph. Given a shapes graph and a data graph, the main task is to check whether the data graph satisfies the requirements specified by the shapes graph. This task is called conformance checking, and it is done by software called a (SHACL) validator. Validators are often expected not only to check for conformance, but also to generate a validation report specifying which nodes violate which parts of the shapes graph.
Shapes
A SHACL shapes graph consists of a set of shapes, which are structural constraints on nodes. Each shape has a name, which is a blank node or an IRI. When we evaluate a shape on a node, that node is called the focus node. There are two types of shapes: node shapes and property shapes. A node shape directly defines constraints on the focus node, while a property shape defines constraints on the value nodes of the focus node. The value nodes are the nodes reachable through a property or path expression that is given as a parameter to the property shape (using the sh:path keyword in SHACL).
Throughout this chapter, we will use examples from the access control setting. For this section, we will simply define some shapes about users in this setting. Basic users can access and create resources, while power users can also add users to the system and approve of them to also become power users. Consider the following data graph:
:admin_user a :Admin ;
    :adds :user_a ;
    :adds :user_b ;
    :approves :user_b .
:user_b :adds :user_c ;
    :approves :user_c .
:user_a :accesses :resource1 .
:user_b :accesses :resource1 .
:user_c :accesses :resource1 ;
    :creates :resource2 .
The idea of the first few lines is that the :admin_user adds users :user_a and :user_b to the system. They also approve of :user_b, meaning that :user_b is now also trusted within the system.
Consider the following node shape that consists of one node test and two property shapes:
:basicUserShape a sh:NodeShape ;
    sh:nodeKind sh:IRI ;
    sh:property [ sh:path :accesses ; sh:minCount 1 ] ;
    sh:property [ sh:path :creates ; sh:maxCount 0 ] .
The shape name is :basicUserShape. It specifies three requirements for a given focus node. First, the focus node should be an IRI, as opposed to a blank node or a literal. The sh:property keyword indicates that the focus node must also adhere to a property shape. The first property shape is about the value nodes reachable with the :accesses property. Specifically, there must be at least one such value node. The second property shape is about the value nodes reachable with the :creates property. In this case, there may be at most 0. In natural language, this shape states: "The focus node is an IRI and :accesses at least one resource, but :creates none." So, in our example graph, only :user_a and :user_b satisfy :basicUserShape.
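To make the evaluation concrete, the following is a minimal Python sketch that checks the three requirements by hand. It is not a SHACL validator and not how one is implemented. The triple encoding, the helper names (`values`, `is_iri`, `basic_user_shape`) and the convention that `_:` marks a blank node and a leading double quote marks a literal are all assumptions made for this illustration.

```python
# The example data graph as (subject, predicate, object) triples.
# Assumption: prefixed names are kept as plain strings.
DATA = {
    (":admin_user", "rdf:type", ":Admin"),
    (":admin_user", ":adds", ":user_a"),
    (":admin_user", ":adds", ":user_b"),
    (":admin_user", ":approves", ":user_b"),
    (":user_b", ":adds", ":user_c"),
    (":user_b", ":approves", ":user_c"),
    (":user_a", ":accesses", ":resource1"),
    (":user_b", ":accesses", ":resource1"),
    (":user_c", ":accesses", ":resource1"),
    (":user_c", ":creates", ":resource2"),
}


def values(graph, focus, prop):
    """Value nodes of `focus` for the simple path `prop`."""
    return {o for s, p, o in graph if s == focus and p == prop}


def is_iri(node):
    """Hypothetical encoding: '_:' marks a blank node, '"' a literal."""
    return not node.startswith("_:") and not node.startswith('"')


def basic_user_shape(graph, focus):
    return (
        is_iri(focus)                                    # sh:nodeKind sh:IRI
        and len(values(graph, focus, ":accesses")) >= 1  # sh:minCount 1
        and len(values(graph, focus, ":creates")) <= 0   # sh:maxCount 0
    )


users = [":admin_user", ":user_a", ":user_b", ":user_c"]
conforming = sorted(u for u in users if basic_user_shape(DATA, u))
print(conforming)  # [':user_a', ':user_b']
```

Here :admin_user fails the minimum count on :accesses, and :user_c fails the maximum count on :creates.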
SHACL has many different features for writing constraints. These features are called constraint components. The previous example made use of a value type constraint component, here indicated with the sh:nodeKind keyword, to check whether the focus node is an IRI, but also of cardinality constraint components to count the number of value nodes, indicated with the sh:minCount and sh:maxCount keywords. SHACL has many constraint components and, for the investigations in my thesis, we restrict ourselves to the core constraint components. Next follow some examples of components that illustrate the capabilities of SHACL.
Property Pair Constraint Components
In property shapes, we can compare the value nodes to another set of nodes (reachable by some other property) in some predefined ways. The most important are equality and disjointness checks between the two sets. Consider the following node shape:
:powerUserShape a sh:NodeShape ;
    sh:not [
        a sh:PropertyShape ;
        sh:path :adds ;
        sh:disjoint :approves
    ] .
This shape asks of the focus node that the set of value nodes, i.e., the nodes reachable with the :adds property, is not disjoint from the set of nodes reachable with the :approves property. In natural language: "The focus node approves at least one user they also added." So, in our graph, :admin_user and :user_b satisfy :powerUserShape.
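The negated disjointness check can be pictured as a plain set operation. The sketch below is only an illustration under the same assumptions as before (triples as Python tuples, hypothetical helper names), not a validator.

```python
# The example data graph as (subject, predicate, object) triples.
DATA = {
    (":admin_user", "rdf:type", ":Admin"),
    (":admin_user", ":adds", ":user_a"),
    (":admin_user", ":adds", ":user_b"),
    (":admin_user", ":approves", ":user_b"),
    (":user_b", ":adds", ":user_c"),
    (":user_b", ":approves", ":user_c"),
    (":user_a", ":accesses", ":resource1"),
    (":user_b", ":accesses", ":resource1"),
    (":user_c", ":accesses", ":resource1"),
    (":user_c", ":creates", ":resource2"),
}


def values(graph, focus, prop):
    """Value nodes of `focus` for the simple path `prop`."""
    return {o for s, p, o in graph if s == focus and p == prop}


def power_user_shape(graph, focus):
    added = values(graph, focus, ":adds")          # sh:path :adds
    approved = values(graph, focus, ":approves")   # sh:disjoint :approves
    return not added.isdisjoint(approved)          # wrapped in sh:not


users = [":admin_user", ":user_a", ":user_b", ":user_c"]
power_users = sorted(u for u in users if power_user_shape(DATA, u))
print(power_users)  # [':admin_user', ':user_b']
```

Note that a node with no :adds and no :approves triples, such as :user_a, has two empty sets of value nodes. Empty sets are disjoint, so the negation fails and the node does not satisfy the shape.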
Shape-based Constraint Components
We can refer to other shapes as well. Most notably, we can combine this with the counting from the cardinality constraint components to count only value nodes that conform to some other given shape. For example, consider the following shapes:
:authorizedUserShape a sh:PropertyShape ;
    sh:path [ sh:zeroOrMorePath [ sh:inversePath :approves ] ] ;
    sh:qualifiedValueShape :isAdminShape ;
    sh:qualifiedMinCount 1 .
:isAdminShape a sh:PropertyShape ;
    sh:path rdf:type ;
    sh:hasValue :Admin .
A first thing to note is that this shape makes use of a (complex) path expression. SHACL supports path expressions similar to those of SPARQL. This shape states that there must be at least one value node reachable, by following inverse :approves properties, that conforms to :isAdminShape. The keyword sh:qualifiedValueShape is used to refer to a shape, together with the sh:qualifiedMinCount to denote the desired cardinality. In our example graph, :admin_user, :user_b, and :user_c satisfy :authorizedUserShape. Note that it is different from writing:
:altAuthorizedUserShape a sh:PropertyShape ;
    sh:path [ sh:zeroOrMorePath [ sh:inversePath :approves ] ] ;
    sh:node :isAdminShape ;
    sh:minCount 1 .
where sh:node is used to refer to another shape. This seems to express the same shape. However, all constraints used in a property shape apply to all value nodes, i.e., there is an implicit universal quantifier. When a cardinality constraint is then used, it is separate from the other constraints and refers only to the number of value nodes. In words, :altAuthorizedUserShape states that all value nodes conform to :isAdminShape and that there is at least one node in the set of value nodes. In our example graph, only :admin_user satisfies :altAuthorizedUserShape.
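The difference between the two shapes is the difference between an existential and a universal check over the same set of value nodes. The Python sketch below illustrates this under stated assumptions: the graph is a set of tuples, the path "zero or more inverse :approves steps" is hard-coded in a hypothetical helper `approval_ancestors`, and :isAdminShape is reduced to a direct lookup of an rdf:type triple.

```python
# The example data graph as (subject, predicate, object) triples.
DATA = {
    (":admin_user", "rdf:type", ":Admin"),
    (":admin_user", ":adds", ":user_a"),
    (":admin_user", ":adds", ":user_b"),
    (":admin_user", ":approves", ":user_b"),
    (":user_b", ":adds", ":user_c"),
    (":user_b", ":approves", ":user_c"),
    (":user_a", ":accesses", ":resource1"),
    (":user_b", ":accesses", ":resource1"),
    (":user_c", ":accesses", ":resource1"),
    (":user_c", ":creates", ":resource2"),
}


def approval_ancestors(graph, focus):
    """Value nodes for zero or more inverse :approves steps from `focus`."""
    seen = {focus}            # zero steps: the focus node itself
    frontier = [focus]
    while frontier:
        node = frontier.pop()
        for s, p, o in graph:
            if p == ":approves" and o == node and s not in seen:
                seen.add(s)   # s approves node, so s is one inverse step away
                frontier.append(s)
    return seen


def is_admin(graph, node):
    """:isAdminShape: the rdf:type values of `node` include :Admin."""
    return (node, "rdf:type", ":Admin") in graph


def authorized_user_shape(graph, focus):
    # sh:qualifiedValueShape + sh:qualifiedMinCount 1:
    # at least one value node conforms to :isAdminShape.
    vals = approval_ancestors(graph, focus)
    return sum(1 for v in vals if is_admin(graph, v)) >= 1


def alt_authorized_user_shape(graph, focus):
    # sh:node applies to every value node; sh:minCount 1 only counts them.
    vals = approval_ancestors(graph, focus)
    return all(is_admin(graph, v) for v in vals) and len(vals) >= 1


users = [":admin_user", ":user_a", ":user_b", ":user_c"]
authorized = sorted(u for u in users if authorized_user_shape(DATA, u))
alt_authorized = sorted(u for u in users if alt_authorized_user_shape(DATA, u))
print(authorized)      # [':admin_user', ':user_b', ':user_c']
print(alt_authorized)  # [':admin_user']
```

For :user_b, the value nodes are :user_b and :admin_user. One of them is an admin, which is enough for the qualified count, but not all of them are, so the alternative shape fails.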
Closed Constraint Component
Finally, another interesting feature is closedness. It states that only a select few properties are allowed for a focus node. These select few properties can be given explicitly by the sh:ignoredProperties keyword, or are implied by the structure of the shape: the set of non-blank nodes obtained by following the sh:property keyword, followed by the sh:path keyword. Consider :basicUserShape defined above. We can expand the shape by adding the following triples:
:basicUserShape sh:closed true ; sh:ignoredProperties ( rdf:type ) .
The shape is closed, meaning the only allowed properties are given by sh:ignoredProperties, i.e., rdf:type, but we also allow the properties related to the property shapes: :accesses and :creates. In natural language: "The focus node has no other properties than rdf:type, :accesses, or :creates; and must have an :accesses property and no :creates property." So in our example graph, only :user_a now satisfies :basicUserShape.
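Closedness amounts to a subset test on the predicates that the focus node uses as a subject. The following Python sketch adds that test to a hand-written version of :basicUserShape. As before, the triple encoding, the helper names, and the `_:` and double-quote conventions for blank nodes and literals are assumptions for illustration only.

```python
# The example data graph as (subject, predicate, object) triples.
DATA = {
    (":admin_user", "rdf:type", ":Admin"),
    (":admin_user", ":adds", ":user_a"),
    (":admin_user", ":adds", ":user_b"),
    (":admin_user", ":approves", ":user_b"),
    (":user_b", ":adds", ":user_c"),
    (":user_b", ":approves", ":user_c"),
    (":user_a", ":accesses", ":resource1"),
    (":user_b", ":accesses", ":resource1"),
    (":user_c", ":accesses", ":resource1"),
    (":user_c", ":creates", ":resource2"),
}

IGNORED = {"rdf:type"}                 # sh:ignoredProperties ( rdf:type )
SHAPE_PATHS = {":accesses", ":creates"}  # sh:path of each property shape


def values(graph, focus, prop):
    return {o for s, p, o in graph if s == focus and p == prop}


def is_iri(node):
    """Hypothetical encoding: '_:' marks a blank node, '"' a literal."""
    return not node.startswith("_:") and not node.startswith('"')


def closed_basic_user_shape(graph, focus):
    used = {p for s, p, o in graph if s == focus}
    return (
        is_iri(focus)
        and len(values(graph, focus, ":accesses")) >= 1
        and len(values(graph, focus, ":creates")) <= 0
        and used <= (IGNORED | SHAPE_PATHS)  # sh:closed true
    )


users = [":admin_user", ":user_a", ":user_b", ":user_c"]
closed_conforming = sorted(u for u in users if closed_basic_user_shape(DATA, u))
print(closed_conforming)  # [':user_a']
```

:user_b satisfied the open version of the shape, but it also uses :adds and :approves, which are not among the allowed properties.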
Clearly, there are many intricacies in the semantics of SHACL that make it unsuited to study directly. Chapter 2 of my thesis is therefore dedicated to formalizing SHACL in a way that makes it fit for the different investigations. There are many more constraint components than those demonstrated here, but the ones discussed are illustrative of the core SHACL features. The exact semantics of all core SHACL features are also detailed in that chapter.
Targeting
A set of shape definitions on its own does not allow us to validate a graph in SHACL. Every shape can have a target declaration associated with it. A target declaration is a query that determines all the focus nodes for a given shape. There are five types of target declarations allowed in SHACL. Most of them are parameterized with an IRI or blank node t:
- Node targets. There is one focus node, which is the parameter t (regardless of whether t occurs in the graph).
- Class-based targets. Targets all nodes that are of RDF class t.
- Implicit class targets. Targets all nodes whose RDF class is the shape name.
- Subjects-of targets. Targets all nodes that are the subject of a triple where the predicate is t.
- Objects-of targets. Targets all nodes that are the object of a triple where the predicate is t.
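The parameterized target types in the list above can be sketched in Python as simple queries over a set of triples. This is an illustration only: the function names are hypothetical, the data graph is the example from the Shapes section, and the class-based target here looks at direct rdf:type triples only, whereas SHACL also follows rdfs:subClassOf. An implicit class target behaves like a class-based target whose parameter is the shape name itself.

```python
# The example data graph as (subject, predicate, object) triples.
DATA = {
    (":admin_user", "rdf:type", ":Admin"),
    (":admin_user", ":adds", ":user_a"),
    (":admin_user", ":adds", ":user_b"),
    (":admin_user", ":approves", ":user_b"),
    (":user_b", ":adds", ":user_c"),
    (":user_b", ":approves", ":user_c"),
    (":user_a", ":accesses", ":resource1"),
    (":user_b", ":accesses", ":resource1"),
    (":user_c", ":accesses", ":resource1"),
    (":user_c", ":creates", ":resource2"),
}


def node_target(graph, t):
    """Node target: t itself, whether or not it occurs in the graph."""
    return {t}


def class_target(graph, t):
    """Class-based target (simplified: direct rdf:type only)."""
    return {s for s, p, o in graph if p == "rdf:type" and o == t}


def subjects_of_target(graph, t):
    """Subjects-of target: subjects of triples with predicate t."""
    return {s for s, p, o in graph if p == t}


def objects_of_target(graph, t):
    """Objects-of target: objects of triples with predicate t."""
    return {o for s, p, o in graph if p == t}


print(sorted(subjects_of_target(DATA, ":approves")))  # [':admin_user', ':user_b']
print(sorted(objects_of_target(DATA, ":approves")))   # [':user_b', ':user_c']
```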
We can add target declarations to the shapes from the previous section to make a complete shapes graph. Adding the Subjects-of target triple
:authorizedUserShape sh:targetSubjectsOf :approves .
to the shapes graph gives us a complete constraint on the graph. In this case, our example data graph conforms because all nodes that approve another node are indeed authorized users. However, if we add the triple :user_a :approves :user_d to the data graph, it would no longer conform, as :user_a does not satisfy :authorizedUserShape.
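Putting the target and the shape together gives a toy version of conformance checking. The Python sketch below is a hand-written illustration, not a SHACL validator: the triple encoding and the function names (`approval_ancestors`, `authorized_user_shape`, `violations`) are assumptions, and the "validation report" is reduced to a sorted list of violating focus nodes.

```python
# The example data graph as (subject, predicate, object) triples.
DATA = {
    (":admin_user", "rdf:type", ":Admin"),
    (":admin_user", ":adds", ":user_a"),
    (":admin_user", ":adds", ":user_b"),
    (":admin_user", ":approves", ":user_b"),
    (":user_b", ":adds", ":user_c"),
    (":user_b", ":approves", ":user_c"),
    (":user_a", ":accesses", ":resource1"),
    (":user_b", ":accesses", ":resource1"),
    (":user_c", ":accesses", ":resource1"),
    (":user_c", ":creates", ":resource2"),
}


def approval_ancestors(graph, focus):
    """Value nodes for zero or more inverse :approves steps from `focus`."""
    seen = {focus}
    frontier = [focus]
    while frontier:
        node = frontier.pop()
        for s, p, o in graph:
            if p == ":approves" and o == node and s not in seen:
                seen.add(s)
                frontier.append(s)
    return seen


def authorized_user_shape(graph, focus):
    """At least one value node has rdf:type :Admin."""
    return any(
        (v, "rdf:type", ":Admin") in graph
        for v in approval_ancestors(graph, focus)
    )


def violations(graph):
    """Focus nodes selected by the subjects-of target that fail the shape."""
    targets = {s for s, p, o in graph if p == ":approves"}
    return sorted(t for t in targets if not authorized_user_shape(graph, t))


print(violations(DATA))      # []  (the data graph conforms)

extended = DATA | {(":user_a", ":approves", ":user_d")}
print(violations(extended))  # [':user_a']
```

The data graph conforms exactly when the list of violations is empty. In the extended graph, :user_a becomes a focus node because it is now the subject of an :approves triple, and nobody with type :Admin is reachable from it by inverse :approves steps.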