Recursion for SHACL
Note: This post was used to write Section 1.3 of my PhD thesis, and serves as a general intro to our paper on the subject. The purpose of publishing it as a standalone blogpost is to further disseminate my writings. It is part of a short series of posts representing the Introduction of my thesis:
- Section 1.1: SHACL in a nutshell
- Section 1.2: Expressiveness
- Section 1.3: Recursion (this post)
- Section 1.4: Provenance
The W3C recommendation that defines SHACL explicitly leaves recursion undefined. In this post, I want to discuss this topic in light of existing academic work:
- Semantics and Validation of Recursive SHACL
- Stable Model Semantics for Recursive SHACL
- our own work Fixpoint Semantics for Recursive SHACL
- Well-Founded Semantics for Recursive SHACL
The goal of this post is to clarify what it means to add recursion to SHACL, and to illustrate the ideas presented in these papers. I will discuss the different semantics without defining them first. The different semantics are: Supported Model semantics, Stratified Recursion, Stable Model semantics, and Well-Founded semantics.
Setting the stage
This post will use a running example to illustrate the different semantics. Consider an access control setting, where users can be authorized to access resources or not. The RDF data will represent an overview of an access control setting where there are users and resources. Users can authorize other users to access files. There is a distinguished user, the administrator, who is authorized by default.
A SHACL shape graph describes what is a valid access control setting, i.e., if the SHACL shapes graph validates, it is a valid access control setting.
In this post, we'll use the standard SHACL prefix, sh:
, and
introduce our own local prefix, @prefix :
<https://mjakubowski.info/example/>.
Positive recursion
A shape is said to be recursive if it references itself (directly, or indirectly). Consider the following access control shapes graph:
:Authorized a sh:NodeShape ; sh:or ( [ a sh:NodeShape ; sh:hasValue :admin ] [ a sh:PropertyShape ; sh:path [ sh:inversePath :approves ] ; sh:qualifiedValueShape :Authorized ; sh:qualifiedMinCount 1 ] ) . :Authorized sh:targetSubjectsOf :accesses .
In natural language, the :admin
node is :Authorized
and every node
that is approved by an :Authorized
node is also :Authorized
. We
target all nodes that access something. In essence, this shapes
graph validates when all nodes that access a resource are
:Authorized
.
There are two things to note about this shape. First, it is clearly
recursive, as it mentions itself through the
sh:qualifiedValueShape
. Second, it mentions itself positively, that
is, the recursion does not happen through some kind of negation.
When talking about recursion, negation adds an extra layer of complexity to the discussion. Almost all the differences between the proposed semantics are due to negation. We'll discuss this further later in this post.
Semantics for positive recursion
Consider the following data graph:
:admin :approves :user_a . :user_b :approves :user_c . :user_c :approves :user_b .
Let us focus first on deciding which users could actually be
assigned the :Authorized
shape, forgetting for a moment the
targeting specified in our shapes graph. Which of the users could be
said to be :Authorized
? That is the question we need to be able to
answer when we have defined recursive semantics for SHACL.
Stratified recursion, Well-Founded, and Stable Model semantics all
agree that the set of :Authorized
nodes needs to be only :admin
and
:user_a
. Supported Model semantics, however, states that there is an
additional possible set of :Authorized
nodes: all of them!
The idea behind Supported Model semantics is that there is a
possible assignment of shape names to nodes, such that all assigned
nodes satisfy the shape. In this case, when both :user_b
and :user_c
are assigned to be :Authorized
, then they also both satisfy the
:Authorized
shape. Indeed, :user_b
is authorized by an authorized
user (:user_c
) and similarly for :user_b
. Clearly, this does not
match our intention, since this would allow every user to authorize
themselves!
For the other semantics, the idea in this case is to have a minimal assignment of shape names to nodes. This is commonly referred to as the minimal model semantics in logic programming literature.
To complete our discussion on positive recursion, we will talk about targeting for a moment. Consider the following addition to the data graph:
:user_b :accesses :resource1 .
It is easy to verify that this data graph does not validate under
Stratified, Stable Model, and Well-Founded semantics. We can see
this without knowing the definition, because none of these semantics
ever assign :Authorized
to :user_b
.
The Supported Model semantics has two possible assignments, but
which one do we look at? There are two ways to look at this,
according to the literature. First, brave validation states that we
will just require there to be just one assignment that assigns
:user_b
to be :Authorized
. The other, cautious validation states
that we require all possible assignments to assign :user_b
to be
:Authorized
. Thus, to make this concrete, if we use Supported Model
semantics with cautious validation it will agree with the other
semantics and the data graph will not validate. Otherwise, with
brave validation, Supported Model semantics dictates that the data
graph validates.
Adding negation
We will add some negation. Consider the following addition to the shapes graph:
:PotentialThreat a sh:NodeShape ; sh:property [ sh:path :accesses ; sh:minCount 1 ] ; sh:not :Authorized . :PotentialThreat sh:targetClass :BlacklistedUser .
In natural language, this shape states that a node is a
:PotentialThreat
if it is not :Authorized
and accesses a
resource. All nodes that access a resource and are not :Authorized
are a :PotentialThreat
. We want to ensure that all users that are a
:BlacklistedUser
are a :PotentialThreat
.
Considering the following data graph:
:admin :approves :user_a . :user_b :approves :user_c . :user_c :approves :user_b . :user_b :accesses :resource1 . :user_c :accesses :resource1 . :user_b a :BlacklistedUser .
which users would be assigned to be a :PotentialThreat
, and which
ones would be assigned to be :Authorized
? The idea of minimal
assignments is not going to help us here. All users will be assigned
a label. They are either :Authorized
or a :PotentialThreat
.
Of course, the intuition is that :user_b
and :user_c
do not have any
bussiness being :Authorized
. They are only so in a self-supporting
way. So, we would likely want them to be a :PotentialThreat
.
One idea is to first resolve the positive recursion part of the
shapes graph. So we would apply minimal models on :Authorized
and
afterward try to assign :PotentialThreat
. This is exactly the idea
behind Stratified Recursion. This only works when we look at the
shapes graph in "layers" where we first resolve the positive
recursion and work our way up. That is also where the name comes
from "stratified" is an archaic way of saying "layered". Again,
Stable Model and Well-Founded semantics agree with this
approach. Only Supported Model semantics, like before, considers two
possible assignments. The one where all users are :Authorized
and
the one where :user_b
and :user_c
are a :PotentialThreat
.
An interesting remark here is that with Supported Model semantics
neither brave nor cautious validation will always result in a
satisfying validation result. Indeed, with cautious validation
:user_b
will not be marked a :PotentialThreat
because there is an
assignment where :user_b
is :Authorized
. With brave validation, we
can do whatever we want. If :user_b
is not blacklisted, it would be
fine, because there is an assignment where :user_b
is :Authorized
!
In my personal view, this excludes Supported Model semantics as a
suitable semantics for recursive SHACL.
We can not always stratify easily
Indeed, we cannot always stratify easily as discussed above. Consider a new shapes graph:
:Authorized a sh:NodeShape ; sh:or ( [ a sh:NodeShape ; sh:hasValue :admin ] [ sh:and ( [ a sh:PropertyShape ; sh:path [ sh:inversePath :approves ] ; sh:qualifiedValueShape :Authorized ; sh:qualifiedMinCount 1 ] [ sh:not [ a sh:PropertyShape ; sh:path [ sh:inversePath :reports ] ; sh:qualifiedValueShape :Authorized ; sh:qualifiedMinCount 1 ] ] ) ] ) . :Authorized sh:targetSubjectsOf :accesses .
this shape is very similar to the :Authorized
shape from before,
only now, you can only be :Authorized
if you are also never reported
by an :Authorized
user.
Note that this shape cannot be stratified as we did with the
:PotentialThreat
before. It directly references itself
negatively. We will next look at the following data graph scenario:
:admin :approves :user_a ; :approves :user_b . :user_a :approves :user_c . :user_c :approves :user_d .
By now, you should be thinking about the natural question: what
users should be :Authorized
here? Clearly, all of them. There is
nothing special going on as long as we do not have anyone reporting
anyone. We will add the following line:
:user_b :reports :user_c .
Now, because :user_c
has been reported, they are no longer
:Authorized
and neither is :user_d
, even though :user_d
has not been
explicitly reported. All this the semantics agree on. But what if we add:
:user_a :reports :user_b . :user_b :reports :user_a .
to the data graph?
The Stable Model semantics will tell you there are two possible
assignments: one where :user_a
is :Authorized
and :user_b
is not,
and one where the opposite is true. Again, to know how to choose
between these two assignments, we can apply the ideas of cautions
and brave validation.
The Well-Founded semantics will take a different approach. Unlike
all the semantics discussed before, it will not just assign shapes
to nodes. In the Well-Founded semantics, assignments are
three-valued. Assignments are true, false, or unknown. In this
example, it will say the assignment of :Authorized
for :user_a
and
:user_b
are unknown.
Concluding thoughts
The ideas can be summarized as follows. First, Stratified Recursion is the only approach that somehow restricts the kinds of recursion you can write. The other approaches are also nicely defined on all possible shapes graphs. Second, Stable Model and Supported Model semantics consider multiple possible assignments. In these two semantics, it is useful to then distinguish cautious and brave validation. Third, the Well-Founded semantics considers three-valued assignments, but only one assignment is defined for a shapes graph, disregarding the need for the cautious and brave distinction. Finally, all semantics, except for Supported Model semantics, agree if we only consider shape graphs with only positive recursion — even on Stratified shape graphs!
In essence, the purpose of this post is to clear up the different ideas for semantics for recursive SHACL. They are all based on research in logic programming. There, they have been studies and evaluated to really correspond to peoples' intuitions. Therefore, when we apply them to SHACL, we get well-defined recursive semantics.
If I would suggest anything, I would say the proper recursive semantics for SHACL should be the Well-Founded semantics. Shapes are definitions and the Well-Founded semantics are very well suited for this. We defined the Well-Founded semantics for SHACL (and also by extension most other semantics coming from logic programming, like the ones discussed in this post) in our work Fixpoint Semantics for Recursive SHACL.
Thanks to Bart Bogaerts for the corrections and inspiring suggestions.