1. Introduction

RQL is a query language for RDF and RDF Schema, loosely based on the syntax of OQL. The idea is that if we look at one or more RDF models and schemas, we can think of them as forming a set of connected graphs. RQL offers features for navigating through that graph and selecting specific edges and nodes for retrieval. A powerful characteristic of RQL is that it addresses RDF Schema semantics in the language itself. Class-instance relationships, class/property subsumption, domain/range and such are all addressed and inferred by specific language constructs. All in all, RQL is a very powerful and versatile language. But as with any powerful language, it takes some time to master it. This tutorial is meant as an aid to beginning RQL users. We will start by introducing some basic queries, and continue adding more advanced features by example. We do assume that the reader has a fair degree of familiarity with both RDF and RDF Schema.
The original specification of RQL, which includes a formal definition of its semantics, can be found at the website of ICS-FORTH, at http://139.91.183.30:9090/RDF/RQL/. A user manual for the ICS-FORTH version of RQL is also supplied there. This tutorial is not meant as an exhaustive guide to all possible query constructions in RQL, but rather as a "get started quickly" guide. For the technically inclined reader, the complete language as supported by Sesame is described in the BNF grammar, which can be found in the appendix.

2. The Museum Repository

In this tutorial, we will illustrate the use of RQL by means of queries on an example repository that contains RDF data and schema information about art and museums (taken and adapted from [Karvounarakis et al., 2000]). With each example query, a link is provided that executes that particular query on the online Museum repository. If you are reading this tutorial online, you can use this to get an idea of the kinds of results an RQL query gives. Although the specified output format of an RQL query is RDF, the query results of these examples are presented in an HTML table, where each column represents the instances of one RQL variable. In the figure below, an overview of the schema of this repository is given.
The Museum example repository is available online at OpenRDF.org, in the demonstration section. You can go there if you want to experiment with formulating your own queries.

3. The Basics

3.1. The select-from-where construction

An RQL query is typically built up from three clauses, which you might recognize from SQL: select, from and where. Their usage is slightly different from SQL, though. Consider the following query:
select X, @P
from {X} @P {Y}
where Y like "Pablo"
The select clause allows you to specify a projection over your query results, i.e. which variables are returned in the result, and in what order. In the above query, we are interested in the variables X and @P, but not in the variable Y. It is also possible to request all variables that are bound in the query by using the asterisk *. However, when you use the asterisk, it must be the only argument in the select clause, and the order in which variables are returned cannot be specified.

The from clause is where the good stuff happens. Here, you bind variables to specific locations in the RDF model graph by specifying path expressions (see the next sections). In this example, X and Y are bound to nodes in the graph, while @P is bound to a connecting edge (the @ is a variable prefix that signifies the variable is only bound to properties). Hence, this structure corresponds to a statement where X is the subject, @P the predicate, and Y the object.

The where clause is optional and can be used to constrain the values of variables bound in the from clause. In the example, we only want those results back where the value for Y matches the string "Pablo". This corresponds to selecting all statements where "Pablo" is the object.

In RDF, nodes and edges are identified by means of their Uniform Resource Identifier, or URI. Such identifiers can be quite long, making queries hard to read. That is why RQL has a namespace abbreviation mechanism (quite similar to the mechanism used in XML). We specify the namespace abbreviations by means of an extra clause at the end of the query: using namespace. In this clause we specify a prefix and the URI to which it corresponds, for example the prefix cult for the URI http://www.icom.com/schema.rdf#. Now, whenever we use a property or resource from this namespace, for example the property paints, we can simply type cult:paints instead of the full URI. See the queries in the next section for some practical examples.
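The abbreviation mechanism amounts to a simple prefix lookup. The following Python sketch is only an illustration of that idea, not Sesame's implementation; the helper name expand and the NAMESPACES table are made up for this example.

```python
# Hypothetical prefix table, mirroring a "using namespace" clause.
NAMESPACES = {"cult": "http://www.icom.com/schema.rdf#"}

def expand(name, namespaces=NAMESPACES):
    """Return the full URI for a prefixed name; leave other names untouched."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return name  # already a full URI, or an unknown prefix

print(expand("cult:paints"))
```

A full URI such as http://www.icom.com/schema.rdf#paints passes through unchanged, because "http" is not a declared prefix.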
Variable names in RQL can be any identifier (with the exception of reserved RQL keywords like select, domain, Class, Property, etc.) starting with a letter followed by a string of alphanumeric characters (the characters "-" and "_" are also allowed). In the above examples we simply used X and Y, but it is possible (and often a good idea) to use more descriptive names. By convention, RQL variable names are spelled entirely in capitals, but this is not strictly necessary. Variables bound to properties are prefixed with an @, and variables bound to schema classes are prefixed with a $ (more about this in the section about querying the schema).

Path expressions in RQL are, as the name implies, expressions that allow you to specify a linear path through the graph model. In RQL, path expressions appear in the from clause and can be used to bind variables. An example of a path expression is seen in the previous query, where a path running from subject X to object Y through connecting predicate @P is specified. In a path expression, you can also specify the URI of a specific RDF property instead of using a variable. You can see an example of this in the next query, which returns all resources that have a "paints" property, and their value:
select PAINTER, PAINTING
from {PAINTER} cult:paints {PAINTING}
using namespace
cult = http://www.icom.com/schema.rdf#
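One way to picture what this path expression does is as pattern matching over a list of (subject, predicate, object) statements. The Python sketch below is a toy model under that assumption; the resource names in TRIPLES are invented for illustration and are not taken from the Museum repository.

```python
# Toy statements as (subject, predicate, object); the names are illustrative only.
TRIPLES = [
    ("picasso", "cult:paints", "guernica"),
    ("picasso", "cult:last_name", "Picasso"),
    ("rembrandt", "cult:paints", "nightwatch"),
]

def match_property(triples, prop):
    """Rows for {PAINTER} prop {PAINTING}: one (subject, object) pair per matching edge."""
    return [(s, o) for (s, p, o) in triples if p == prop]

rows = match_property(TRIPLES, "cult:paints")
print(rows)
```

Each row corresponds to one binding of PAINTER and PAINTING; statements with any other predicate are ignored.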
Path expressions can be arbitrarily long, simply by chaining edges. For example, the following query selects, in addition to the painter and the painting, the technique with which the painting was created:
select PAINTER, PAINTING, TECH
from {PAINTER} cult:paints {PAINTING}. cult:technique {TECH}
using namespace
cult = http://www.icom.com/schema.rdf#
Notice the dot connecting the second graph edge with the object node of the first edge. In some path expressions, you are only interested in some of the nodes that the path traverses, and not in others. For example, in the above query you might decide that whatever the painting is (i.e. whatever values PAINTING takes), you are only interested in the painter and the technique he/she uses (i.e. the values for PAINTER and TECH). In such cases, it is possible to omit the variable altogether from the expression, as in this query:
select PAINTER, TECH
from {PAINTER} cult:paints . cult:technique {TECH}
using namespace
cult = http://www.icom.com/schema.rdf#
Since the path expressions in RQL are linear, it is not possible to specify two paths coming from the same starting point in one path expression. The same result can still be achieved, however, by using two or more path expressions, separated by commas, and sharing variables between them. For example, if we want to have the last name of the painter returned with every result:
select PAINTER, PAINTING, LNAME
from {PAINTER} cult:paints {PAINTING},
{PAINTER} cult:last_name {LNAME}
using namespace
cult = http://www.icom.com/schema.rdf#
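Sharing the variable PAINTER between two path expressions behaves like a join: a result row is produced only when both patterns can be satisfied by the same resource. A minimal Python sketch of that idea follows, again over invented toy data and with a hypothetical helper name.

```python
# Toy statements; "rembrandt" deliberately has no last name recorded.
TRIPLES = [
    ("picasso", "cult:paints", "guernica"),
    ("picasso", "cult:last_name", "Picasso"),
    ("rembrandt", "cult:paints", "nightwatch"),
]

def join_on_subject(triples, prop_a, prop_b):
    """Rows (S, A, B) where S has value A for prop_a and value B for prop_b."""
    rows = []
    for (s1, p1, o1) in triples:
        if p1 != prop_a:
            continue
        for (s2, p2, o2) in triples:
            # The shared variable forces both edges to start at the same node.
            if p2 == prop_b and s2 == s1:
                rows.append((s1, o1, o2))
    return rows

rows = join_on_subject(TRIPLES, "cult:paints", "cult:last_name")
print(rows)
```

In this toy data the painter without a last name drops out of the result, since no binding for LNAME exists for it.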
3.5.1. Retrieving the class of a resource

We have seen how we can query the RDF model, but so far we have not really used the schema in our queries. Actually, we have used it implicitly in naming our variables: the only reason we know for sure that the variable PAINTER will return painters is that we know the schema in advance, where we can see that the domain of the property paints is the class Painter. Of course, this is not always possible or desirable, and instead of just assuming, you might want to be able to query to what class a certain resource belongs. We can do this by a small extension of the path expression (since we no longer want to assume anything, we go back to using X and Y as variable names):
select X, $X, Y
from {X : $X} cult:paints {Y}
using namespace
cult = http://www.icom.com/schema.rdf#
By adding an additional variable inside the curly brackets that specify the node to match, we can retrieve the class to which a resource belongs: the variable $X is matched to the class of the resource value of X. The prefix $ denotes that $X only matches class resources.
It is important to note that this mechanism is sensitive to RDF Schema semantics: the class variable used here will only retrieve the most specific class of the resource. The importance of this becomes clear when we look at an alternative query to retrieve the class of X, which uses no schema semantics, but instead just queries the rdf:type relation directly (this is comparable to what most RDF-only query languages do):
select X, Z, Y
from {X} rdf:type {Z}, {X} cult:paints {Y}
using namespace
rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns# ,
cult = http://www.icom.com/schema.rdf#
In this query, we retrieve the class of X by specifying a path expression that traverses the type relation; in other words, we simply query all statements that have rdf:type as their predicate (and whose subject has a cult:paints property as well, to keep the result as close to the previous query as possible). However, the results obtained are very different: we receive every resource with which X's values have a type relation, not just the most specific one. Since Cubist is a subclass of Painter, and Painter of Artist, we receive all these classes back. Even worse, it is impossible to tell from the result which class is the most specific class for a particular instance.

Notice that this result actually depends on the implementation of your repository. In Sesame, the repository uses an extended interpretation, which means that all statements that can be inferred (following the RDF Schema closure rules mentioned in the RDF Model Theory) from the explicitly stated ones are considered part of the represented model just as much as the explicit statements. In a system where this interpretation is not used, the above query would give different results. The important point, however, is that by using the schema semantics in the RQL query, we have much more control over what is returned, and we are not left at the mercy of how the repository interprets the model.

3.5.2. Constraining resources to a class

If we change the previous query slightly, we can constrain X to return only certain resources:
select X, Y
from {X : cult:Cubist } cult:paints {Y}
using namespace
cult = http://www.icom.com/schema.rdf#
As you can see, this time only those values for X are retrieved which are of type Cubist.
The above query does not help when all we are interested in is the resource itself, and not the values of any property it has (such as paints). Of course, you could just omit the variable Y from the select clause, but you would still get duplicate results for X whenever a Cubist painter is retrieved who painted more than one painting. A slightly different construction should be used to retrieve only the values of X:
select X
from cult:Cubist {X}
using namespace
cult = http://www.icom.com/schema.rdf#
At first glance this may seem like a normal path expression again, but it is important to note that in this case, the part in front of the curly brackets is not an edge in the graph. This is different from what we have seen so far, and, truth be told, it might be somewhat confusing. The best way to think of this construction is probably as a path expression extension that limits the possible starting points of the matching path in the graph.

3.5.3. Querying domains and ranges

An RDF property can be constrained to be only applicable to certain types of resources, and to have only certain types of values. These constraints are the domain and range of the property, respectively, and they are part of the schema. RQL offers several ways of querying domains and ranges. For example, if we were not interested in what exact things one can paint, but more in what type of things can be painted, and to what type of resource the property is applicable, we might reformulate our query like this:
select $X, $Y
from {: $X} cult:paints {: $Y}
using namespace
cult = http://www.icom.com/schema.rdf#
Because we have omitted the resource variables that match specific instances, $X instead returns all classes to which the property is applicable, and $Y returns the types of value that can be associated with each of these classes. The result of this query does not depend on whether there actually are any instances that have this property; it is purely based on what is possible according to the schema.

A more direct way of retrieving the domain/range of a property is by using the domain() and range() functions. These functions cannot be part of a path expression, but they can be used in the where clause, where comparison operators can constrain their values, or in the select clause, as shown in this query:
select domain(@P), @P, range(@P)
from {} @P {}
where @P = cult:paints
using namespace
cult = http://www.icom.com/schema.rdf#
Here you can see that domain() and range() do not return all legal classes for the domain and range, but only the actual domain/range specification as given in the schema.

RQL knows two basic types of operators: comparison operators and logical operators. Comparison operators are binary operators that compare the values of their operands and return true or false according to the outcome of the comparison. In the previous queries you have already seen some of these operators being used.

The comparison operators in Sesame are overloaded: their behaviour depends on whether their arguments are resources, classes or numerical values. For example, take a look at the comparison X < Y. When we use this operator to compare two string values, the operator performs a lexical comparison of X and Y. However, when X and Y are both numerical values, a numerical comparison is performed. If X and Y are RDF classes or properties, the operator checks if X is a subclass/subproperty of Y. Finally, if X and Y are sets instead of single values, set comparison is performed (in this case, whether X is a proper subset of Y).

It is important to note that RQL makes use of data types here which, strictly speaking, do not exist in the RDF model. The typing of values is done at query evaluation time, using a greedy coercion scheme. After determining that both operands are either sets or single-valued (an error is thrown if they are not), the operator attempts to coerce both operands to the same datatype, beginning with classes, then properties, then real numbers, then integers, and finally literals or resources.
Sesame RQL supports the following comparison operators, which are all overloaded in the same way as described above: <, <=, >, >=, = and != (see comp_op in the BNF grammar in the appendix).
A special comparison operator can be used for substring comparison: like. The second argument of this operator must always be a string. The string may start or end with a wildcard *, to indicate which part of the substring match is left free.

Finally, a comparison operator is provided for checking set membership: in. The first argument should be a single value; the second argument can be any resource query that returns a set of values. The operator determines whether the first argument is present in the set returned by the second. An example of its use is given in the following query:
select X, $X, Y
from {X : $X} cult:last_name {Y}
where X in
select K
from {K} cult:paints {L}
where L like "*guernica*"
using namespace
cult = http://www.icom.com/schema.rdf#
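To make the wildcard behaviour of like concrete, here is a small Python sketch. It rests on assumptions that this tutorial does not settle: that a pattern without wildcards must match the whole string, and that the comparison ignores case. The function name like and its ignore_case parameter are hypothetical.

```python
def like(value, pattern, ignore_case=True):
    """Sketch of a like-style match with an optional leading/trailing '*' wildcard.

    Assumptions: no wildcard means the whole string must match, and case is
    ignored by default. Sesame's actual behaviour may differ.
    """
    if ignore_case:
        value, pattern = value.lower(), pattern.lower()
    free_start = pattern.startswith("*")
    free_end = pattern.endswith("*") and len(pattern) > 1
    # Strip the wildcards to obtain the fixed part of the pattern.
    core = pattern[(1 if free_start else 0):len(pattern) - (1 if free_end else 0)]
    if free_start and free_end:
        return core in value
    if free_start:
        return value.endswith(core)
    if free_end:
        return value.startswith(core)
    return value == core

print(like("Guernica, 1937", "*guernica*"))
```

Under these assumptions, "P*" matches any value starting with "P", and "*guernica*" matches any value containing that substring.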
Logical operators combine the truth values of their operands. They can be used in the where clause to combine several comparisons. The available logical operators are and, or and not. By making combinations with these operators, we can express very powerful constraints in a query. As an example, consider the following query, which retrieves all Painters with a last name that starts with "P" and all Sculptors with a last name that does not start with "B":
select X, $X, Y
from {X : $X} cult:last_name {Y}
where ($X <= cult:Painter and Y like "P*")
or ($X <= cult:Sculptor and not Y like "B*")
using namespace
cult = http://www.icom.com/schema.rdf#
RQL offers several standard functions for retrieving standard RDFS relationships. These functions can be used in a select-from-where query, but they can also be used as stand-alone queries in their own right. Some of these, like domain() and range(), you have already seen in action. In this section, we will introduce a few more to show how they can be used. Notice that for most functions that retrieve classes (Class, subClassOf), equivalent functions for retrieving properties (Property, subPropertyOf) exist. A full listing of available functions can be found in the BNF grammar.

The Class function retrieves all known classes. As a stand-alone query it can be used without any variable bindings:
Class
Inside a select-from-where query it can be used to bind a variable to the set of all classes:
select $X
from Class {$X}
where $X like "*r"
using namespace
cult = http://www.icom.com/schema.rdf#
The subClassOf() function can be used to query the class hierarchy. As a stand-alone query, you can use it to retrieve all subclasses of a particular class:
subClassOf( http://www.icom.com/schema.rdf#Artist )
The subclass relation is interpreted as being reflexive, that is, every class is a subclass of itself. That is why in the result of the above query, the class Artist is also returned. To query for only the direct subclasses of a class, we can use the ^ suffix:
subClassOf^( http://www.icom.com/schema.rdf#Artist )
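The difference between the two forms can be sketched in Python as a reflexive, transitive closure versus a single step down the hierarchy. The hierarchy below is a toy: Cubist under Painter under Artist follows the tutorial text, while placing Sculptor directly under Artist is an assumption made for illustration.

```python
# child -> parent. Sculptor under Artist is an assumption for this sketch.
DIRECT_SUPERCLASS = {
    "Cubist": "Painter",
    "Painter": "Artist",
    "Sculptor": "Artist",
}

def direct_subclasses(cls):
    """Classes exactly one step below cls (the idea behind the ^ form)."""
    return {c for c, parent in DIRECT_SUPERCLASS.items() if parent == cls}

def subclasses(cls):
    """cls itself plus every class below it (reflexive, transitive closure)."""
    result = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for child in direct_subclasses(current):
            if child not in result:
                result.add(child)
                frontier.append(child)
    return result

print(sorted(subclasses("Artist")))
print(sorted(direct_subclasses("Artist")))
```

Note that the closure includes Artist itself, matching the reflexive reading of the subclass relation described above.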
The subClassOf() function can also be used as part of a select-from-where query, for example in the where clause. Since it returns a set, we can use the in operator to compare values. For example, in the following query we retrieve all direct subclasses of Painter and the properties which apply to them:
select $X, @P
from { : $X} @P
where $X in subClassOf^( cult:Painter )
using namespace
cult = http://www.icom.com/schema.rdf#
The typeOf() function allows you to retrieve the classes to which a particular resource belongs:
typeOf( http://www.european-history.com/picasso.html )
By using the ^ suffix, we can retrieve the most specific class to which the resource belongs:
typeOf^( http://www.european-history.com/picasso.html )
Similarly to other functions, typeOf() can be used as a stand-alone query or as part of a select-from-where query.
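As a rough illustration of the two variants, the Python sketch below walks up a toy class hierarchy to collect all classes of a resource and then filters out every class that has a more specific class in the same result. That the resource picasso is stated to be a Cubist is an assumption made for this example; the function names are hypothetical.

```python
# child -> parent, following the hierarchy named in the tutorial text.
SUPERCLASS = {"Cubist": "Painter", "Painter": "Artist"}
# Assumed explicit type statement, for illustration only.
STATED_TYPE = {"picasso": "Cubist"}

def type_of(resource):
    """All classes of the resource, from the stated class up to the root."""
    classes = []
    cls = STATED_TYPE.get(resource)
    while cls is not None:
        classes.append(cls)
        cls = SUPERCLASS.get(cls)
    return classes

def most_specific_type(resource):
    """Classes of the resource that have no subclass among its other classes."""
    classes = type_of(resource)
    return [c for c in classes
            if not any(SUPERCLASS.get(d) == c for d in classes)]

print(type_of("picasso"))
print(most_specific_type("picasso"))
```

The first function corresponds to the plain form, the second to the ^ form.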

4. Advanced Stuff

4.1. Set Operations: Union, Intersection and Difference

RQL offers three set operations that allow you to combine query results in several ways: union, intersect and minus. A union query returns a result set that consists of all answers given by either operand (A or B). An intersect query returns a result set that consists of all answers that occur in both operands (A and B). A minus query gives the difference between the answers from the first and second operand (A - B).

Let us look at how we can use union to express an outer join-like operation. For example, consider the scenario where you want to retrieve all first names and last names of painters. You are not sure whether every painter has a first name specified, but even if they don't, you still want to know the painter and at least his last name. A normal select-from-where query will not work, because you can only specify inner joins with such a query (that is, results are only given back when an instantiation can be found for every variable). However, with the help of a union construct, you can specify an outer join, effectively making the FNAME variable optional:
(select X, LNAME, FNAME
from {X : $X} cult:first_name {FNAME},
{X} cult:last_name {LNAME}
where $X <= cult:Painter
)
union
(select X, LNAME, NULL
from {X : $X} cult:last_name {LNAME}
where $X <= cult:Painter
and not (X in select X
from {X} cult:first_name
)
)
using namespace
cult = http://www.icom.com/schema.rdf#
The first operand of the union is a select-from-where query that retrieves all Painters with a last and a first name. The second operand is a query that specifically selects only those Painters who do not have a first name. The NULL in the select clause is a placeholder: since the number of selected items in both select clauses must be equal, we use this placeholder as a replacement for the FNAME variable, in which we are not interested in this case.

4.2. The semantics of domain and range

In the previous chapter, we have seen how we can query the domain and range of a property. At first glance, the mechanism seems straightforward, and from a query formulation point of view, it is. However, under the hood some interesting things happen that have to do with the semantics of rdfs:domain and rdfs:range.

In Sesame, whenever a property has no domain or range specified, it is considered undefined and a query will simply return 0 results. An interesting case is when a property has more than one domain or range defined. The RDF Schema specification and the RDF Model Theory prescribe a conjunctive interpretation of such a set of domain statements, meaning that a resource is only an instance of the domain/range if it is an instance of all the classes mentioned. For example: we introduce a property hasName and define its domain to be the classes Person and Painting. With conjunctive semantics, only resources that are instances of both Person and Painting can have the property hasName.

In fact, the extended interpretation that Sesame uses for storage and retrieval makes sure that it can never occur that a resource uses the property without belonging to its domain classes. This has to do with the fact that the RDF Model Theory interprets the constraints for domain/range as inference rules, not restrictions:
domain(p,A) & p(i,j) => type(i,A)
These inference rules are applied when data is uploaded to Sesame. This way it is ensured that every resource i which has a property p is a member of the domain of p. As said before, this has little effect on querying itself, except that it is important to take into account that the domain and range functions return sets which are to be interpreted as intersections.
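The domain inference rule can be illustrated with a short Python sketch that applies it to a list of statements. This is a toy model of the rule only, not of Sesame's inferencer; the resource name r1 and the helper infer_types are invented, and the hasName domain follows the example in the text.

```python
# Domain declarations: property -> set of domain classes (conjunctive reading).
DOMAIN = {"hasName": {"Person", "Painting"}}

def infer_types(statements, domain=DOMAIN):
    """Apply domain(p, A) & p(i, j) => type(i, A) to every statement."""
    types = {}
    for (i, p, j) in statements:
        # Every domain class of p becomes a type of the subject i.
        types.setdefault(i, set()).update(domain.get(p, set()))
    return types

inferred = infer_types([("r1", "hasName", "Mona Lisa")])
print(inferred)
```

Because the rule fires for every domain class, the subject ends up as an instance of both Person and Painting, which is exactly the conjunctive interpretation.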

5. References

A. BNF Grammar for Sesame RQL
ns_query ::= query ["using" "namespace" nsdeflist ]
nsdeflist ::= nsdef { "," nsdef }
nsdef ::= ns_name "=" uri
ns_name ::= (* a legal namespace name *)
query ::= ["^"] res_query
| bool_query
| set_query
res_query ::= "(" res_query ")"
| sfw_query
| "Class"
| "Property"
| "subClassOf" ["^"] "(" res_query ")"
| "superClassOf" ["^"] "(" res_query ")"
| "subPropertyOf" ["^"] "(" res_query ")"
| "superPropertyOf" ["^"] "(" res_query ")"
| "typeOf" ["^"] "(" res_query ")"
| "domain" "(" res_query ")"
| "range" "(" res_query ")"
| res_query "[" res_query "]"
| integer_literal
| real_literal
| quoted_string_literal
| uri
| var
bool_query ::= "(" bool_query ")"
| "true"
| "false"
| res_query "in" res_query
| res_query comp_op res_query
| bool_query bool_op bool_query
| "not" bool_query
| "exists" var res_query ":" bool_query
| "forall" var res_query ":" bool_query
set_query ::= "(" sfw_query ")" set_op "(" sfw_query ")"
sfw_query ::= "select" projslist "from" rangeslist ["where" bool_query]
projslist ::= "*"
| query { "," query }
rangeslist ::= pathexpr { "," pathexpr }
pathexpr ::= pathexpr_head { pathexpr_tail }
pathexpr_head ::= res_query "{" from_to "}"
| "{" from_to "}" ["^"] res_query "{" from_to "}"
pathexpr_tail ::= ["^"] res_query "{" from_to "}"
from_to ::= [ res_query ] [ ":" res_query ]
set_op ::= "union" | "intersect" | "minus"
comp_op ::= "<" | "<=" | ">" | ">=" | "=" | "!=" | "like"
bool_op ::= "and" | "or"
var ::= datavar
| classvar
| propertyvar
datavar ::= identifier
classvar ::= "$" identifier
propertyvar ::= "@" identifier
uri ::= (* a legal uri - see rfc 2396 *)
identifier ::= ( letter | "_" ) { letter | digit | "_" | "-" }
integer_literal ::= [ "+" | "-" ] digit { digit }
real_literal ::= [ "+" | "-" ] { digit } "." digit { digit }
quoted_string_literal ::= '"' { char } '"'

Last modified: 2004/02/10
copyright © 1997-2004 Aduna