Introduction to SPARQL

SPARQL became a World Wide Web Consortium (W3C) Recommendation in January 2008. It is maintained and developed by the W3C SPARQL Working Group, which in November 2012 proposed an upgraded version, SPARQL 1.1, as a standard (it became a full Recommendation in March 2013). SPARQL 1.1 adds new features, including an update language that allows users to change as well as consult RDF datasets. The latest recommendation can be found at these two sites, one for the query language and one for update:

  http://www.w3.org/TR/sparql11-query
  http://www.w3.org/TR/sparql11-update

Along with RDF and OWL, SPARQL is one of the three core standards of the Semantic Web. Its location in the Semantic Web "stack of languages" is shown in Figure 1. One point to note in the figure is that SPARQL does not depend on RDFS and OWL. However, as will be shown later in the chapter, knowledge encoded in RDFS and OWL may enhance the power of querying.

Figure 1: SPARQL in Semantic Web stack

SPARQL, as a database query language, resembles the well-known Structured Query Language (SQL). The syntax of SPARQL is shaped by the fact that it operates over graph data represented as RDF triples, as opposed to SQL's tabular data organised in a relational database.

The essence of querying is shown by the following illustration, using for the time being English rather than RDF. Imagine an RDF dataset with statements containing the following information:

The Beatles made the album "Help".
The Beatles made the album "Abbey Road".
The Beatles made the album "Let it be".
The Beatles includes band-member Paul McCartney.
Wings made the album "Band on the run".
Wings made the album "London Town".
Wings includes band-member Paul McCartney.
The Rolling Stones made the album "Hot Rocks".

One can imagine various queries that a music portal might need to run over such a dataset. For instance, the portal might construct web pages on demand for any album or group nominated by the user. This would require retrieval of information from the dataset for questions such as the following:

Who made the album "Help"?
Which albums did the Beatles make?

These are so-called WH-questions ("who", "what", "where", etc.), for which the first would receive a single answer ("The Beatles"), and the second a list of three answers ("Help", "Abbey Road", "Let it be"). The SPARQL counterparts to these questions use RDF triples that contain variables; these correspond to the WH-words in the English queries. The general form for such questions (still working in English) is as follows:

Give me all values of X such that X made the album "Help".
Give me all values of X such that the Beatles made X.

We can go further than this by introducing more than one variable, thus generalising the query:

Give me all values of X and Y such that X made Y.

This is like asking a question with two WH-words, such as "Which bands made which albums?". The answer is not a list of values, as before, but a list of X-Y pairs that could be conveniently presented in a table:

X                     Y
The Beatles           "Help"
The Beatles           "Abbey Road"
The Beatles           "Let it be"
Wings                 "Band on the run"
Wings                 "London Town"
The Rolling Stones    "Hot Rocks"

In all these examples, the question is represented by a single statement with one or more variables; however, we can also construct more complex queries containing several statements:

Give me all values of X and Y such that: (a) X made Y, and (b) X includes band member Paul McCartney.

The answer would be the first five pairs from the previous answer, excluding "Hot Rocks" since the dataset does not list Paul McCartney as a band member of the Rolling Stones.

Moving now from English to SPARQL, here is the encoding for the simple query "Which albums did the Beatles make?" for the MusicBrainz dataset. For now don't worry about learning the exact syntax; the important thing is to understand what the various bits and pieces are doing.

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>

SELECT ?album_title  
WHERE {
  ?band foaf:name "The Beatles" ; foaf:made ?album .
  ?album a mo:SignalGroup ; dc:title ?album_title .
}

The query begins with PREFIX statements that define abbreviations for namespaces. The query proper begins in the line starting SELECT, which also contains a variable (corresponding to X and Y in our English examples) starting with the question mark character '?'. Choose any word you like for the rest of the variable name, provided that you use it consistently. The remainder of the query, starting WHERE, contains a list of RDF triple patterns. These are like RDF triples except that they include variables. They are expressed in Turtle, which we introduced in Chapter 1.

The WHERE clause in the example has two groups of RDF triple patterns, separated by a full stop; within each group, a semicolon introduces a further pattern with the same subject. The first group matches resources made by the band named "The Beatles"; the second requires that these resources belong to the class mo:SignalGroup and retrieves their titles (this rather weird name distinguishes albums, which are "signal groups", from their constituent tracks, which are also encoded as resources made by the Beatles).

The response to a query is computed by a process known as graph matching.
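
To make the idea of graph matching concrete, here is a minimal Python sketch that is not part of SPARQL itself. It assumes a toy in-memory dataset that mirrors the English statements given earlier, and the helper names (match_pattern, solve) are invented for this illustration. Real query engines are far more sophisticated, but the principle is the same: each triple pattern is matched against the triples, and a variable that occurs in several patterns must take the same value in all of them.

```python
# Toy dataset mirroring the English statements; each triple is
# (subject, predicate, object). The plain-string encoding is an assumption.
TRIPLES = [
    ("The Beatles", "made", "Help"),
    ("The Beatles", "made", "Abbey Road"),
    ("The Beatles", "made", "Let it be"),
    ("The Beatles", "includes band-member", "Paul McCartney"),
    ("Wings", "made", "Band on the run"),
    ("Wings", "made", "London Town"),
    ("Wings", "includes band-member", "Paul McCartney"),
    ("The Rolling Stones", "made", "Hot Rocks"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its existing value.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant that does not match the triple.
            return None
    return extended


def solve(patterns, triples):
    """Return every binding that satisfies all patterns (a conjunction)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# "Give me all X and Y such that X made Y and X includes Paul McCartney."
rows = solve(
    [("?x", "made", "?y"), ("?x", "includes band-member", "Paul McCartney")],
    TRIPLES,
)
pairs = sorted((row["?x"], row["?y"]) for row in rows)
```

Here pairs holds the five band and album pairs for the Beatles and Wings; the Rolling Stones are excluded because no triple lists Paul McCartney as one of their members.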

SPARQL terminology

Before proceeding to the detailed structure of queries, it is worth pausing to review the concepts introduced so far:

RDF triple
An RDF triple is a statement of the form subject-predicate-object expressed in one of the RDF formalisms.
RDF triple pattern
An RDF triple pattern is the same as an RDF triple except that any or all of its three constituents may be replaced by a variable.
RDF graph
An RDF graph is a set of RDF triples. You probably know that "graph" in mathematics has two distinct meanings: (1) a diagram showing points arranged by their relationship to an X axis and a Y axis; (2) a set of vertices (or nodes) linked by edges (or arcs). In the case of RDF the second meaning applies, where the subject and object in a triple are vertices, and the predicate is an edge that links them by pointing from subject to object. Formally, an RDF graph can be described as a directed labelled multigraph, which means (a) that edges are directional (you cannot switch subject and object without changing the statement), (b) that edges are named (by the predicate identifier), and (c) that there can be multiple edges linking two vertices (resources may be related in different ways).
RDF dataset
An RDF dataset is a set of RDF triples comprising a default RDF graph, which by definition is unnamed, and zero or more named RDF graphs. The idea behind this segmentation is that SPARQL queries can be explicitly confined to a named subset rather than running over the whole dataset.
Graph pattern
We use this term to refer to a conjunction of RDF triple patterns. It is therefore the same as an RDF graph, except that its constituents are RDF triple patterns (which contain variables) as opposed to normal RDF triples (which don't). Note that in a query, the expression following the keyword WHERE is a graph pattern; this is why graph patterns are important.
SPARQL Protocol client
A SPARQL Protocol client is an HTTP client that sends requests for SPARQL Protocol operations. As you probably know, "client" here refers to a program that sends a request to another program, possibly running on another computer, over a network; the program that receives the request is known as the "server".
SPARQL Protocol service
A SPARQL Protocol service is an HTTP server that services requests for SPARQL Protocol operations.
SPARQL endpoint
A SPARQL endpoint is a SPARQL Protocol service, identified by a given URL, which listens for requests from SPARQL clients.

Querying with SPARQL

Submitting a query

In developing an application, you will need to build queries into your application code. There are APIs that help you to do this, like the ones provided by LinkedData.Center. However, before learning to use APIs, you can learn the syntax for queries (and their responses) by using an interactive SPARQL endpoint web application to enter the query by hand.
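
As a rough idea of what application code does under the hood, the following Python sketch builds the URL for a SPARQL Protocol request, in which the query text travels in the "query" parameter of an HTTP GET request. It uses only the standard library. The endpoint address is the LinkedBrainz one named in this chapter, which may no longer be online or may expect a different path, and the helper name build_request_url is invented for this illustration. The sketch only constructs the URL; it does not contact the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint named in this chapter; its availability and exact path are assumptions.
ENDPOINT = "http://sparql.linkedbrainz.org/"

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?album
WHERE { ?band foaf:name "The Beatles" ; foaf:made ?album . }
LIMIT 10
""".strip()


def build_request_url(endpoint, query):
    """Return a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)

# Decoding the URL again recovers the original query text unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

To send the request you could pass url to urllib.request.urlopen, usually with an Accept header naming the result format you want; that step is omitted here because it depends on the endpoint being reachable.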

As an example of this procedure, Figure 3 shows a snapshot of the LinkedBrainz endpoint at http://sparql.linkedbrainz.org/ with a query about the albums made by the Beatles.

Figure 3: Typing a query into a SPARQL endpoint

If you type in this query and hit the "Run Query" button you will obtain a table giving values for the variable ?album_title:


album_title
"'69 Rehearsals"^^<http://www.w3.org/2001/XMLSchema#string>
"'Quote' Unquote: The Sixties Interviews"^^<http://www.w3.org/2001/XMLSchema#string>
"1"^^<http://www.w3.org/2001/XMLSchema#string>
"16 Superhits, Volume 1"^^<http://www.w3.org/2001/XMLSchema#string>
"16 Superhits, Volume 2"^^<http://www.w3.org/2001/XMLSchema#string>
"16 Superhits, Volume 3"^^<http://www.w3.org/2001/XMLSchema#string>
"16 Superhits, Volume 4"^^<http://www.w3.org/2001/XMLSchema#string>
"1962 Live Recordings"^^<http://www.w3.org/2001/XMLSchema#string>
"1962 Live at Star Club in Hamburg"^^<http://www.w3.org/2001/XMLSchema#string>
"1962-1970"^^<http://www.w3.org/2001/XMLSchema#string>

Note that the clause LIMIT 10 in the SPARQL query limits the results to the first ten solutions that match the WHERE clause.

It is always a good idea to limit the number of results in a SPARQL query, because the number of solutions returned can sometimes be very large. As in SQL, the ORDER BY, LIMIT and OFFSET constructs help you to paginate results.

Types of query

SPARQL defines the following query types for data retrieval:

ASK
An ASK query is a test of whether there are any resources in the dataset matching the search pattern; the response is either true or false. Intuitively, it poses the question: "Are there any X, Y, etc. satisfying the following conditions ...?"
SELECT
A SELECT query returns a table in which columns represent variables and rows represent variable bindings matching the search pattern. Intuitively: "Return a table of all X, Y, etc. satisfying the following conditions ...".
CONSTRUCT
A CONSTRUCT query returns an RDF graph (i.e., set of triples) matching a template, using variable bindings obtained from the dataset using a search pattern. Intuitively: "Find all X, Y, etc. satisfying the following conditions ... and substitute them into the following template in order to generate (possibly new) RDF statements ...".
DESCRIBE
A DESCRIBE query returns an RDF graph, extracted from the dataset, which provides all available information about a resource (or resources). The resource may be identified by name, or by a variable accompanied by a graph pattern. Intuitively: "Find all statements in the dataset that provide information about the following resource(s) ... (identified by name or description)".

Queries using ASK

An ASK query corresponds intuitively to a Yes/No question in conversational language. For example, the following query corresponds to the Yes/No question "Is "All You Need Is Love" a song recorded by the Beatles?":

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ASK
WHERE {
  ?band foaf:name "The Beatles" ; foaf:made [ 
     a mo:Track ; dc:title "All You Need Is Love"^^xsd:string 
  ].
}

If this query is submitted as described above, the answer given will be "true".

Dissecting the syntax of this query, we note the following:

  • The PREFIX statements are not an essential part of a query, but here as elsewhere they are useful as a means of abbreviating RDF triples or patterns. Be careful not to put a full stop at the end of a PREFIX statement, since this will cause a syntax error.
  • The query proper begins with ASK, which specifies the relevant query type.
  • WHERE introduces a graph pattern, which at its simplest is a conjunction of RDF triples or patterns, presented in curly brackets and separated by full stops (or by commas or semicolons, as mentioned in the section on Turtle in Chapter 1). The patterns may use abbreviations defined in the PREFIX statements, and may include one or more variables. (More complex graph patterns will be described later on.)
  • Layout is free provided that terms are separated by white space. For instance, if you wished you could type the whole query on one line, or at the other extreme type a new-line character after every term. The layout given above with new lines for the key words PREFIX, ASK, WHERE, is adopted only for human readability.
  • The keywords of SPARQL syntax – PREFIX, ASK, WHERE, etc. – are not case-sensitive, so if you prefer you can use prefix, ask, where, and so on. In the examples we consistently capitalise these words for reasons of readability, but this has no effect on how the query engine interprets the query.

If you want to ask whether there are any X, Y, etc. such that certain conditions hold – e.g., "Are there any X such that X was made by the Beatles" – you need to use RDF patterns, which are like triples except that they contain variables. These are represented by names beginning with a question mark '?', as in this example:

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

ASK
WHERE { ?band foaf:name "The Beatles" ; foaf:made ?something } 

Queries using SELECT

To show the basic syntax of a SELECT query, let us return to the example given in section 2.3:

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>

SELECT ?album
WHERE { ?band foaf:name "The Beatles" ; foaf:made ?album .
        ?album a mo:SignalGroup
      } 

Note the following:

  • After the (optional) PREFIX statements, the query proper begins with SELECT, which specifies the query type.
  • After SELECT you list all the variables that you would like to see tabulated in the response. Variables should be separated by spaces if there is more than one. Alternatively, you can simply put an asterisk after SELECT, meaning that all variables in the WHERE clause should be tabulated. In the example, this would add a ?band column to the ?album column.
  • As before, WHERE introduces a graph pattern including one or more variables.
  • As before, layout is free provided that terms are separated by white space.

Try running the query in the interactive LinkedBrainz endpoint. In response you should obtain a list of URIs that use arbitrary codes rather than recognisable words.

Ordering the rows in the query result

For some queries you might want the results to be presented in a particular order. For instance, if your music portal retrieves albums made by the Beatles, using the query given above, you might want to present these in alphabetical order of title. This can be done using the keywords ORDER BY, as in the following example:

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT *
WHERE { ?band foaf:name "The Beatles" ;  foaf:made ?album .
        ?album a mo:SignalGroup .
        ?album dc:title ?title
      }
ORDER BY ?title

The key element in this query is the ORDER BY component at the end, which stipulates that rows should be presented in alphabetical order of the values in the ?title column. Note also one other change from the previous example, the use of the asterisk shorthand in the SELECT clause, which asks for all variables in the WHERE clause to be tabulated. This shorthand is not always used, since often the selected variables are a strict subset of those mentioned in the WHERE clause.

Try replacing ORDER BY ?title by ORDER BY DESC(?title), which will present the rows in descending rather than ascending order of title (i.e., ascending is the default).

What exactly do these tables show? The columns, as we have seen, correspond to variables in the query for which tabulation is requested after SELECT, either explicitly by name, or implicitly by the '*' option. The rows give all variable bindings that match the graph pattern in the WHERE clause. A binding is an assignment of identifiers or literals to the variables which, when instantiated in these patterns, will yield a subgraph of the dataset. Note that when computing these bindings, the query engine makes the key assumption that variables occurring in more than one pattern are bound to the same resource. In this way it avoids returning a row in which an album is paired with the title of a different album.

Returning results page by page

For some queries, including our example, the output table returns too many results at once. In such cases it would be useful if the music portal included a paging facility allowing users to view the information in manageable portions, perhaps ten rows at a time. This can be done through the query engine if you use the keywords LIMIT and OFFSET. To see how this works, try extending our previous query as follows:

PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT *
WHERE { ?band foaf:name "The Beatles" ;  foaf:made ?album .
        ?album a mo:SignalGroup .
        ?album dc:title ?title
      }
ORDER BY ?title
LIMIT 10 OFFSET 0

You should get back the same table, but cut off after the first ten rows; the result will be the same if you just put LIMIT 10 leaving the offset unspecified. Now try raising the number after OFFSET to 10. You should get back the next ten-row segment of the table, covering rows 11-20. In general, if LIMIT is L and OFFSET is S, the query will return L rows starting at S+1 and continuing up to S+L.
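
The arithmetic of LIMIT and OFFSET can be checked with a small Python sketch. It assumes the ordered result rows are already held in a list, and the 25 titles are made up for the illustration; the function name page is likewise invented. With LIMIT L and OFFSET S, the rows returned are those at (1-based) positions S+1 to S+L, which corresponds to the Python slice rows[S:S+L].

```python
def page(rows, limit, offset=0):
    """Return at most `limit` rows, skipping the first `offset` rows."""
    return rows[offset:offset + limit]


# 25 made-up titles standing in for an already ordered result table.
titles = [f"Title {n:02d}" for n in range(1, 26)]

first_page = page(titles, limit=10)              # rows 1-10
second_page = page(titles, limit=10, offset=10)  # rows 11-20
last_page = page(titles, limit=10, offset=20)    # rows 21-25, only five remain
```

As the last call shows, a page may contain fewer than L rows when the end of the table is reached.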

Using tests to filter the results

We have seen queries in which the WHERE clause contains specific resources (the Beatles) or variables (?album). But what if we want to obtain results for any variable that satisfies a certain condition – e.g., albums beginning with the letter 'B', or band members born before 1960?

Such conditions are called "filters", and to illustrate them, let us switch to an example in which our aim is to retrieve tracks (not albums) with a duration between 300 and 400 seconds. Since in MusicBrainz durations are encoded in milliseconds, the relevant filter condition can be stated as follows:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?title ?duration
WHERE { ?band foaf:name "The Beatles" ; foaf:made ?track .
        ?track a mo:Track .
        ?track dc:title ?title .
        ?track mo:duration ?duration .
        FILTER (?duration>=300000 && ?duration<=400000)
      }
ORDER BY ?duration 
LIMIT 10

The tables in Figures 5 and 6 show some other operators from which filter conditions can be constructed. Both tables are taken from the W3C specification at http://www.w3.org/TR/rdf-sparql-query/, which can be consulted for further details.

Conceptually, filters define a boolean condition on a graph pattern binding. We have already seen that the graph pattern in a WHERE clause has a set of solutions, each corresponding to a binding of the variables mentioned in the graph pattern. The filter submits these solutions to a boolean condition, letting through only those variable bindings for which the condition is met. In a naive implementation, the projection would be computed in just this way: first find the set of solutions matching the graph pattern; then select from this set the solutions that satisfy the filter condition, for inclusion in the final result.
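
The naive two-step procedure just described can be sketched in Python as follows. The solutions list stands in for the bindings that match the graph pattern; its track names and durations are made up for the illustration, and duration_filter is an invented name that mirrors the FILTER condition in the query above.

```python
# Step 1 (assumed already done): solutions matching the graph pattern.
# The titles and durations below are made up for this illustration.
solutions = [
    {"title": "Track A", "duration": 250000},
    {"title": "Track B", "duration": 300000},
    {"title": "Track C", "duration": 365000},
    {"title": "Track D", "duration": 400000},
    {"title": "Track E", "duration": 410000},
]


def duration_filter(solution):
    """Mirror of FILTER (?duration>=300000 && ?duration<=400000)."""
    return 300000 <= solution["duration"] <= 400000


# Step 2: keep only the solutions for which the condition is true,
# then apply ORDER BY ?duration.
kept = sorted(
    (s for s in solutions if duration_filter(s)),
    key=lambda s: s["duration"],
)
kept_titles = [s["title"] for s in kept]
```

Only Track B, Track C and Track D pass the test, since both boundary values are included by the >= and <= operators.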

Figure 5: Unary operators for filter expressions

Figure 6: Binary operators for filter expressions

Avoiding duplicate rows in the output table

For some queries you may find that the output table has duplicated rows. The reason for this is usually that the selected (tabulated) variables are a strict subset of the variables in the graph pattern. Consider for instance the following table, which you will see if you submit the previous query:


title duration
"All You Need Is Love"^^<http://www.w3.org/2001/XMLSchema#string> "300000"^^<http://www.w3.org/2001/XMLSchema#int>
"All You Need Is Love (live on "Our World" TV Show: 1967-06-25)"^^<http://www.w3.org/2001/XMLSchema#string> "300293"^^<http://www.w3.org/2001/XMLSchema#int>
"All You Need Is Love (live on "Our World" TV Show: 1967-06-25)"^^<http://www.w3.org/2001/XMLSchema#string> "300293"^^<http://www.w3.org/2001/XMLSchema#int>
"Misery"^^<http://www.w3.org/2001/XMLSchema#string> "300666"^^<http://www.w3.org/2001/XMLSchema#int>
"[improvisation 5]"^^<http://www.w3.org/2001/XMLSchema#string> "301080"^^<http://www.w3.org/2001/XMLSchema#int>
"Hey Jude"^^<http://www.w3.org/2001/XMLSchema#string> "302040"^^<http://www.w3.org/2001/XMLSchema#int>
"I Lost My Little Girl"^^<http://www.w3.org/2001/XMLSchema#string> "302666"^^<http://www.w3.org/2001/XMLSchema#int>
"Ride Rajbun"^^<http://www.w3.org/2001/XMLSchema#string> "302666"^^<http://www.w3.org/2001/XMLSchema#int>
"February 10, 1964"^^<http://www.w3.org/2001/XMLSchema#string> "302826"^^<http://www.w3.org/2001/XMLSchema#int>
"[Think for Yourself studio chatter]"^^<http://www.w3.org/2001/XMLSchema#string> "303000"^^<http://www.w3.org/2001/XMLSchema#int>

Near the top of this table we find two rows with track title "All You Need Is Love (live on "Our World" TV Show: 1967-06-25)" and duration 300293; and there are many more examples further down the table. This happens because several resources instantiating ?track may have the same values for ?title and ?duration. Here, for example, the track "All You Need Is Love (live on "Our World" TV Show: 1967-06-25)" is present in two different albums, so it shows up twice. If all three variables were tabulated, the rows would differ in the ?track column, but since this column is not requested, we obtain rows that appear identical. To avoid this you can include the keyword DISTINCT in the SELECT clause, as follows:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?title ?duration
WHERE { ?band foaf:name "The Beatles" ; foaf:made ?track .
        ?track a mo:Track .
        ?track dc:title ?title .
        ?track mo:duration ?duration .
        FILTER (?duration>=300000 && ?duration<=400000)
      }
ORDER BY ?duration LIMIT 10

In the output table, you should now find only one row pairing "All You Need Is Love (live on "Our World" TV Show: 1967-06-25)" with 300293.
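
The following Python sketch illustrates why the duplicates arise and how DISTINCT removes them. The three bindings are made up: two different track resources happen to share a title and a duration. The helper names project and distinct are invented for the illustration and say nothing about how a real query engine is implemented.

```python
# Made-up bindings: tracks t1 and t2 are different resources with the
# same title and duration (for example, the same recording on two albums).
bindings = [
    {"track": "t1", "title": "Song A", "duration": 300293},
    {"track": "t2", "title": "Song A", "duration": 300293},
    {"track": "t3", "title": "Song B", "duration": 300666},
]


def project(rows, variables):
    """Keep only the selected variables, as the SELECT clause does."""
    return [tuple(row[v] for v in variables) for row in rows]


def distinct(rows):
    """Drop repeated rows while keeping the first occurrence of each."""
    return list(dict.fromkeys(rows))


with_duplicates = project(bindings, ["title", "duration"])
without_duplicates = distinct(with_duplicates)
```

Before DISTINCT the projected table has three rows, two of them identical; afterwards it has two. Projecting on all three variables would produce no duplicates, because the ?track values differ.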

Since DISTINCT can be computationally expensive, SPARQL offers a cheaper alternative, REDUCED, which permits the query engine to eliminate some duplicates but not necessarily all (so a duplicated row like the one mentioned above might survive). However, DISTINCT is more widely used, and should not be computationally expensive when ORDER BY is also used.

Retrieving aggregate data for groups of bindings

Suppose that the dataset contains triples specifying the tracks on each album, and associating a duration in milliseconds with each track, represented by an integer literal. (In fact the MusicBrainz dataset associates records with albums, and tracks with records, which complicates the query slightly – see below.) Fully listed this is a lot of data, and you might wish instead to report, for each album, the total duration obtained by summing the durations of the tracks. This is an example of aggregate data, and it requires two operations: first, we must segment the variable bindings into groups, corresponding in this case to all bindings relating to a given album; second, for each group, we must submit the values of a specified variable to an aggregation function – in this case, sum the track durations.

For instance, suppose the dataset has just two albums matching the query, namely Revolver and Abbey Road; and suppose that for Revolver just three tracks are included, namely "Eleanor Rigby", "I'm only sleeping", and "Doctor Robert", with durations respectively of 200000, 240000, and 160000 milliseconds. (These are just round numbers made up for the example.) This will mean that for Revolver we have three bindings of the variables ?album, ?track, and ?track_duration; and if Abbey Road has four specified tracks, it will correspondingly have four bindings. We now separate the bindings into groups – three for Revolver, four for Abbey Road – and within each group we want to sum the track durations, and return only a table that gives albums along with their total durations. This will mean that the first row of the table will specify Revolver in the ?album column, and 600000 in the second column for which we need a new name – perhaps ?album_duration.

Here is a query that achieves this result. In addition it imposes a condition on the total duration of the album, reporting only albums with duration exceeding 3600000 milliseconds (i.e., one hour), selecting (and hence grouping by) album title rather than album.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?album_title (SUM(?track_duration) AS ?album_duration)
WHERE { ?album mo:record ?record .
        ?album dc:title ?album_title .
        ?record mo:track ?track .
        ?band foaf:name "The Beatles" ;  foaf:made ?track .
        ?track mo:duration ?track_duration .
      }
GROUP BY ?album_title
HAVING  (SUM(?track_duration) > 3600000)
ORDER BY ?album_duration
LIMIT 10

Note two new keywords here: AS introducing a variable name for the sum of durations (this becomes the heading of the second column of the output table); and HAVING introducing a filter over the group of variable bindings that has just been specified by the GROUP BY component.
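
The grouping and summing steps can be imitated in a few lines of Python. The sketch below reuses the made-up Revolver durations from the worked example above; the four Abbey Road durations are equally made up, and the function name album_durations and its minimum parameter (standing in for the HAVING threshold) are invented for the illustration.

```python
from collections import defaultdict

# One row per binding of (?album, ?track, ?track_duration).
# All durations are round numbers made up for the example.
rows = [
    ("Revolver", "Eleanor Rigby", 200000),
    ("Revolver", "I'm only sleeping", 240000),
    ("Revolver", "Doctor Robert", 160000),
    ("Abbey Road", "Come Together", 180000),
    ("Abbey Road", "Something", 190000),
    ("Abbey Road", "Octopus's Garden", 210000),
    ("Abbey Road", "Here Comes the Sun", 220000),
]


def album_durations(bindings, minimum=0):
    """GROUP BY album, SUM the durations, apply HAVING, ORDER BY the sum."""
    totals = defaultdict(int)
    for album, _track, duration in bindings:
        totals[album] += duration                                  # GROUP BY + SUM
    kept = {a: t for a, t in totals.items() if t > minimum}        # HAVING
    return sorted(kept.items(), key=lambda item: item[1])          # ORDER BY


all_albums = album_durations(rows)
long_albums = album_durations(rows, minimum=700000)
```

The first call yields 600000 for Revolver and 800000 for Abbey Road; the second, with a threshold of 700000 chosen to suit this tiny dataset, keeps only Abbey Road.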

Queries using CONSTRUCT

When building an application, you might need to retrieve some information from a queried dataset and re-express it in new RDF triples, perhaps using new names for resources. This might, for example, allow more efficient integration with triples from another dataset.

To meet this need, SPARQL provides a CONSTRUCT query which uses information retrieved from a dataset in order to build new RDF statements. Note that the query does not update the dataset. The new RDF triples are returned to the user as output, to be used in any way desired, the dataset itself remaining unchanged. (In SPARQL 1.1 there are query types for updating a dataset, as described in a later section of this chapter.)

To show the basic syntax of a CONSTRUCT query, consider the following example where the user wishes to assert "creator" relationships between artists and their products – of any kind; perhaps the aim is to construct a dataset in which this predicate is used consistently, replacing more specific predicates in MusicBrainz.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX mo: <http://purl.org/ontology/mo/>

CONSTRUCT { ?album dc:creator ?band .
            ?track dc:creator ?band .
          }
WHERE { ?band foaf:name "The Beatles" ; foaf:made ?album .
       ?album mo:record ?record .
       ?record mo:track ?track . 
      }

The key to understanding this query is that the variables employed in the CONSTRUCT list must occur also in the WHERE list. When the query is run, the query engine begins by retrieving the variable bindings satisfying the description in the WHERE list – just as it would for a SELECT query. For each variable binding, it then instantiates the triple patterns in the CONSTRUCT list and so creates (in this case two) new RDF triples. The result of the query is a merged graph including all the created triples.
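
The instantiation step described above can be sketched in Python. The bindings are assumed to have been retrieved already by the WHERE clause; the identifiers beginning with ex: are made up, and the function name construct is invented for the illustration. Because the result is a set, a triple produced by several bindings appears only once in the merged graph.

```python
# Template from the CONSTRUCT clause: two triple patterns.
TEMPLATE = [
    ("?album", "dc:creator", "?band"),
    ("?track", "dc:creator", "?band"),
]

# Made-up bindings, as if retrieved by the WHERE clause:
# one album with two tracks.
BINDINGS = [
    {"?band": "ex:beatles", "?album": "ex:help", "?track": "ex:track1"},
    {"?band": "ex:beatles", "?album": "ex:help", "?track": "ex:track2"},
]


def construct(template, bindings):
    """Instantiate every template pattern for every binding and merge the results."""
    graph = set()
    for binding in bindings:
        for pattern in template:
            triple = []
            for term in pattern:
                if term.startswith("?"):
                    if term not in binding:
                        break  # unbound variable: this triple is not generated
                    triple.append(binding[term])
                else:
                    triple.append(term)
            else:
                graph.add(tuple(triple))
    return graph


graph = construct(TEMPLATE, BINDINGS)
```

Two bindings and two template patterns give four instantiations, but the album triple is the same in both bindings, so the merged graph contains three triples.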

Figure 8 shows this outcome diagrammatically for an even simpler CONSTRUCT query, with the relevant part of the dataset and the constructed triples both shown as graphs.

Figure 8: Result of a simple CONSTRUCT query

Ordering and limiting the constructed triples

When building new RDF triples with CONSTRUCT, you can use the operators described in section 2.4.4 in order to organise and delimit the variable bindings retrieved from the dataset. The following query builds "creator" relationships only from the first 10 variable bindings (album and track combinations), taken in alphabetical order of album title. Note that LIMIT counts bindings, not albums, so the result covers at most 10 tracks together with the albums that contain them:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX mo: <http://purl.org/ontology/mo/>

CONSTRUCT { ?album dc:creator ?band .
            ?track dc:creator ?band .
          }
WHERE { ?band foaf:name "The Beatles" ; foaf:made ?album .
        ?album mo:record ?record .
        ?album dc:title ?album_title .
        ?record mo:track ?track . 
      }
ORDER BY ?album_title
LIMIT 10

Query patterns containing disjunction

Suppose that for some reason you want to construct triples for albums made either by the Beatles or by the Smashing Pumpkins (or both). Including both of these constraints in the WHERE list will not work, because implicitly the list represents a conjunction of statements, each of which must be satisfied. To allow disjunctions, SPARQL contains a UNION pattern; this is formed by placing the keyword UNION between two subsets of statements, each subset delimited by curly brackets. The meaning is that variable bindings should be retrieved if they satisfy either the statements on the left or the statements on the right (or both). Thus our target query is formed as follows:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX mo: <http://purl.org/ontology/mo/>

CONSTRUCT { ?album dc:creator ?band .
            ?track dc:creator ?band .
          }
WHERE { ?band foaf:made ?album .
        ?album mo:record ?record .
        ?record mo:track ?track .
        { ?band foaf:name "The Beatles" }
          UNION
        { ?band foaf:name "The Smashing Pumpkins" }
      }

Note that the first three statements of the WHERE list lie outside the scope of the UNION operator.
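
A small Python sketch may help to show how the UNION alternatives combine with the patterns outside them. The band and album identifiers are made up, and the helper names (bands_named, union, join) are invented for the illustration. The sketch assumes the simplest reading of the semantics: UNION collects the solutions of both alternatives, and the result is then joined with the remaining patterns on the shared variable ?band.

```python
# Made-up data: band names and the albums each band made.
NAMES = {
    "ex:b1": "The Beatles",
    "ex:b2": "The Smashing Pumpkins",
    "ex:b3": "Wings",
}
MADE = [("ex:b1", "ex:a1"), ("ex:b2", "ex:a2"), ("ex:b3", "ex:a3")]


def bands_named(name):
    """Solutions of the pattern { ?band foaf:name name }."""
    return [{"band": band} for band, n in NAMES.items() if n == name]


def union(left, right):
    """UNION keeps the solutions of both alternatives."""
    return left + right


def join(solutions, pairs):
    """Join with { ?band foaf:made ?album } on the shared variable ?band."""
    return [
        dict(solution, album=album)
        for solution in solutions
        for band, album in pairs
        if band == solution["band"]
    ]


either = union(bands_named("The Beatles"), bands_named("The Smashing Pumpkins"))
solutions = join(either, MADE)
albums = sorted(s["album"] for s in solutions)
```

The albums of both named bands are retrieved, while the album made by Wings is not, since Wings satisfies neither alternative.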

Retrieving resources for which information is missing from the dataset

In the examples we have seen so far, variable bindings must be retrieved for all patterns listed after WHERE. This means that if we retrieve several facts about an album (say), the album will only be included in the output if all these facts are present in the dataset: if just one is missing, the others will be ignored. SPARQL deals with this problem by allowing any graph pattern in the list to be preceded by the keyword OPTIONAL. This means that when computing variable bindings, the query engine should accept incomplete bindings provided that the unspecified variables occur only in optional patterns.

In the following query, an optional pattern is used ingeniously to select only variable bindings for which a particular variable is not bound. The variable in question records an artist's place of death, and it is assumed that if this information is missing from the dataset, the artist is still alive. The OPTIONAL pattern binds ?place_of_death wherever a place of death is recorded, and the FILTER then discards exactly those bindings. As a result, "creator" relationships are constructed only for artists who are alive (or more precisely, artists for whom there is no death place recorded in the dataset). (Separately, if a variable in the CONSTRUCT template is unbound in a given solution, the template triples containing it are not generated.)

PREFIX dbont: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

CONSTRUCT { ?album dc:creator ?artist . }
WHERE { ?artist foaf:made ?album .
        OPTIONAL { ?artist dbont:deathPlace ?place_of_death }
        FILTER (!BOUND(?place_of_death))
      } LIMIT 10

Note that in the filter expression '!' denotes negation, so that the whole expression means that the variable is not bound.

You should take care using this kind of query, since it depends on a risky inference sometimes called the closed-world assumption – namely, that any relevant statement not found in the dataset must be false. Thus if the dataset contains information about places of death, but no statement giving the place of death of Paul McCartney, we infer by this assumption that Paul McCartney must still be alive, since otherwise his place of death would have been recorded.
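
The combination of OPTIONAL and !BOUND can be imitated in Python as follows. The artists, albums and place of death are made up, and the helper names left_join and not_bound are invented for the illustration. The optional pattern behaves like a left join: every binding is kept, and ?place_of_death is added only where the dataset records one.

```python
# Made-up data: who made which album, and recorded places of death.
MADE = [("ex:artist1", "ex:album1"), ("ex:artist2", "ex:album2")]
DEATH_PLACE = {"ex:artist2": "ex:some_city"}  # nothing recorded for artist1


def left_join(rows, optional):
    """Keep every binding; add place_of_death only when one is recorded."""
    bindings = []
    for artist, album in rows:
        binding = {"artist": artist, "album": album}
        if artist in optional:
            binding["place_of_death"] = optional[artist]
        bindings.append(binding)
    return bindings


def not_bound(binding, variable):
    """Mirror of !BOUND(?variable)."""
    return variable not in binding


all_bindings = left_join(MADE, DEATH_PLACE)
kept = [b for b in all_bindings if not_bound(b, "place_of_death")]
constructed = {(b["album"], "dc:creator", b["artist"]) for b in kept}
```

Only ex:artist1 yields a constructed triple. Note that the sketch, like the query, cannot tell "still alive" apart from "place of death not recorded", which is exactly the risk of the closed-world assumption.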

Assigning variables

If you want to construct RDF triples using a variable that is derived from retrieved data, e.g., through an arithmetical operation, you can add a BIND statement to the WHERE clause as follows:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>

CONSTRUCT { ?track mo:runtime ?secs } 
WHERE { ?band foaf:name "The Beatles" ; foaf:made ?album .
        ?album mo:record ?record .
        ?record mo:track ?track .
        ?track mo:duration ?duration .
        BIND ((?duration/1000) AS ?secs) .
      } LIMIT 10

In this way the object of mo:runtime will be given in seconds rather than milliseconds.

Constructing new triples using aggregate data

We have already discussed a SELECT query that returns aggregate data by summing the durations of tracks in each album. You may recall that such a query uses the AS keyword in the expression following SELECT, to introduce a variable name for the aggregate value – in this case, the album duration. In the context of a CONSTRUCT query we therefore have a problem: how to introduce this new variable for the aggregate?

The solution used in SPARQL is to allow a sub-query after the keyword WHERE, in place of the usual graph pattern. This is achieved by the following rather convoluted syntax:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mo: <http://purl.org/ontology/mo/>

CONSTRUCT { ?album mo:duration ?album_duration }
WHERE {
    SELECT ?album (SUM(?track_duration) AS ?album_duration)
    { ?band foaf:name "The Beatles" ; foaf:made ?album .
      ?album mo:record ?record .
      ?record mo:track ?track .
      ?track mo:duration ?track_duration . 
    } GROUP BY ?album 
      HAVING (SUM(?track_duration) > 3600000) 
} LIMIT 10

Queries using DESCRIBE

Like CONSTRUCT, DESCRIBE delivers as output an RDF graph – i.e., a set of RDF triples. It differs from CONSTRUCT in that these triples are not constructed according to a template, but returned as found in the dataset. The reasons for doing this are similar to those for CONSTRUCT – you might, for instance, want to add these triples to another dataset – but you would prefer DESCRIBE if you were satisfied with the original encoding and had no reason to re-express the information using different resource names.

The resource to be described can be named directly or, more generically, specified as a binding of a variable. Thus the following query requests a description of one resource of class mo:Track:

PREFIX mo: <http://purl.org/ontology/mo/>

DESCRIBE ?track  WHERE { ?track a mo:Track  } LIMIT 1