W3C

The Semantics of SPARQL

Editors working draft.
Live Draft - version:
$Revision: 1.1 $ of $Date: 02 November 2005$
Editors:
Enrico Franconi, Free University of Bozen-Bolzano <>
Sergio Tessaris, Free University of Bozen-Bolzano
This version:
http://www.inf.unibz.it/krdb/w3c/sparql/
PDF document of this version (better):
PDF document.

Abstract

This document reflects the best effort of the editors to formalise the SPARQL semantics and incorporates input from various members of the WG, but is not yet endorsed by the WG as a whole.

Status of This document

This is a live document and is subject to change without notice. See also the change log.

See also: DAWG Working Group


Table of Contents


1 The Semantics of SPARQL

Definition 1 (RDF graphs)
We call RDFT the set of RDF terms, namely the union of all RDF URI references with the set of all literals in their canonical representation and with all the bnode names; the canonical representation of a literal is a chosen representative of all the literals associated to the same value, if the literal is non ill-typed, otherwise it is the literal itself. RDF graphs are defined in the standard way.

Definition 2 (Pattern Solution)
A pattern solution (aka variable substitution) is a partial function from a finite subset, possibly empty, of the query variables alphabet to RDFT; optionally, we may consider pattern solutions mapping to RDFT without bnodes.

Definition 3 (Answer Set)
An answer set is a set of pattern solutions. An answer set is non-redundant if it does not contain any redundant tuple.

A tuple S  (- AS is redundant in an answer set AS if L(AS) - the logical translation of the answer set - is logically equivalent to L(AS \ {S}) - the logical translation of the answer set diminished by the tuple.

The logic translation L(AS) of an answer set AS is a closed existential predicate logic formula, where URIs and literals (in their canonical representation) are constants and blank nodes are existential variables, and the body is a conjunction of atomic formulae of some predicate R in correspondence with each solution S (- AS. This correspondence is built as follows:

Definition 4 (RDF Merge)
The RDF merge  U+ <G1...Gn> of a sequence of graphs <G1...Gn> (i.e., a dataset) is the ordered merge union of the graphs, where repeated bnodes are substituted with fresh ones, by keeping the names of the bnodes coming first in the sequence order. Note that, w.r.t. the standard definition of RDF merge, if any of the graphs contains variables, then those are not renamed (i.e., variables are treated as URIs).

Definition 5 (Basic Graph Pattern)
A basic graph pattern is a finite RDF graph where query variables are allowed to label nodes and edges, in alternative to elements of RDFT. Variables may be annotated with a set of elements of RDFT- to restrict the possible assignments of variables.

Definition 6 (Answer Set of a BGP query)
Given an RDF graph G and a basic graph pattern query Q, the answer set (resp. non-redundant answer set) of Q for G is the maximum set (resp. maximum non-redundant set) of pattern solutions such that for each such S:

  1. the domain of S is restricted to the query variables in Q;
  2. the bnodes in the range of S can only be among those in G;
  3. if a variable is annotated with a set of elements of RDFT then S must map that variable to one of these elements;
  4. Q [S] - i.e., the application of the substitution S to Q - is a well-typed RDF graph;
  5. G |= (G   U+  Q) [S]
    where |= is a semantically well defined logical implication relation among RDF graphs, namely either simple entailment, or RDF entailment, or RDFS entailment, or OWL-DL entailment, or OWL-Full entailment.

Lemma 1  

  1. Both the answer set and the non-redundant answer set of a query for a graph are unique.
  2. An answer set without substitutions to bnodes is always non-redundant.
  3. In the case of simple, RDF, and RDFS entailment, the answer set of any query for a ground graph (i.e., a graph without bnodes) is always non-redundant.
  4. In the case of simple entailment, the answer set corresponds to all the substitutions defined as follows:
    • consider all the query variables in Q as fresh bnodes;
    • consider all subgraph isomorphisms between Q and G, where bnodes may be mapped to any node;
    • a substitution is the projection over the bnodes in Q that were query variables of the above isomorphisms.

Definition 7 (SPARQL Algebra - still informal)
A SPARQL query expression constructs an answer set from an RDF graph, by combining basic steps of the form of basic query patterns - constructing answer sets from the RDF graph - with higher level algebraic operators which transform answer sets. Algebraic operators are:

  1. Join: AS AS --> AS
  2. Filter: AS --> AS
  3. Optional: AS AS --> AS
  4. Union: AS AS --> AS
  5. Select: AS --> AS

Optionally, a Construct operator may at the end transform an answer set in an RDF graph.