http://www.inf.unibz.it/~rodriguez/DIG/bulk-data-extension.html
2007/07/15
Diego Calvanese, Free University of Bolzano Bozen
Mariano Rodriguez, Free
In several areas, such as Data Integration [Lenzerini et al. 02], the Semantic Web [Heflin and Hendler, 01], and Ontology-Based Data Access [DL-LiteA], the intensional level of an application domain is
represented by an ontology that provides a conceptual view of the data
maintained by the system, and through which clients access the system services.
In a setting where the amounts of data involved are large, such as the ones
mentioned above, it may be more convenient, or even necessary due to
architectural constraints, to have the data itself managed by one or more
external systems (e.g., a Database Management System), and link the ontology to
such systems by means of suitable mappings
[DL-LiteA],[Poggi et al. 07].
Indeed, this way of proceeding offers several advantages with respect to the one
where both the intensional (i.e., the TBox) and the extensional (i.e., the ABox)
levels of an ontology are under the direct control of the ontology management
system (OMS) and the associated reasoner (see note 1):
i) efficiency and scalability, since, when data management can be fully
delegated to the external system, one can rely on that system to improve both
the efficiency of query processing and the scalability with respect to the size
of the data;
ii) physical independence, since changing the structure and organization of the
underlying data does not require changing the ontology itself, but only the
mappings between the ontology and the data;
iii) flexibility and robustness, since temporary unavailability of a source does
not invalidate the whole query answering process; moreover, querying can be
resumed as soon as the data source becomes available again.
We
observe that whether it is possible to exploit the above advantages and delegate
query processing to an external system, while preserving soundness and
completeness of the overall query answering process, strongly depends on the
expressiveness of the language used to express the ontology [Calvanese et al. 06].
Moreover, if the issue of client-system communication efficiency is considered,
asking the clients to retrieve and transfer data from existing data repositories
to the OMS might create a large communication overhead in which the client is
involved. One possibility to avoid this is to delegate data retrieval to the OMS
by telling it how and from where to retrieve the data.
The DIG Interface [DIG], the communication mechanism that is shaping the way in
which users and external systems interact with OMSs, currently does not foresee
these kinds of tasks, even though some systems already implement these ideas
(e.g., KAON2, QuOnto).
In
this article, we provide the definition of a standard for this kind of
client-system interaction for OMSs, in terms of an
extension to the DIG 2.0 interface. The extension allows for the specification
of linking axioms, where each such axiom
generically connects elements of an ontology to elements of an external data
source.
Following [Lenzerini et al. 02], the elements to connect are specified in the
linking axioms by means of two queries, over the data source and over the
ontology, respectively. The extension makes no commitment with respect to the
actual language(s) used in a linking axiom. The linking axiom definitions are
kept abstract, and are instantiated with specific types of queries when needed.
As an example and proposal, we also provide an instantiation of the extension
that allows mapping a generic DL ontology to relational database(s).
1. Note that current Description Logic reasoners might be considered as the simplest form of OMSs with associated reasoner.
In the following subsections, we describe the extension by first introducing the
general framework and the extension to the DIG messages. Next, we introduce the
implementation of the framework for RDBMS data sources. Then we introduce the
semantics of the RDBMS implementation. We conclude with an example of the usage
of the implementation for RDBMS data sources. The extension is presented in
terms of UML models. One can directly translate the UML models, as described in
the original DIG 2.0 specification, to obtain the corresponding XSD schemas
describing the XML-based messages [DIG 2.0].
In Figure 1 we show the classes that compose the framework and how these new entities relate to the original picture of an ontology in DIG 2.0. The classes presented here are abstract and are intended to be the base for implementations of specific types of data sources and linking axioms.

Figure 1. DIG 2.0 ontology with linking axioms and data sources
A DataSource stands for any possible source of data which could be used to populate an ontology's ABox. A data source is uniquely identified by a DataSourceURI and is related to a set of DataSourceParameters, which provide the information that the DIG server requires to interact with the given data source (e.g., to establish a connection, access a file, etc.). Each data source parameter is in turn identified by a parameterURI. A data source or data source parameter can have any number of owlAnnotations, which can be used to attach human-readable information. An annotation has no effect on the interaction of the DIG server with the data source.
A LinkingAxiom is part of an ontology and is associated with a data source. It indicates a relationship between data in the data source and elements of the ontology. The specific semantics of a linking axiom depends on the type of linking axiom (see below). Linking axioms can have associated annotations, which, as is natural, have no effect on the semantics of the axiom. The main elements of a linking axiom are the SourceQuery and the TargetQuery. A source query is a specification of how to extract data from the source, expressed in some given query language. We do not restrict the query language a priori; in the most general case it could be any computation over the source. Restrictions on the query language are imposed in the implementing classes. A target query is a specification of a query over the ontology. As with source queries, there are no restrictions on the language, and implementing classes must define it.
The following are the typical steps that need to be taken when a new class of data source is defined:
1. Defining a new DataSource subclass, specific to the type of data source which needs to be handled. Together with the data source type, one must define a subclass of DataSourceParameter appropriate for that data source type. This DataSourceParameter subclass should also define the set of "default" parameterURIs that are associated with the type of data source that we want to handle (i.e., parameters that are always needed to enable a system to interact with that type of data source, e.g., location of the source, connection parameters, etc.).
2. Defining a new LinkingAxiom subclass for the data source type that we just defined.
3. Defining the appropriate SourceQuery and TargetQuery subclasses for the type of linking axiom previously defined. This could mean defining new subclasses for SourceQuery and/or TargetQuery, or choosing from the set of defined subclasses from implementations of the framework for other data sources which might be appropriate for the data source we are defining.
Together with the definition of these classes, the new implementation of the framework must provide the documentation necessary to clearly define the semantics of all the newly defined classes.
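The three steps above can be sketched in code. The following is only an illustration under stated assumptions: the specification defines these classes in UML, not in Python, and the CSV data source together with the names CsvDataSource, CsvColumns, ConceptQuery, CsvLinkingAxiom, CsvPath, CsvDelimiter, and DEFAULT_PARAMETER_URIS are hypothetical and not part of the extension.

```python
from dataclasses import dataclass, field
from typing import ClassVar, Dict, Tuple


@dataclass
class DataSource:
    """Framework base class: a data source identified by a URI."""
    uri: str
    parameters: Dict[str, str] = field(default_factory=dict)
    # Hypothetical hook: the "default" parameterURIs of a data source type.
    DEFAULT_PARAMETER_URIS: ClassVar[Tuple[str, ...]] = ()

    def __post_init__(self) -> None:
        # Default parameters exist from the start, with an empty value.
        for parameter_uri in self.DEFAULT_PARAMETER_URIS:
            self.parameters.setdefault(parameter_uri, "")


@dataclass
class SourceQuery:
    """Framework base class for queries over a data source."""


@dataclass
class TargetQuery:
    """Framework base class for queries over the ontology."""


@dataclass
class LinkingAxiom:
    """Framework base class relating a source query to a target query."""
    data_source: DataSource
    source_query: SourceQuery
    target_query: TargetQuery


# Step 1: a new data source type with its default parameters (hypothetical).
@dataclass
class CsvDataSource(DataSource):
    DEFAULT_PARAMETER_URIS: ClassVar[Tuple[str, ...]] = ("CsvPath", "CsvDelimiter")


# Step 3: query classes suitable for this type of linking axiom (hypothetical).
@dataclass
class CsvColumns(SourceQuery):
    columns: Tuple[str, ...]


@dataclass
class ConceptQuery(TargetQuery):
    concept: str


# Step 2: a linking axiom subclass that only accepts the matching types.
@dataclass
class CsvLinkingAxiom(LinkingAxiom):
    def __post_init__(self) -> None:
        if not isinstance(self.data_source, CsvDataSource):
            raise TypeError("CsvLinkingAxiom requires a CsvDataSource")
        if not isinstance(self.source_query, CsvColumns):
            raise TypeError("CsvLinkingAxiom requires a CsvColumns source query")
```

The type checks in CsvLinkingAxiom are one possible way to keep a linking axiom consistent with its data source type; the documentation of the semantics required by the framework would still have to be written separately.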
To allow for the management of ontologies with linking axioms we extend the Request and Response classes.
Concerning data sources management:
CreateDataSourceAllocateURI.- Tells the server that it should create a new data source and allocate a unique URI for it. Useful for creating anonymous or one-time-use data sources. The user can then use the given URI to add access parameters or relate linking axioms to it. On successful execution, the server should respond with a confirmation containing the URI of the data source.
CreateDataSource.- Tells the server to create a new data source identified by the given URI. The server should answer with an error if there exists another data source with the same URI.
ReleaseDataSource.- Tells the reasoner to delete the data source identified by the given URI. All linking axioms related to the data source must be removed by the user first. If linking axioms are still related to this data source when the release command is issued, the reasoner should not remove the data source and should reply with an Error explaining the situation. On successful execution of the command, the server should delete any reference to data source parameters or annotations related to the given data source.
SetDataSourceParameters.- Tells the reasoner to assign a list of parameters to the given data source. If the URI of one of the given parameters is the same as that of a parameter already associated with the data source, the old value of the parameter should be replaced with the new value and a warning should be issued.
GetDataSourceAnnotations.- Tells the reasoner to retrieve all the annotations associated with the given data source. The server should return a (possibly empty) SetOfAnnotations containing all the annotations related to the data source.
GetDataSourceParameters.- Tells the reasoner to retrieve all the parameters associated with the given data source. The server should return a (possibly empty) SetOfParameters containing all the parameters related to the data source.

Concerning linking axiom management:
TellLinkingAxioms.- Tells the reasoner to associate a set of linking axioms to the given ontology.
GetAllLinkingAxiom.- Tells the reasoner to return a (possibly empty) SetofLinkingAxioms containing all the linking axioms associated to the given ontology.
RetractLinkingAxioms.- Tells the reasoner to remove the given linking axioms from the ontology. A warning should be issued if a given axiom was not previously associated.
RetractAllLinkingAxioms.- Tells the reasoner to remove all the linking axioms associated to the given ontology. A warning should be issued if there are no associated axioms.

Considerations:
Confirmations: If a command has been executed successfully, a confirmation should be issued.
Undefined URIs: If a command refers to a DataSource, LinkingAxiom, or Ontology identified by an unknown URI, an error should be returned indicating the issue.
Typing of Sources, Parameters and Axioms: The user needs to pass parameters which are properly typed. That is, if the user is defining parameters for a DataSource of type X which accepts parameters of type paramX, the parameters given must be of type paramX. An error should be returned as a result of any command with badly typed parameters. The same holds for linking axioms and data sources.
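The behaviour required of these requests can be illustrated with a small in-memory sketch. It is not a DIG server: the class InMemoryDigServer, its method names, and the ("ok" | "warning" | "error", payload) response tuples are assumptions made for this illustration, and linking axioms are tracked per data source only, by an opaque identifier, rather than per ontology.

```python
import uuid
from typing import Dict, Set, Tuple

Response = Tuple[str, object]  # ("ok" | "warning" | "error", payload)


class InMemoryDigServer:
    """Toy stand-in for a DIG server; keeps everything in dictionaries."""

    def __init__(self) -> None:
        self.parameters: Dict[str, Dict[str, str]] = {}  # source URI -> parameters
        self.axioms: Dict[str, Set[str]] = {}            # source URI -> axiom ids

    def _unknown(self, uri: str) -> Response:
        return ("error", f"unknown data source URI: {uri}")

    def create_data_source_allocate_uri(self) -> Response:
        # Allocate a fresh URI and confirm with it.
        return self.create_data_source(f"urn:uuid:{uuid.uuid4()}")

    def create_data_source(self, uri: str) -> Response:
        if uri in self.parameters:
            return ("error", f"data source already exists: {uri}")
        self.parameters[uri] = {}
        self.axioms[uri] = set()
        return ("ok", uri)

    def set_data_source_parameters(self, uri: str, new: Dict[str, str]) -> Response:
        if uri not in self.parameters:
            return self._unknown(uri)
        replaced = sorted(set(new) & set(self.parameters[uri]))
        self.parameters[uri].update(new)
        # Replacing an existing parameter succeeds, but with a warning.
        return ("warning", replaced) if replaced else ("ok", uri)

    def get_data_source_parameters(self, uri: str) -> Response:
        if uri not in self.parameters:
            return self._unknown(uri)
        return ("ok", dict(self.parameters[uri]))

    def tell_linking_axiom(self, uri: str, axiom_id: str) -> Response:
        if uri not in self.parameters:
            return self._unknown(uri)
        self.axioms[uri].add(axiom_id)
        return ("ok", axiom_id)

    def retract_linking_axiom(self, uri: str, axiom_id: str) -> Response:
        if uri not in self.parameters:
            return self._unknown(uri)
        if axiom_id not in self.axioms[uri]:
            return ("warning", f"axiom was not associated: {axiom_id}")
        self.axioms[uri].remove(axiom_id)
        return ("ok", axiom_id)

    def release_data_source(self, uri: str) -> Response:
        if uri not in self.parameters:
            return self._unknown(uri)
        if self.axioms[uri]:
            # Linking axioms must be removed by the user first.
            return ("error", f"linking axioms still refer to {uri}")
        del self.parameters[uri]
        del self.axioms[uri]
        return ("ok", uri)
```

In this sketch, a release request for a data source that still has linking axioms fails with an error, and succeeds once the axioms have been retracted, which mirrors the ordering the text above requires of the user.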
Now we present an implementation of the framework for RDBMS data sources.
This implementation is built on top of the work presented in
[Poggi et al. 07].

Figure 2. Implementation of the framework for RDBMS data sources
An instance of RDBMSDataSource stands for an RDBMS accessible to the DIG server. Each RDBMSDataSource has a set of related parameters. The class RDBMSParameter defines a set of URIs which identify the parameters that are commonly needed to access any RDBMSDataSource. These parameters are identified by the following URIs: RDBMSIP, RDBMSUsername, RDBMSPassword, and RDBMSDatabaseName. Hence, every RDBMSDataSource is, by default, associated with these RDBMSParameters. The default value for these parameters is an empty string.
RDBMSLinkingAxiom is the specific subclass of LinkingAxiom for RDBMS data sources (see Figure 3). We define two subtypes, the MappingAxiom and the TypingAxiom.

Figure 3. Linking Axioms
A MappingAxiom is a statement of a relationship between a query over an RDBMS data source and a query over the ontology. The source query of a MappingAxiom is an instance of the SQLQuery class. An SQLQuery is characterized by a query attribute, containing a string representing a well-formed SQL query, and an arity attribute, indicating the arity of the query. The target query of a MappingAxiom is an instance of the Retrieve class, which is the proposed extension for expressing unions of conjunctive queries (UCQs) over the ABox in DIG 2.0 (see the ABox Query Interface proposal [DIG ABox Query]).
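Since an SQLQuery carries both the query string and its declared arity, a server could check that the two agree before accepting a mapping axiom. The sketch below is one way to do so and is not prescribed by the extension: it uses an in-memory SQLite database as a stand-in for the RDBMS, the table P with columns name and lastname follows the example used in this document, and the helper names actual_arity and arity_matches are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class SQLQuery:
    query: str  # a well-formed SQL query
    arity: int  # number of columns the query is declared to return


def actual_arity(conn: sqlite3.Connection, sql: SQLQuery) -> int:
    """Return the number of columns the query really produces."""
    cursor = conn.execute(sql.query)
    # cursor.description has one entry per result column.
    return len(cursor.description)


def arity_matches(conn: sqlite3.Connection, sql: SQLQuery) -> bool:
    """True if the declared arity agrees with the query's result columns."""
    return actual_arity(conn, sql) == sql.arity


# Illustrative stand-in database; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE P (name TEXT, lastname TEXT)")

good = SQLQuery("SELECT name, lastname FROM P", arity=2)
bad = SQLQuery("SELECT name FROM P", arity=2)
```

Here arity_matches(conn, good) holds while arity_matches(conn, bad) does not, so the second query could be rejected with an error before the axiom is stored.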
Finally, we extend the definition of the Retrieve class to allow function symbols to appear in the body of the query, so that the type of mappings presented in [Poggi et al. 07] can be expressed.
The following is an example of a mapping axiom that links the data in the name and lastname columns of the P table, in the data source identified by the URI "rdbms://dbms.unibz.it", to instances of the concept Student in some ontology. Note the use of the function stud.
A TypingAxiom is a statement of relationship between an attribute of a relation and an OWL data type. In a typing axiom, the source query is an instance of the SQLProjection class, which stands for the projection of the attribute attributeName over the relation relationName (see figure 3). The target query is represented by an OWL 1.1 data type. Typing axioms allow the user to validate the data in the data source against the ontology's data types.
The following is an example of a typing axiom indicating that the data of the attribute name in the relation P of the data source identified by the URI rdbms://dbms.unibz.it belongs to the xsd:String data type. The data types allowed are those defined in the OWL 1.1 definition.
<owl11xml:DataType owl11xml:URI="xsd:String"/>
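The validation role of a typing axiom can be sketched as follows. This is an illustration under assumptions, not part of the extension: an in-memory SQLite database stands in for the RDBMS, the sample rows are invented, and DATATYPE_CHECKS is a hypothetical and deliberately partial map from data type names to membership tests.

```python
import sqlite3
from typing import List

# Hypothetical, partial map from data type names to membership tests.
DATATYPE_CHECKS = {
    "xsd:String": lambda value: isinstance(value, str),
}


def quote_identifier(name: str) -> str:
    """Quote an SQL identifier, since identifiers cannot be bound as parameters."""
    return '"' + name.replace('"', '""') + '"'


def typing_violations(conn: sqlite3.Connection, relation_name: str,
                      attribute_name: str, datatype: str) -> List[object]:
    """Return the values of the projected attribute that are not of the data type."""
    check = DATATYPE_CHECKS[datatype]
    sql = (f"SELECT {quote_identifier(attribute_name)} "
           f"FROM {quote_identifier(relation_name)}")
    return [value for (value,) in conn.execute(sql) if not check(value)]


# Illustrative data: untyped columns, so SQLite stores values as inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE P (name, lastname)")
conn.executemany("INSERT INTO P VALUES (?, ?)",
                 [("Ada", "Lovelace"), (42, "Turing")])
```

With this data, the projection of name over P contains the integer 42, which the check for xsd:String reports as a violation, while the projection of lastname contains only strings.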
In this section we recall the work presented in [DL-LiteA]. We denote by Γv the set of value constants and by ΓO the set of object terms, i.e., terms of the form f(d_1,...,d_n), where f is a function symbol of arity n, and d_1,...,d_n are value constants. Given ΓO, we can define an ABox in the standard way as a finite set of membership assertions, in which we may use not only value constants but also object terms.
To define the semantics of such an ABox, we simply define an interpretation I = (Δ^I, ·^I) in the standard way, and just note that the interpretation function ·^I assigns a different element of the interpretation domain Δ^I to every object identifier in ΓO (i.e., we enforce the unique name assumption on object identifiers).
We consider now the problem of linking objects in the ontology to the data in a relational database DB. We do so by relying on mapping techniques studied extensively in data integration [Lenzerini et al. 02]. Specifically, we consider a set M of linking axioms, partitioned into two sets, Mt and Ma, where:
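- Ma is a set of mapping axioms, each of the form Φ ⇝ Ψ, where Φ is an SQL query over DB and Ψ is a query over the ontology, possibly involving object terms;
- Mt is a set of typing axioms, each of the form Φ ⇝ Ti, where Φ is a query over DB denoting the projection of a relation over one of its attributes, and Ti is one of the data types of the ontology.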
We recall from [Poggi et al. 07] that typing axioms are used to assign appropriate types to constant values appearing in the relations of DB. Basically, these assertions are used for interpreting the values stored in the database in terms of the types used in the ontology. Mapping axioms, on the other hand, are used to map data in the database to concepts, roles, and attributes of the ontology.
It is worth noting that now that we have object terms, the data layer underlying an ontology contains only data, whereas object identifiers are virtually built on top of this data.
Thus, autonomous data sources can effectively provide their portion of data and contribute to the instance level of the ontology, without being required to agree on any particular object identification scheme.
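An interpretation I = (Δ^I, ·^I) satisfies a mapping axiom with source query Φ and target query Ψ w.r.t. DB if, for each tuple t of value constants in Γv, t ϵ ans(Φ, DB)* implies t^I ϵ Ψ^I. An interpretation I = (Δ^I, ·^I) is a model of an ontology with linking axioms w.r.t. DB if it is a model of the ontology and satisfies all of its linking axioms w.r.t. DB.
With the notion of model in place, we can define all reasoning services over ontologies with linking axioms in the usual way.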
* We use ans(φ, DB) to denote the set of tuples (of the arity of φ) of value constants returned as the result of the evaluation of the query φ over the database DB.
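The satisfaction condition for mapping axioms can be made concrete with a small sketch. It rests on simplifying assumptions that are not part of the formal definition: the interpretation is represented by a finite set of ground facts, object terms are kept symbolic as tagged tuples (so distinct terms denote distinct objects, in line with the unique name assumption), and the target query is modelled as a Python function from an answer tuple to the ground atoms it requires. The names satisfies_mapping and student_query are hypothetical.

```python
from typing import Callable, Iterable, List, Set, Tuple

Fact = Tuple  # e.g. ("Student", ("stud", "Ada", "Lovelace"))
# A target query, modelled as: answer tuple t -> ground atoms required for t.
TargetQuery = Callable[[Tuple], List[Fact]]


def satisfies_mapping(ans: Iterable[Tuple], target: TargetQuery,
                      facts: Set[Fact]) -> bool:
    """True if every tuple in ans(Φ, DB) has all its required atoms in facts."""
    return all(atom in facts for t in ans for atom in target(t))


def student_query(t: Tuple) -> List[Fact]:
    """Target query Student(stud(n, l)), Name(stud(n, l), n), for t = (n, l)."""
    n, l = t
    obj = ("stud", n, l)  # symbolic object term
    return [("Student", obj), ("Name", obj, n)]


# Illustrative answer set and interpretation (invented data).
ans = [("Ada", "Lovelace")]
facts = {
    ("Student", ("stud", "Ada", "Lovelace")),
    ("Name", ("stud", "Ada", "Lovelace"), "Ada"),
}
```

In this example the interpretation satisfies the axiom for the given answers, but it would stop doing so if ans(Φ, DB) also contained a tuple such as ("Alan", "Turing") for which no corresponding facts hold.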
Example:
Consider the following:
a) an ontology O consisting of the axioms Student ISA ƎName, Student ISA ƎLastname, CumLaudeStudent ISA Student, where Name and Lastname are concrete domain roles (i.e., attributes);
b) a database DB and a relation R ϵ DB that stores information about students;
c) attributes n, l, gpa of R, which store the name, lastname, and gpa of students, respectively;
d) n, l is a binary key for R;
e) n, l are strings.
Now we present mapping m1, which is a mapping axiom stating how tuples in R are related to CumLaudeStudent individuals, and typing axioms m2 and m3, which type the attributes n and l of R w.r.t. the value domains of O:
| m1: | qdb(n, l) ← R(n, l, gpa), gpa > 7.5 | ⇝ | qO(n, l) ← CumLaudeStudent(stud(n, l)), Name(stud(n, l), n), Lastname(stud(n, l), l) |
| m2: | qdb(n) ← R(n, l, gpa) | ⇝ | name |
| m3: | qdb(l) ← R(n, l, gpa) | ⇝ | lastname |
where each qdb is a First-Order (i.e., SQL) query expressed over R, qO is a conjunctive query over O, stud( n, l ) is a function from name-lastname pairs to ΓO, and name and lastname are concrete value domains for strings representing names and lastnames, respectively.
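To make the effect of m1 tangible, the sketch below evaluates its source query over a small database and builds the membership assertions that its target query requires. It is an illustration only: the axioms define a virtual link and do not prescribe materialization, an in-memory SQLite database stands in for DB, the two sample rows are invented, and the function name apply_m1 is hypothetical.

```python
import sqlite3
from typing import Set, Tuple

# Illustrative stand-in for DB, with n, l as a binary key of R.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (n TEXT, l TEXT, gpa REAL, PRIMARY KEY (n, l))")
conn.executemany("INSERT INTO R VALUES (?, ?, ?)",
                 [("Ada", "Lovelace", 9.1), ("Alan", "Turing", 7.0)])


def stud(n: str, l: str) -> Tuple[str, str, str]:
    """Object term stud(n, l), kept symbolic as a tagged tuple."""
    return ("stud", n, l)


def apply_m1(conn: sqlite3.Connection) -> Set[Tuple]:
    """Build the assertions required by m1 for every answer of its source query."""
    abox: Set[Tuple] = set()
    # Source query of m1: qdb(n, l) <- R(n, l, gpa), gpa > 7.5
    for n, l in conn.execute("SELECT n, l FROM R WHERE gpa > 7.5"):
        s = stud(n, l)
        # Target query of m1, instantiated with the answer tuple (n, l).
        abox.add(("CumLaudeStudent", s))
        abox.add(("Name", s, n))
        abox.add(("Lastname", s, l))
    return abox
```

Only the row with a gpa above 7.5 contributes assertions, and the object identifier for that student exists solely as the term stud(n, l) built from the key values, so the database itself never has to store object identifiers.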
The following is the instantiation of the MappingAxiom class for mapping axiom
m1. The axiom is presented in the XML-based
representation of MappingAxiom. Notice the use
of the <QueryFunction .. /></QueryFunction>
element in query atoms within the ABox
Query.
This section provides two examples to illustrate a possible use of the framework. The examples are given in terms of the implementation for RDBMS data sources, instantiated in the XML form of the protocol.
5.1 A case in which the user wants to populate several atomic concepts in the ontology with instances built from values retrieved from a single relation in a database.
Consider the following:
| ƎProducedIn- ISA Country |
| Movie ISA ƎProducedIn |
| Movie ISA ƎName |
| Movie ISA ƎDirector |
| Movie ISA ƎYear |
| Country ISA ƎName |
These axioms state that every Movie has the attributes Name, Director, and Year, that every Movie participates in a relation ProducedIn whose range is Country, and, finally, that every Country has an attribute Name.
An ontology designer could then use the following linking axiom to populate the Movie and Country concepts, as well as the ProducedIn role, with objects built from the tuples stored in the relation movie, and at the same time relate these objects to the corresponding values for the Name, Director, and Year attributes of these concepts.
Here, countryObj and movieObj are the functions that create the object identifiers from the values given to them.
5.2 To follow up on the last case, if the ontology had an atomic concept for French movies, as in:
| FrenchMovie ISA Movie |
The designer could use the following linking axiom, in addition to the previous one, to populate this concept with the appropriate objects:
Lenzerini, M.: Data integration: A theoretical perspective. In: Proc. of the 21st ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS 2002). (2002) 233-246
Heflin, J., Hendler, J.: A portrait of the Semantic Web in action. IEEE Intelligent Systems 16(2) (2001) 54-59
Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Poggi, A., Rosati, R.: Linking data to ontologies: The description logic DL-LiteA. In: Proc. of the 2nd Workshop on OWL: Experiences and Directions (OWLED 2006). (2006)
Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data to ontologies. J. on Data Semantics (2007) To appear.
Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Data complexity of query answering in description logics. In: Proc. of the 10th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR 2006). (2006) 260-270
Bechhofer, S.: The DIG Description Logic interface: DIG/1.0. Technical report, University of Manchester (2002)
Bechhofer, S., Motik, B.: DIG 2.0 Specification. Editor's Draft of 02 January 2007. Available at: http://www.cs.man.ac.uk/ bmotik/dig/dig specification.html
Kaplunova, A., Moller, R.: ABox Query Interface proposal. Editor's Draft of 18 January 2007. Available at: http://www.sts.tu-harburg.de/ al.kaplunova/dig-query-interface.html
Acciarri, A., Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Palmieri, M., Rosati, R.: QUONTO: QUerying ONTOlogies. In: Proc. of the 20th Nat. Conf. on Artificial Intelligence (AAAI 2005). (2005) 1670-1671