Description Logics for Conceptual Design, Information Access, and Ontology Integration

Description Logics for Conceptual Design, Information Access, and Ontology Integration: Research Trends

Tutorial Description

conceptual modelling and ontology design:

For the purpose of this tutorial, an Ontology will be considered as a Conceptual Schema expressed in a suitable conceptual data model (i.e., an Ontology Language). Good conceptual data models put their emphasis on the correct and semantically rich representation of complex properties and relations that may exist between documents. They should allow for an abstract representation of data which resembles the way they are actually perceived and used in the real world, thus narrowing (with respect to the more traditional data models) the semantic gap between the domain and its representation.
Conceptual (or Ontology) modelling deals with the question of how to describe in a declarative and reusable way the domain information of an application and its relevant vocabulary, and of how to constrain the use of the data by understanding what can be drawn from it. Recently, a number of conceptual and ontology modelling languages have emerged as de facto standards; in particular we mention Entity/Relationship (ER) for the relational data model, UML and ODMG for the object-oriented data model, and XML, RDF and DAML+OIL for the web semi-structured data model. Still, many such languages do not have a formal semantics based on logic, or reasoners built upon them to support the designer. Not surprisingly, conceptual modelling tasks have always been in the mainstream of KR research - see for example the research on Ontology representation and design - and can now be considered one of the main applications of KR languages and reasoning techniques [BB02]. DL can be considered as a unifying formalism, since they allow the logical reconstruction and the extension of representational tools such as object-oriented data models (e.g., UML and ODMG), semantic data models (e.g., Entity/Relationship and ORM), frame-based ontology languages (e.g., OIL and DAML+OIL) [CLN98,CLN99,CCDGL01,]. In addition, given the high complexity of the modelling task when complex data is involved, in the semantic web field there is a demand for more sophisticated and expressive languages than for normal information systems. Again, DL research is very active in providing expressive ontology languages to capture various aspects of the information (see, e.g., [AF99,,FS99,BKW02]).
In this tutorial I will present examples using a generic conceptual data model. I will point out how it generalises both the object-oriented data model based on UML class diagrams and the extended Entity-Relationship (EER) semantic data model, and how it is closely related to OIL and DAML+OIL. The ontology language includes taxonomic relations to state containment assertions between entities and between relationships, with the possibility of specifying additional covering and disjointness constraints. The most interesting feature of the modelling language is the ability to completely define entities and relationships as views over other entities and relationships of the ontology [CLN98]. The adopted view language is DLR [CGL+98], a Description Logic over unary and n-ary relationships. DLR is an interesting decidable fragment of first-order logic: among others, inclusion dependencies with DLR views can express (a) unary inclusion dependencies, (b) typed inclusion dependencies without projection, (c) existence dependencies, (d) exclusion dependencies, and (e) full key dependencies. DLR is powerful enough to encode the full EER, the UML class diagrams and most of DAML+OIL. An informal introduction to the properties of the DLR Description Logic will be given.
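As an informal illustration of the kinds of constraints mentioned above, the following sketch writes a few of them as inclusion assertions in DLR-style notation. The entity and relationship names (Student, Employee, Person, Course, Enrolled) are hypothetical and not taken from the tutorial, and the notation follows the usual presentation of DLR only approximately; it should be read as an assumption rather than as the tutorial's own syntax. Disjunction is used as an abbreviation definable from negation and conjunction.

```latex
\begin{align*}
\mathit{Student} &\sqsubseteq \mathit{Person}
  && \text{(containment between entities)}\\
\mathit{Student} &\sqsubseteq \neg\,\mathit{Employee}
  && \text{(disjointness, an exclusion dependency)}\\
\mathit{Person} &\sqsubseteq \mathit{Student} \sqcup \mathit{Employee}
  && \text{(covering)}\\
\mathit{Enrolled} &\sqsubseteq (\$1/2 : \mathit{Student}) \sqcap (\$2/2 : \mathit{Course})
  && \text{(typing of a binary relationship)}\\
\mathit{Student} &\sqsubseteq \exists[\$1]\,\mathit{Enrolled}
  && \text{(existence dependency: every student is enrolled)}
\end{align*}
```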
Two additional extensions to the conceptual data model will also be considered. The first one is with multidimensional aggregations - that is, the conceptual data model is able to represent the structure of aggregated entities and of multiply hierarchically organised dimensions. The ability to represent aggregations at the conceptual level is crucial in modelling structured documents in data warehouses, in the semantic web and in digital libraries. The second one allows for the representation of standard temporal operators for temporal conceptual modelling and of a large class of temporal integrity constraints, useful to model the dynamics in the semantic web.
At the end of this first part, a demo of the i.com tool [FN00,JQC+00] - which implements the above conceptual data model as UML class diagrams or EER schemas - will be given. i.com allows for the specification of multiple EER (or UML) diagrams and inter- and intra-schema constraints. Complete logical reasoning is employed by the tool using an underlying DL inference engine to verify the specification, infer implicit facts and stricter constraints, and manifest any inconsistencies during the conceptual modelling phase.
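To give a feel for what "inferring implicit facts" and "manifesting inconsistencies" can mean, here is a toy sketch in Python. It is not how i.com or a DL inference engine works: it only handles atomic containment and disjointness between named entities, and it is sound but far from complete. All names (`isa_closure`, `unsatisfiable_entities`, the example entities) are hypothetical.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]

def isa_closure(isa: Iterable[Pair]) -> Set[Pair]:
    """Transitive closure of (sub, sup) containment assertions."""
    closure = set(isa)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))  # implicit containment a ISA d
                    changed = True
    return closure

def unsatisfiable_entities(entities: Iterable[str],
                           isa: Iterable[Pair],
                           disjoint: Iterable[Pair]) -> Set[str]:
    """Entities forced to be empty because they fall under two disjoint entities."""
    closure = isa_closure(isa)
    disjoint = list(disjoint)
    result = set()
    for e in entities:
        # The entity itself plus every entity it is (implicitly) contained in.
        ancestors = {e} | {sup for (sub, sup) in closure if sub == e}
        if any(a in ancestors and b in ancestors for (a, b) in disjoint):
            result.add(e)
    return result

# Hypothetical schema: WorkingStudent is declared under two disjoint entities.
ENTITIES = ["Person", "Student", "Employee", "WorkingStudent"]
ISA = [("Student", "Person"), ("Employee", "Person"),
       ("WorkingStudent", "Student"), ("WorkingStudent", "Employee")]
DISJOINT = [("Student", "Employee")]

if __name__ == "__main__":
    print(sorted(isa_closure(ISA)))
    print(unsatisfiable_entities(ENTITIES, ISA, DISJOINT))
```

In this example the containment of WorkingStudent in Person is never stated but follows by transitivity, and WorkingStudent is flagged because no object can belong to both Student and Employee. A complete reasoner would additionally take into account cardinalities, covering, relationship typing and views.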

information access:

Only recently has KR research started to take an interest in query processing and information access. Recent work has come up with advanced reasoning techniques for query evaluation and rewriting using views under the constraints given by the ontology - also called view-based query processing [Ull97,CGLV00]. This means that the notion of accessing information through the navigation of an Ontology modelling the document's domain - which can be seen as a conceptual schema - now has formal foundations.

In this tutorial I will thus consider DL for formalising not only the ontology but also the query processing. The (DL-based) conceptual schema as defined in the previous section can be seen as a set of constraints over a vocabulary which is usually richer than the logical schema of the information system it is modelling. In some sense, quite often the conceptual schema plays the role of a general ontology of the domain, very close to the user's rich vocabulary, rather than of a set of constraints over the poor logical vocabulary structuring the data. With this perspective in mind, the user would prefer to query the information system using the richer vocabulary of the ontology. The vocabulary of the basic data (i.e., the logical schema) could be seen in turn either as a subset of the conceptual vocabulary - this is the simplistic view - or more generally as a set of (materialised) views over the vocabulary of the ontology. However, in this case we have to solve the problem of view-based query processing. The problem requires answering a query posed to a database - the one defined by the ontology - only on the basis of the information in a set of (materialised) views, which are again queries over the same database. In the process, the information contained in the conceptual schema of the database should of course be taken into account.

I will introduce the two approaches to view-based query processing, namely query rewriting (see, e.g., [BLR97]) and query answering (see, e.g., [AD98,CGL00]). In the former approach, we are given a query Q, a set of view definitions characterising the actual data, and a set of (conceptual) constraints - all over the conceptual vocabulary - and the goal is to reformulate the query into an expression, the rewriting, that refers only to the views, and provides the answer to Q. Typically, the rewriting is formulated in the same language used for the query and the views. In the latter approach, besides Q, the view definitions and the constraints, we are also given the extensions of the (materialised) views. The goal is to compute the set of tuples that are implied by these extensions, i.e., the set of tuples that are in the answer set of Q in all the databases that are consistent with the views and the constraints.
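The query answering approach can be illustrated with a brute-force sketch in Python. It is a toy under strong assumptions that are not in the tutorial: the domain is fixed to two hypothetical individuals, the conceptual vocabulary has two unary predicates, the only constraint is that every Student is a Person, and the single view is assumed to be sound (its extension is contained in what the view would return). Real techniques do not enumerate databases; the enumeration is only meant to make the definition of the answer set concrete.

```python
from itertools import combinations, product

DOMAIN = ["ann", "bob"]            # hypothetical finite domain
PREDICATES = ["Student", "Person"]  # hypothetical conceptual vocabulary

def powerset(items):
    """All subsets of a list, as Python sets."""
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def all_databases():
    """Every assignment of a subset of DOMAIN to each predicate."""
    for extensions in product(powerset(DOMAIN), repeat=len(PREDICATES)):
        yield dict(zip(PREDICATES, extensions))

def satisfies_constraints(db):
    # Conceptual constraint: Student is contained in Person.
    return db["Student"] <= db["Person"]

def view_students(db):
    # View definition V1(x) :- Student(x).
    return db["Student"]

# Materialised view extensions: V1 is known to contain ann.
VIEW_EXTENSIONS = [(view_students, {"ann"})]

def consistent_with_views(db):
    # Sound views: each stored extension is contained in the view's answer.
    return all(ext <= view(db) for view, ext in VIEW_EXTENSIONS)

def query_persons(db):
    # Query Q(x) :- Person(x).
    return db["Person"]

def certain_answers(query):
    """Tuples in the answer to the query in every consistent database."""
    answers = None
    for db in all_databases():
        if satisfies_constraints(db) and consistent_with_views(db):
            result = set(query(db))
            answers = result if answers is None else answers & result
    if answers is None:
        raise ValueError("no database is consistent with views and constraints")
    return answers

if __name__ == "__main__":
    print(certain_answers(query_persons))
```

Here ann is an answer to Q in every consistent database, because the view forces ann to be a Student and the constraint forces every Student to be a Person, while bob is not. In this tiny setting a rewriting of Q in terms of the view, Q'(x) :- V1(x), evaluated directly on the view extension, yields the same set.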

This framework can be used to characterise several aspects of an information system. In query optimisation, view-based query processing is relevant because using the views may speed up query processing. In data integration, the views represent the only information sources accessible to answer a query. A data warehouse can be seen as a set of materialised views, and, therefore, query processing reduces to view-based query answering. Finally, since the views provide partial knowledge on the database, view-based query processing can be seen as a special case of query answering with incomplete information.

information integration:

In this last part I will show how the technologies introduced in the first two parts, namely a very expressive ontology language and view-based query processing over it, can be used in the framework of Information Integration [CL93,CGL+98,JLVV99,JQC+00].

Suppose we have multiple databases to be integrated. Each database will have its own conceptual schema and logical schema, where, as seen in the previous part, the logical schema is just a set of views over the conceptual schema (local-as-view approach). We assume that each symbol of each schema is identified by a unique global symbol, i.e., the various databases have disjoint signatures. Interdependencies between entities and relationships in different schemas are represented by means of integrity constraints involving symbols of the schemas. Such interdependencies are called inter-model assertions, and they take the form of DLR inclusion dependencies. The union of the various schemas with the inter-model assertions and the local views forms the global integrated schema, or the mediator. It is worth noting that the integration process is incremental - since the integrated schema can be monotonically refined as soon as there is new understanding of the different component schemas - and that the resulting unified schema is strongly dependent on (actually, it includes) the schemas of the single information sources.
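The construction of the integrated schema as a union can be sketched as follows. This is a simplified illustration, not the tutorial's formalisation: constraints and inter-model assertions are restricted to atomic inclusions between named symbols, local views are left out, and global symbols are obtained by prefixing each local symbol with its schema name. The `Schema` class, the `integrate` function and the example schemas are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple

Inclusion = Tuple[str, str]  # (sub, sup): sub is contained in sup

@dataclass
class Schema:
    name: str
    symbols: Set[str]
    constraints: Set[Inclusion] = field(default_factory=set)

    def qualified(self, symbol: str) -> str:
        # Unique global symbol for a local one, e.g. "hr.Employee".
        return f"{self.name}.{symbol}"

def integrate(schemas: Iterable[Schema],
              inter_model_assertions: Iterable[Inclusion]):
    """Union of the schemas plus the inter-model assertions."""
    signature: Set[str] = set()
    constraints: Set[Inclusion] = set()
    for schema in schemas:
        local = {schema.qualified(s) for s in schema.symbols}
        if signature & local:
            raise ValueError(f"signature of {schema.name} is not disjoint")
        signature |= local
        constraints |= {(schema.qualified(a), schema.qualified(b))
                        for (a, b) in schema.constraints}
    assertions = set(inter_model_assertions)
    for lhs, rhs in assertions:
        if lhs not in signature or rhs not in signature:
            raise ValueError(f"unknown symbol in assertion {lhs} -> {rhs}")
    return signature, constraints | assertions

# Two hypothetical source schemas that both talk about persons.
HR = Schema("hr", {"Employee", "Person"}, {("Employee", "Person")})
UNI = Schema("uni", {"Student", "Person"}, {("Student", "Person")})

# Inter-model assertions: the two notions of Person coincide.
ASSERTIONS = {("hr.Person", "uni.Person"), ("uni.Person", "hr.Person")}

if __name__ == "__main__":
    signature, constraints = integrate([HR, UNI], ASSERTIONS)
    print(sorted(signature))
    print(sorted(constraints))
```

Adding further inter-model assertions later only enlarges the set of constraints and never removes a source constraint, which mirrors the incremental, monotonic refinement described above.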

This approach gives both a clear semantics to the integration process of ontologies, and a calculus for deriving inconsistencies and checking the validity of integrity constraints in the integrated schema. Most importantly, in this framework global queries can be defined as views over single ontologies, or they can be generalised to span multiple ontologies. The view-based query processing mechanism will guarantee the correct answer to the global query from the local sources. In the tutorial a complete worked-out example will be given.

The particular but important case of designing a Data Warehouse Conceptual Schema will be presented. In this case we assume there is a privileged schema - called the Enterprise Model - which is the conceptual representation of the global concepts and relationships reconciled and abstracted in the data warehouse, and which is not necessarily a complete model of all the source information. Such a schema is integrated with the different source schemas. The crucial point is that not only the interrelationships between the source schemas and the Enterprise Model are modelled, but also the interdependencies between the source schemas themselves. Moreover, the global integrated schema - the Data Warehouse Conceptual Schema - is composed not only of the Enterprise Model, but also of the various source schemas and of the inter-model assertions. Global data warehouse queries are formally seen as views over the Enterprise Model.

In the tutorial a comparison will be given between the above local-as-view approach to processing global queries and the global-as-view approach, which is more common in current information integration architectures.

Tutorial version 1 ( Last modified: Sat Mar 23 09:57:21 GMT 2002 )

Disclaimers:

Course material prepared by me may contain errors: please, help me in making it better.

Parts of the above course material have been inspired by many contributors in the DL field: thanks to them all!

Online papers may be copyrighted and they are available for evaluation purposes only. People are invited to contact the authors or the publishers for permissions.