Methods and Tools to Reconcile Data

Diego Calvanese, Domenico Lembo, and Maurizio Lenzerini

Technical Report, Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza". D2I (Integration, Warehousing, and Mining of Heterogeneous Sources) Project Report D1.R11, 2002.

A data integration system (DIS) provides access to a set of heterogeneous data sources through a so-called global schema. There are two basic approaches to designing a DIS. In the global-as-view (GAV) approach, the elements of the global schema are defined as views over the sources, whereas in the local-as-view (LAV) approach, the sources are characterized as views over the global schema. In this paper we propose methodologies for reconciling data in both LAV and GAV. For LAV, we propose to specify declaratively a set of reconciliation correspondences, which are used to resolve conflicts among data in different sources, and we define an algorithm that rewrites queries posed on the global schema in terms of both the source elements and the reconciliation correspondences. For GAV, it is commonly held that query processing is much easier than in LAV, where query processing resembles query answering with incomplete information. However, we show that when constraints are expressed over the global schema, the problem of incomplete information arises in GAV as well. We provide a general semantics for a GAV DIS and specify algorithms for query answering in the presence of both incompleteness of the sources and inconsistencies between the data at the sources and the constraints on the global schema.
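
The claim that incomplete information arises in GAV once the global schema carries constraints can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the report: the source relations `s1` and `s2`, the global relations `player` and `team`, the sample tuples, and the single inclusion dependency stating that every team named in `player` must appear in `team`. The sketch is not the report's query answering algorithm; it only applies one chase-style repair step with labeled nulls to show why plain view unfolding can miss answers.

```python
from itertools import count

# Hypothetical source relations (illustrative data, not from the report).
SOURCES = {
    "s1": {("Totti", "Roma")},  # (player name, team name)
    "s2": {("Lazio", "Rome")},  # (team name, city)
}

# GAV mapping: each global relation is defined as a view over the sources.
GAV_MAPPING = {
    "player": lambda src: set(src["s1"]),
    "team": lambda src: set(src["s2"]),
}


class Null:
    """Labeled null standing for a value that exists but is unknown."""

    _ids = count(1)

    def __init__(self):
        self.label = f"_n{next(Null._ids)}"

    def __repr__(self):
        return self.label


def retrieve(sources):
    """Evaluate the GAV views to obtain the retrieved global database."""
    return {name: view(sources) for name, view in GAV_MAPPING.items()}


def chase_inclusion(db):
    """Enforce the assumed constraint: player.team is contained in team.name.

    For each team mentioned in player but absent from team, add a team
    tuple whose city is a fresh labeled null.
    """
    result = {rel: set(tuples) for rel, tuples in db.items()}
    known = {name for name, _ in result["team"]}
    for _, team_name in sorted(db["player"]):
        if team_name not in known:
            result["team"].add((team_name, Null()))
            known.add(team_name)
    return result


def team_names(db):
    """Query q(n) :- team(n, c)."""
    return {name for name, _ in db["team"]}


def team_cities(db):
    """Query q(c) :- team(n, c); nulls are dropped because they are not certain."""
    return {city for _, city in db["team"] if not isinstance(city, Null)}


retrieved = retrieve(SOURCES)
chased = chase_inclusion(retrieved)

print(sorted(team_names(retrieved)))  # view unfolding only
print(sorted(team_names(chased)))     # after taking the constraint into account
print(sorted(team_cities(chased)))
```

Under these assumptions, unfolding the views yields only the team stored in `s2`, while the constraint guarantees that the team referenced in `s1` also exists in every global database that satisfies it, with an unknown city. The team name is therefore an answer that unfolding alone does not return, whereas its city is not, which is the kind of incompleteness the abstract refers to.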

