Ontology-based Integration of Cross-linked Datasets

Diego Calvanese, Martin Giese, Dag Hovland, and Martin Rezk

Proc. of the 14th Int. Semantic Web Conf. (ISWC 2015). Volume 9366 of Lecture Notes in Computer Science. 2015.

In this paper we tackle the problem of answering SPARQL queries over virtually integrated databases. We assume that the entity resolution problem has already been solved and explicit information is available about which records in the different databases refer to the same real-world entity. Surprisingly, to the best of our knowledge, there has been no attempt to extend the standard Ontology-Based Data Access (OBDA) setting to take into account these DB links for SPARQL query answering and consistency checking. This is partly because the OWL built-in same-as property, the most natural representation of links between datasets, is not included in OWL 2 QL, the de facto ontology language for OBDA. We formally treat several fundamental questions in this context: how links over database identifiers can be represented in terms of same-as statements, how to recover rewritability of SPARQL into SQL (lost because of same-as statements), and how to check consistency. Moreover, we investigate how our solution can be made to scale up to large enterprise datasets. We have implemented the approach and carried out an extensive set of experiments showing its scalability.

   title = "Ontology-based Integration of Cross-linked Datasets",
   year = "2015",
   author = "Diego Calvanese and Martin Giese and Dag Hovland and Martin Rezk",
   booktitle = "Proc. of the 14th Int. Semantic Web Conf. (ISWC 2015)",
   pages = "199--216",
   volume = "9366",
   publisher = "Springer",
   series = "Lecture Notes in Computer Science",
   doi = "10.1007/978-3-319-25007-6_12",