VIG: Data Scaling for OBDA Benchmarks

Davide Lanti, Guohui Xiao, and Diego Calvanese

Semantic Web J. 10(2):413–433, 2019.

In this paper we describe VIG, a data scaler for Ontology-Based Data Access (OBDA) benchmarks. Data scaling is a relatively recent approach, proposed in the database community, that allows for quickly scaling an input data instance to s times its size while preserving certain application-specific characteristics. The advantages of the scaling approach are that the generator is general, in the sense that it can be reused on different database schemas, and that users are not required to manually input the data characteristics. In the VIG system, we lift the scaling approach from the pure database level to the OBDA level, where the domain information of ontologies and mappings has to be taken into account as well. VIG is efficient; notably, each tuple is generated in constant time. To evaluate VIG, we have carried out an extensive set of experiments with three datasets (BSBM, DBLP, and NPD), using two OBDA systems (Ontop and D2RQ) backed by two relational database engines (MySQL and Postgres), and compared the results with real-world data, ad-hoc data generators, and random data generators. The encouraging results show that the data scaling performed by VIG is efficient and that the scaled data are suitable for benchmarking OBDA systems.
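The two properties described above, scaling an instance to s times its size while keeping a characteristic of the data fixed, and producing each tuple in constant time, can be illustrated with a toy scaler. This is a minimal sketch under our own assumptions, not VIG's actual algorithm: the preserved characteristic is assumed to be the ratio of rows to distinct values in a single integer column, and the function `scale_column`, its `base` parameter, and the stride-based enumeration are hypothetical choices made for illustration.

```python
from math import gcd
from typing import Iterator


def scale_column(num_rows: int, num_distinct: int, s: int, base: int = 0) -> Iterator[int]:
    """Hypothetical toy scaler for one integer column (not VIG's algorithm).

    The scaled column has s * num_rows values drawn from s * num_distinct
    distinct integers, so the rows-per-distinct-value ratio of the input
    is (approximately) preserved.
    """
    target_rows = s * num_rows
    target_distinct = s * num_distinct

    # One-time setup: choose a stride coprime to the interval length, so that
    # i * stride mod target_distinct visits every value before repeating.
    stride = 1
    for cand in range(target_distinct // 2 + 1, target_distinct):
        if gcd(cand, target_distinct) == 1:
            stride = cand
            break

    # Each value depends only on its index i: O(1) work per generated tuple.
    for i in range(target_rows):
        yield base + (i * stride) % target_distinct


if __name__ == "__main__":
    # Input column: 4 rows, 2 distinct values; scale factor s = 3.
    print(list(scale_column(4, 2, 3)))  # [0, 5, 4, 3, 2, 1, 0, 5, 4, 3, 2, 1]
```

With s = 3, the 4-row input with 2 distinct values becomes 12 rows over 6 distinct values, each appearing twice, as in the input. The sketch deliberately ignores everything VIG handles at the OBDA level, such as the domain information carried by ontologies and mappings.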

   title = "VIG: Data Scaling for OBDA Benchmarks",
   year = "2019",
   author = "Davide Lanti and Guohui Xiao and Diego Calvanese",
   journal = "Semantic Web J.",
   pages = "413--433",
   number = "2",
   volume = "10",
   doi = "10.3233/SW-180336",