Data Scaling in OBDA Benchmarks: The VIG Approach

Davide Lanti, Guohui Xiao, and Diego Calvanese

Technical Report, e-Print archive. CoRR Technical Report arXiv:1607.06343, 2016. Available at

In this paper we describe VIG, a data scaler for benchmarks in the context of ontology-based data access (OBDA). Data scaling is a relatively recent approach, proposed in the database community, that allows for quickly scaling up an input data instance to s times its size while preserving certain application-specific characteristics. The advantage of the approach is that the user is not required to manually specify the characteristics of the data to be produced. This makes it particularly suitable for OBDA benchmarks, where the complexity of the database schemas can make manual input impractical (e.g., the NPD benchmark contains 70 tables, some with more than 60 columns). Unlike a traditional data scaler, VIG uses the domain information provided by the OBDA mappings and the ontology in order to produce data. VIG is currently used in the NPD benchmark, but it is not NPD-specific and can be seeded with any data instance. The distinguishing features of VIG are (1) its simple and clear generation strategy; (2) its efficiency, as each value is generated in constant time, without accessing the disk or RAM to retrieve previously generated values; and (3) its generality, as the data is exported in CSV files that can be easily imported by any RDBMS. VIG is a Java implementation licensed under Apache 2.0, and its source code is available on GitHub in the form of a Maven project. The code has been maintained for two years by the -ontop- team at the Free University of Bozen-Bolzano.
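The claim that each value is generated in constant time, without reading back previously generated values, can be illustrated with a minimal sketch. This is not VIG's implementation (VIG is written in Java). It assumes a single integer column described only by its minimum value and its number of distinct values, and it uses a coprime-stride permutation, a technique chosen here purely for illustration, so that the i-th value is a pure function of i. The names `coprime_stride`, `ScaledColumn` and `write_scaled_csv` are hypothetical.

```python
import csv
import io
from math import gcd


def coprime_stride(m: int) -> int:
    """Return a multiplier a with gcd(a, m) == 1, so that i -> (a * i) % m
    visits every residue 0..m-1 exactly once per cycle of length m."""
    a = max(1, (2 * m) // 3)  # hypothetical heuristic: start from a "large" step
    while gcd(a, m) != 1:
        a += 1
    return a


class ScaledColumn:
    """Hypothetical generator for one integer column.

    lo:         smallest value observed in the seed column
    n_distinct: number of distinct values in the seed column
    scale:      the scale factor s
    """

    def __init__(self, lo: int, n_distinct: int, scale: int) -> None:
        self.lo = lo
        self.m = n_distinct * scale  # distinct values to produce after scaling
        self.a = coprime_stride(self.m)

    def value(self, i: int) -> int:
        # Constant time: the i-th value depends only on i, never on
        # previously generated values, so nothing is stored or re-read.
        return self.lo + (self.a * i) % self.m


def write_scaled_csv(out, columns: dict, n_rows: int) -> None:
    """Write a header plus n_rows generated rows as CSV to `out`."""
    writer = csv.writer(out)
    writer.writerow(list(columns))
    for i in range(n_rows):
        writer.writerow([gen.value(i) for gen in columns.values()])


if __name__ == "__main__":
    s = 3
    seed_rows = 10  # hypothetical seed table with 10 rows
    cols = {
        "id": ScaledColumn(lo=1, n_distinct=10, scale=s),       # key: all values distinct
        "category": ScaledColumn(lo=0, n_distinct=5, scale=s),  # 50% distinct values
    }
    buf = io.StringIO()
    write_scaled_csv(buf, cols, seed_rows * s)
    print(buf.getvalue())
```

Scaling the hypothetical 10-row seed table by s = 3 yields 30 rows in which `id` remains a key (30 distinct values) and `category` keeps its 50% ratio of distinct values (15 distinct values). Generated values may fall outside the seed column's original range, and the sketch ignores constraints such as foreign keys or the domain information from mappings and ontology that the abstract says VIG takes into account.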

   title = "Data Scaling in OBDA Benchmarks: The VIG Approach",
   year = "2016",
   author = "Davide Lanti and Guohui Xiao and Diego Calvanese",
   institution = "e-Print archive",
   number = "arXiv:1607.06343",
   note = "Available at",