Home Home Faculty Home Faculty of Computer Science
Free University of Bozen - Bolzano

Database and Information Systems

How to reach us
Open Positions

BSc: DB Stream
MSc: DB Stream

Advanced Topics in Databases
Advanced Topics in Inf. Systems
Approximation: Theory and Algorithms
Data Management Systems
Data Structures and Algorithms
Data Warehousing/Mining
Distributed Databases
Mobile Services
Seminar in Databases
Temporal and Spatial Databases

PhD Projects
Scientific Services

eBZ – 2015
Eurescom P817

IT Services
Faculty Council



Ole Guttorm Jensen, Multi-Dimensional Conditional Schema Evolution

Change is a fundamental but sometimes neglected aspect of database systems. In particular, changes in the real world often result in modifications to the database structure by evolving the schema. As database systems and other information systems become progressively more interdependent, changes to the database schema is no longer a problem local to the database but can necessitate changes in all systems interacting with the database. The management of change and the ability of the database system to deal with change is an essential component in developing and maintaining truly useful systems.

This Ph.D. thesis develops a formal foundation for conditional schema evolution in relational databases. Conditional schema changes change the schema of the tuples in a relation that satisfy the change condition. This class of schema changes is more general than traditional (unconditional) schema changes and temporal schema changes investigated by existing work. We develop the multi-dimensional conditionally evolving schema as a conceptual model for conditionally evolving schemas. The multi-dimensional conditionally evolving schema is the basis for practical data structures and algorithms developed in the thesis.

Changing the schema of a populated database leads to mismatches between the intended schema of a tuple and the schema used to record the tuple in the first place. The traditional approach of resolving the mismatch by migrating the (non-fitting) tuples to the new schema is lossy. The thesis proposes to keep track of the mismatches at the level of individual tuples, and develops the mismatch extended completed schema to this purpose. A salient feature of this model is that schema changes can be dealt with as standard tuple updates.

The thesis proposes a parametric approach to resolve the mismatches between the intended and recorded schemas of tuples at query time. This allows mismatches to be resolved according to the needs of the application through the specification of a policy. Algorithms for mismatch resolution of relations based on the mismatch extended completed schema are developed in the thesis.

As the schema evolves the database system is required to handle legacy tuples. Such tuples are already stored in the database if tuples are not migrated to fit the new schema, and also come from legacy applications, which continue to issue queries and assertions after a schema change has been committed. The thesis develop efficient algorithms for the classification of tuples. These algorithms along with the default policies for mismatch resolution allows for transparent schema evolution, so users and applications need not be aware that the database is evolving.

In temporal schema versioning schema changes are related to different time dimensions. The thesis specializes conditional schema evolution to conditions over time recording attributes. This leads to optimizations in the space complexity of the multi-dimensional conditionally evolving schema, and facilitates a comparison between conditional schema evolution and temporal schema versioning. For transaction-time the two approaches are equivalent. In valid-time schema versioning, a schema change is applied to a single schema version specified by the user irrespective of the validity of the change and the version. In conditional schema evolution a schema change changes the schema of all segments with a validity that overlaps the validity of the change.

Status: in progress, Aalborg University
Financing: Nykredit Center for Database Research, Aalborg University
Supervisor: Michael Böhlen

Linas Bukauskas, Moving Observer Support for Databases

Interactive visual data explorations impose rigid requirements on database and visualization systems. Systems that visualize huge amounts of data tend to request large amounts of memory resources and heavily use the CPU to process and visualize data. Current systems employ a loosely coupled architecture to exchange data between database and visualization system. Thus, the interaction of the visualizer and the database is kept to the minimum, which most often leads to superfluous data being passed from the database to the visualizer.

The Ph.D. thesis presents a novel tight coupling of database and visualizer. The thesis discusses the VR-tree, an extension of the R-tree that enables observer relative data extraction. To support incremental observer position relative data extraction the thesis proposes the Volatile Access Structure (VAST). VAST is a main memory structure that caches nodes of the VR-tree. VAST together with the VR-tree enables the fast extraction of appearing and disappearing objects from the observer's view as he navigates through the data space. The use of the VAST structure significantly reduces the number of objects to be extracted from the VR-tree and VAST enables a fast interaction of database and visualization systems.

The thesis describes techniques that extend the functionality of an observer aware database to support the extraction of the N most visible objects. This functionality is particularly useful if the number of newly visible objects is still too large. The thesis investigates how to optimize a given observer path. We propose query load balancing strategies that depend on a step-size and an incremental slice size. The proposed strategies balance a number of queries issued to the database and ensures that none of objects is overstepped along the path. All solutions have been implemented and evaluated empirically. A number of experiments validate the performance, scalability, and effectiveness of the optimizations.

Status: defended, Aalborg University, April 19, 2004
Financing: Danish research council, Aalborg University
Supervisor: Michael Böhlen
Assessment committee: Christian S. Jensen, Leo Mark, Simeon J. Simoff

Michael O. Akinde, Complex and Distributed On-line Analytical Processing

On-line analytical processing (OLAP), business intelligence, and multi-dimensional analysis has been the focus of intense research activity over the last few years. Early OLAP research focused primarily on simple aggregate queries; particularly the evaluation, usage, and maintenance of summary-table views. As data warehousing and analytical applications have gained ground in the industry, the challenges facing OLAP technology have increased in scale and complexity. Many applications exist which require the evaluation of very complex aggregate queries.

This Ph.D. thesis presents a general algebraic operator for the expression and evaluation of complex aggregate queries and considers two relevant research questions within the field of complex OLAP (i.e., aggregation queries that require expressions more complex than simple summary-table views): the evaluation of subquery predicates in the presence of complex aggregation, and the distributed evaluation of complex OLAP queries.

The thesis formalizes the generalized multi-dimensional join (GMD-join), an algebraic operator for complex OLAP and presents a set of algebraic transformation rules demonstrating how the operator interacts with the other operators of a multi-set algebra. The techniques for achieving an efficient evaluation of the GMD-join are considered, and cost-formulas for estimating the cost of evaluating the GMD-join are presented. The algebraic transformations, techniques, and cost-model presented in this thesis provide a foundation for the incorporation of the GMD-join, or a similar segmented evaluation operator into a conventional DBMS.

Subqueries are a common feature of complex OLAP queries. Despite this, no research work has considered the evaluation of subquery predicates in the presence of complex aggregation. The thesis presents a general algorithm that allow subquery predicates to be expressed as GMD-joins expressions thereby enabling them to be evaluated efficiently.

Many of the new applications for complex OLAP involve huge amounts of highly distributed data. In order for such data to be queried we need to develop and maintain a distributed data warehouse. This thesis develops a framework and describes a prototype for the distributed processing of complex OLAP queries. A general strategy for the distributed evaluation of complex OLAP queries expressed using GMD-joins is presented, and optimization strategies that exploit distribution knowledge, if known, as well as strategies that do not assume such knowledge, are developed. A series of experiments are presented to evaluate the performance of these strategies and validate the distributed processing algorithm. Finally, the architecture and algorithms of Skalla, a prototype system for the distributed evaluation of complex OLAP queries implemented during the Ph.D. project is documented.

Status: defended, Aalborg University, February 6, 2003
Financing: Nykredit Center for Database Research, Aalborg University
Supervisor: Michael Böhlen
Assessment committee: Gustavo Alonso, Christian S. Jensen, Willem Jonker