Advanved Topics in Databases (ATDB)


Lecturer: Dr. Igor Timko
Teaching assistant:  
Teaching language: English
Prerequisites: Basic skills in C++ programming, Basic knowledge of geometry, algebra, and mathematical logic
Office hours: Friday, 14:00-16:00 (prior notification by email required)
Academic Year: 2008 - 2009 , 1st Semester
Start date: October 2
Lectures: Thursday, 14:00-16:00, Lecture Room E221, building E, 2nd floor


Objectives | Syllabus | Introduction | Lectures | Mini Projects | Deadlines | Exam 


Objectives: 

To provide the students with:

Syllabus:

Introduction:

Course content: This course covers topics related to moving object databases. In particular, the students will acquire practical, hands-on experience with a database management system SECONDO that supports moving objects. After having finished the course, the students will be able to design database extensions (user-defined types and functions) and program them for SECONDO, with specialization on spatio-temporal data types. The students will also get familiar with the database technology for moving objects including data models for moving objects and their data structures, operations for moving objects and their algorithms, indices for moving objects, moving object database update policies, management of uncertain data on moving objects, OLAP for moving objects, and web-based moving object databases

The content of this course is closely related to the Temporal and Spatial Databases course.

Mini-Projects: The ATDB course emphasizes the applications of concepts, methods, and techniques presented in the course. During the semester the students will work on a practical mini-project. The students are welcome to choose any topic from the ATDB course, or pick a mini-project from the list of the mini projects proposals. The mini-projects are worked on in small groups of 2-3 people. The mini-project is an integral part of the final exam (cf. the exam web page).

Prerequisites:

Teaching format:

Course material:

Lecture Notes: The lecture notes for this course will be created as we progress through the semester. The current set of lecture notes can always be found here.  

Lectures

This page contains links to slides and references to the literature. New items will be appended as the course progresses. The complete set of last year's slides is here.

  1. Introduction to the course
  2. Slides:

    Literature: no literature


  3. Introduction to Moving Objects Databases
  4. Slides:

    Literature: Chapter 1 in the book "Moving Objects Databases"


  5. Current Movement: MOST data model, FTL query language, location updates, uncertain trajectories, etc.
  6. Slides:

    Literature: Chapter 3 in the book "Moving Objects Databases"


  7. History of Movement in 2D: abstract data types (ADTs) and their operations

    Slides:

    Literature: Chapter 4 in the book "Moving Objects Databases"


  8. History of Movement in 2D: data structures for ADTs and algorithms for their operations

    Slides:

    Literature:


  9. Periodic Moving Objects

    Slides:

    Literature:


  10. History of Movement in Networks: ATDs and their data structures, operations and their algorithms

    Slides:

    Literature:


  11. Sequenced, Spatio-Temporal Aggregation in Road Networks

    Slides:

    Literature:


Mini Projects:

This web page presents information related to mini-projects of the ATDB course. The students have two possibilities. First, the group can choose a mini-project from the course. Second, the group can come up with its own mini-project proposal.


Mini-project proposals:

  1. Aggregation Rectangle Tree (aR-tree) for Spatial Aggregation of Moving Objects NEW!

    For some applications of moving object databases (e.g., traffic data analysis) spatial box aggregation queries are very important. Given a spatial region (box), this type of query asks for an aggregate data about moving objects present in this region (e.g., "How many objects are in this region?" or "What is the average speed of objects in this region?").

    Spatial box aggregation queries may be efficiently processed by a data structure called aR-tree. The aR-tree is a hierarchy of regions. It is an extension of R-tree. The query processing is sped up by storing aggregate information at each tree node (e.g., a number of moving objects that fall into this node's region). With this information a query does not need to descend the tree all the way to the leaves. Before querying, the data from a relation is inserted into the aR-tree.

    The goal of this mini-project is to implement a data type for the aR-tree and two operations: insert and query. The existing Secondo's implementation of the R-tree may be used as a basis.

    Literature:

    • D. Papadias, P. Kalnis, J. Zhang, and Y. Tao. Efficient OLAP Operations in Spatial Data Warehouses. .pdf
    • A. Guttman. R-trees: A Dynamic Index Structure for Spatial Searching. .pdf


     

  2. MultiVersion Segment B-tree (MVSB-tree) for Temporal Aggregation of Moving Objects NEW!

    For many applications of moving objects databases, range-temporal aggregate queries are important. Let S be a set of temporal records, each record having a key, a time interval, and a value. For example, a record may represent a moving object. Given a query key range, R, and time interval, I, a range-temporal aggregate query computes the total value of records in S whose keys are in R and whose time intervals intersect I. For example, a record in S may represent a moving object. Then, a key is the object's (1D) position on a road and the time interval tells when the object possibly is in that position. Given a road segment, R, and a time interval, I, a range-temporal query computes the number of objects that are possibly in R during I.

    The range-temporal aggregate queries are efficiently processed by a data structure called MultiVersion Segment B-tree (MVSB-tree). The two interesting operations on the MVSB-tree are insertion (of moving object records) into the tree and querying the tree.

    The goal of this project is to implement a data type for the MVSB-tree and two operations: insert and query.

    Literature:

    • D. Zhang, A. Markowetz, V. J. Tsotras, D. Gunopulos, and B. Seeger. On Computing Temporal Aggregates with Range Predicates. .pdf


     

  3. Adaptive Multidimensional Histogram (AMH) for Spatio-Temporal Aggregation of Moving Objects NEW!

    For many applications of moving object databases, spatio-temporal box aggregation queries are important. Given a road segment R and a time interval I, this type of query asks for an aggregate value for moving objects present in R during I (e.g., "How many objects passed this road segment between 1PM and 2PM?").

    The spatio-temporal box aggregation queries are efficiently processed by a data structure called Adaptive Multidimensional Histogram (AMH). Queries can be processed fast because the data is hold in a small number of buckets. The two interesting operation on the AMH are insertion (of moving objects into the histogram) and querying.

    The goal of this mini-project is to implement a data type for the AMH and two operations: insert (including bucket split) and query.

    Literature:

    • J. Sun, D. Papadias, Y. Tao, and B. Liu. Querying about the Past, the Present, and the Future in Spatio-Temporal Databases. .pdf


     

  4. Queries on the Histogram-Based Representation of Moving Objects Probability Distributions

    Moving objects databases contain data about positions of moving objects. In many cases, this data is probabilistic. For example, traffic management applications need to predict numbers of cars in the streets in the near future (e.g., in 15 mins). Since future speed and direction of cars is uncertain, it makes sense for a prediction algorithm to come up with a probability distribution for the number of cars.

    An efficient way of representing probability distributions is based on histograms. For example, the histogram-based representation of a probability distributions for the number of cars on street X may looks as follows: P = {([0;10], 0.2), ([11;20], 0.3), ([21;35], 0.5)}. An interesting type of queries for the histogram-based probability distributions is called probability queries. These are queries of the form "Given a probability distribution, P, for a number of cars in street X, what is the probability that the number of cars is greater than 15". Since the histogram-based probability distributions are approximate, there are several types of answers to probability queries: pessimistic (lower bound), optimistic (upper bound), and weighted ("average").

    The goal of this mini-project is to implement a data type for the histogram-based representation of probability distributions and to efficiently implement operations for pessimistic, optimistic, and weighted answers to probability queries.

    Literature:

    • I. Timko, C. E. Dyreson, and T.B. Pedersen. Extending OLAP with Probabilistic Measures. Submitted to the VLDB Journal. .pdf


     

  5. Evaluation of Precision of the Histogram-Based Representation of Moving Objects Probability Distributions

    Moving objects databases contain data about positions of moving objects. In many cases, this data is probabilistic. For example, traffic management applications need to predict numbers of cars in the streets in the near future (e.g., in 15 mins). Since future speed and direction of cars is uncertain, it makes sense for a prediction algorithm to come up with a probability distribution for the number of cars.

    An efficient way of representing probability distributions is based on histograms. For example, the histogram-based representation of a probability distributions for the number of cars on street X may looks as follows: P = {([0;10], 0.2), ([11;20], 0.3), ([21;35], 0.5)}. Since the histogram-based probability distributions are approximate, it is important to evaluate their precision. This can be done by computing the Kullback-Leibler divergence between the "true" distribution and its histogram-based representation.

    The goal of this mini-project is to implement a data type for the histogram-based representation of probability distributions and to efficiently implement operations for computing the Kullback-Leibler divergence between distributions.

    Literature:

    • I. Timko, C. E. Dyreson, and T.B. Pedersen. Extending OLAP with Probabilistic Measures. Submitted to the VLDB Journal. .pdf
    • Wikipedia. Kullback-Leibler divergence. link


SECONDO system

The common goal of the mini-projects of the MS course is to extend a data model and a query language of SECONDO database management system. The extension will take the form of a collection of related data types with associated operations. In SECONDO, this collection is called algebra module.

Secondo documentation

This is a list of important documentation about SECONDO (you can find more on their homepage):

Secondo newsgroup

There exists also a newsgroup discussing questions related to Secondo.
  • Group: secondo
  • Server: agnesi.fernuni-hagen.de
This newsgroup requires authentication (just for spam protection):
  • User: secuser
  • Password: Concatenate 'sec' with '4891'

Secondo installation

It is recommended to install SECONDO on your computer. Useful links:

How to run Secondo in the computer lab:

  1. Start SECONDO server:
    1. Start a new shell.
    2. Type: source /opt/secondo/.secondorc
    3. Type: cd /opt/secondo/bin
    4. Type: ./SecondoMonitor
    5. After some messages, type: startup
  2. Start query optimizer:
    1. Start a new shell.
    2. Type: source /opt/secondo/.secondorc
    3. Type: cd /opt/secondo/Optimizer/
    4. Type: ./StartOptServer
  3. Start Java GUI:
    1. Start a new shell.
    2. Type: source /opt/secondo/.secondorc
    3. Type: cd /opt/secondo/Javagui/
    4. Type: ./sgui

Deadlines:

  • >October 2 The course starts.
  • October 16 The students form the groups and choose a mini-project. The groups should consists of 2-3 people. The group sends an email to the professor of the course (timko@inf.unibz.it) with the list of the members (together with their email addresses) and the chosen topic.
  • January 16 The students present implementations of their mini-projects.
  • January 30 Mini-project report deadline. The students must send a PDF version of the report to the professor of the course. Report size: 7-10 pages.
  • February 13, 10:00-12:30, D003 Exam.

Exam

This web page describes the organization of the exam of the ATDB course.

Organization of the exam:

  1. The exam is oral.
  2. Each student will be examined individually for 30 minutes.
  3. The exam consists of two parts:
    • the mini-project part = a presentation of the mini-project + a question from the mini-project and
    • the theory part = a question from the course

    The students should include examples into their explanations.

  4. This is an open book exam (i.e., the students are allowed to use any material at the exam). However, the students should freely operate with the concepts and know definitions from the course.

Grading of the Exam:

  1. Up to 30 points for the mini-project part (+ cum laude) (MPPG<=30)
  2. Up to 30 points for the theory part (+ cum laude) (TPG<=30)
  3. The total grade is the sum of the grades for the mini-project part and the theory part, divided by 2 (+ cum laude) (TG = (MPPG + TPG)/2)
  4. Note that both parts are compulsary to pass the exam (i.e., if the student gets <=18 from one of the parts, then the student fails the exam)

Exam questions: here