Advanved Topics in Databases (ATDB)
| Lecturer: |
Dr. Igor
Timko |
| Teaching assistant: |
|
| Teaching language: |
English |
| Prerequisites: |
Basic skills in C++ programming, Basic knowledge of
geometry, algebra, and mathematical logic |
| Office hours: |
Friday, 14:00-16:00 (prior notification by email required) |
| Academic Year: |
2008 - 2009 , 1st Semester |
| Start date: |
October 2 |
| Lectures: |
Thursday, 14:00-16:00, Lecture Room
E221, building E, 2nd floor |
Objectives:
To provide the
students with:
- practical, hands-on experience with a database management
system that supports advanced applications (in particular, that supports
moving objects)
- other knowledge on the database technology for advanced
applications (in particular, for moving objects)
Syllabus:
- Introduction to Moving Object Databases
- Introduction to Secondo DBMS
- Current Movement: MOST data model, FTL query language,
location updates, and uncertain trajectories
- History of Movement in 2D: abstract data types (ADTs) and
operations
- History of Movement in 2D: data structures for ADTs and
algorithms for operations
- History of Movement in 2D: constraint databases
- History of Movement in Networks:
ADTs and data structures, operations and algorithms
- Periodic Moving Objects: ADTs and data structures,
operations and algorithms
- Moving Objects on the Web
- Moving Objects in OLAP
- Indexing for Moving Objects
Introduction:
Course content: This course
covers topics related to moving object databases. In particular, the students
will acquire practical, hands-on experience with a database management system
SECONDO that
supports moving objects. After having finished the course, the students will be
able to design database extensions (user-defined types and functions) and
program them for SECONDO, with specialization on spatio-temporal data types. The
students will also get familiar with the database technology for moving objects
including data models for moving objects and their data structures, operations
for moving objects and their algorithms, indices for moving objects, moving
object database update policies, management of uncertain data on moving objects,
OLAP for moving objects, and web-based moving object databases
The content of this course is closely related to the Temporal and Spatial Databases course.
Mini-Projects: The ATDB course
emphasizes the applications of concepts, methods, and techniques presented in
the course. During the semester the students will work on a practical
mini-project. The students are welcome to choose any topic from the ATDB course,
or pick a mini-project from the list of the mini
projects proposals. The mini-projects are worked on in small groups of 2-3
people. The mini-project is an integral part of the final exam (cf.
the exam web page).
Prerequisites:
Teaching format:
Course material:
Lecture Notes: The lecture notes
for this course will be created as we progress through the semester. The current
set of lecture notes can always be found
here.
Lectures
This page contains links to slides and references to the literature. New
items will be appended as the course progresses. The complete set of last
year's slides is
here.
- Introduction to the course
Slides:
Literature: no literature
- Introduction to Moving Objects Databases
Slides:
Literature: Chapter 1 in the book
"Moving Objects Databases"
- Current Movement: MOST data model, FTL query language,
location updates, uncertain trajectories, etc.
Slides:
Literature: Chapter 3 in
the book
"Moving Objects Databases"
-
History of Movement in 2D: abstract data types (ADTs)
and their operations
Slides:
Literature: Chapter 4 in
the book
"Moving Objects Databases"
-
History of Movement in 2D: data structures for ADTs and
algorithms for their operations
Slides:
Literature:
- Sections 5.1, 5.2, and 5.3 in the book
"Moving Objects Databases"
- Chapter 5 in this paper:
L. Forlizzi, R.H. Guting, E. Nardelli, and M. Schneider.
A Data Model and Data Structures for Moving Objects Databases.
Proceedings of ACM SIGMOD Conference (Dallas, Texas), pp.
319-330, 2000.
- Alternatively, Section 4 and 5 in this paper:
J.A. Cotelo Lema, L. Forlizzi, R.H. Guting, E. Nardelli, and M.
Schneider.
Algorithms for Moving Objects Database.
The Computer Journal 46:6 (2003), pp. 680-712, 2003.
- Also, you may find this
description of the plane sweep algorithm useful
-
Periodic Moving Objects
Slides:
Literature:
-
History of Movement in Networks: ATDs and their data
structures, operations and their algorithms
Slides:
Literature:
-
Sequenced, Spatio-Temporal Aggregation in Road Networks
Slides:
Literature:
Mini Projects:
This web page presents information related to mini-projects of
the ATDB course. The students have two possibilities. First, the group can
choose a mini-project from the course. Second, the group can come up with its
own mini-project proposal.
Mini-project proposals:
- Aggregation Rectangle Tree (aR-tree) for Spatial Aggregation of Moving Objects NEW!
For some applications of moving object databases (e.g., traffic data analysis) spatial box aggregation queries are very important. Given a spatial region (box), this type of query asks for an aggregate data about moving objects present in this region (e.g., "How many objects are in this region?" or "What is the average speed of objects in this region?").
Spatial box aggregation queries may be efficiently processed by a data structure called aR-tree. The aR-tree is a hierarchy of regions. It is an extension of R-tree. The query processing is sped up by storing aggregate information at each tree node (e.g., a number of moving objects that fall into this node's region). With this information a query does not need to descend the tree all the way to the leaves. Before querying, the data from a relation is inserted into the aR-tree.
The goal of this mini-project is to implement a data type for the aR-tree and two operations: insert and query. The existing Secondo's implementation of the R-tree may be used as a basis.
Literature:
- D. Papadias, P. Kalnis, J. Zhang, and Y. Tao. Efficient OLAP Operations in Spatial Data Warehouses.
.pdf
- A. Guttman. R-trees: A Dynamic Index Structure for Spatial Searching.
.pdf
- MultiVersion Segment B-tree (MVSB-tree) for Temporal Aggregation of Moving Objects NEW!
For many applications of moving objects databases, range-temporal aggregate queries are important. Let S be a set of temporal records, each record having a key, a time interval, and a value. For example, a record may represent a moving object. Given a query key range, R, and time interval, I, a range-temporal aggregate query computes the total value of records in S whose keys are in R and whose time intervals intersect I. For example, a record in S may represent a moving object. Then, a key is the object's (1D) position on a road and the time interval tells when the object possibly is in that position. Given a road segment, R, and a time interval, I, a range-temporal query computes the number of objects that are possibly in R during I.
The range-temporal aggregate queries are efficiently processed by a data structure called MultiVersion Segment B-tree (MVSB-tree). The two interesting operations on the MVSB-tree are insertion (of moving object records) into the tree and querying the tree.
The goal of this project is to implement a data type for the MVSB-tree and two operations: insert and query.
Literature:
- D. Zhang, A. Markowetz, V. J. Tsotras, D. Gunopulos, and B. Seeger. On Computing Temporal Aggregates with Range Predicates.
.pdf
- Adaptive Multidimensional Histogram (AMH) for Spatio-Temporal Aggregation of Moving Objects NEW!
For many applications of moving object databases, spatio-temporal box aggregation queries are important. Given a road segment R and a time interval I, this type of query asks for an aggregate value for moving objects present in R during I (e.g., "How many objects passed this road segment between 1PM and 2PM?").
The spatio-temporal box aggregation queries are efficiently processed by a data structure called Adaptive Multidimensional Histogram (AMH). Queries can be processed fast because the data is hold in a small number of buckets. The two interesting operation on the AMH are insertion (of moving objects into the histogram) and querying.
The goal of this mini-project is to implement a data type for the AMH and two operations: insert (including bucket split) and query.
Literature:
- J. Sun, D. Papadias, Y. Tao, and B. Liu. Querying about the Past, the Present, and the Future in Spatio-Temporal Databases.
.pdf
- Queries on the Histogram-Based Representation of Moving Objects Probability Distributions
Moving objects databases contain data about positions of
moving objects. In many cases, this data is probabilistic. For example,
traffic management applications need to predict numbers of cars in the streets
in the near future (e.g., in 15 mins). Since future speed and direction of
cars is uncertain, it makes sense for a prediction algorithm to come up with a
probability distribution for the number of cars.
An efficient way of representing probability distributions
is based on histograms. For example, the histogram-based representation of a
probability distributions for the number of cars on street X may looks as
follows: P = {([0;10], 0.2), ([11;20], 0.3), ([21;35], 0.5)}. An interesting
type of queries for the histogram-based probability distributions is called
probability queries. These are queries of the form "Given a probability
distribution, P, for a number of cars in street X, what is the probability
that the number of cars is greater than 15". Since the histogram-based
probability distributions are approximate, there are several types of answers
to probability queries: pessimistic (lower bound), optimistic (upper bound),
and weighted ("average").
The goal of this mini-project is to implement a data type
for the histogram-based representation of probability distributions and to
efficiently implement operations for pessimistic, optimistic, and weighted
answers to probability queries.
Literature:
- I. Timko, C. E. Dyreson, and T.B. Pedersen. Extending
OLAP with Probabilistic Measures. Submitted to the VLDB Journal.
.pdf
- Evaluation of Precision of the Histogram-Based Representation of Moving Objects Probability Distributions
Moving objects databases contain data about positions of
moving objects. In many cases, this data is probabilistic. For example,
traffic management applications need to predict numbers of cars in the streets
in the near future (e.g., in 15 mins). Since future speed and direction of
cars is uncertain, it makes sense for a prediction algorithm to come up with a
probability distribution for the number of cars.
An efficient way of representing probability distributions
is based on histograms. For example, the histogram-based representation of a
probability distributions for the number of cars on street X may looks as
follows: P = {([0;10], 0.2), ([11;20], 0.3), ([21;35], 0.5)}. Since the
histogram-based probability distributions are approximate, it is important to
evaluate their precision. This can be done by computing the Kullback-Leibler
divergence between the "true" distribution and its histogram-based
representation.
The goal of this mini-project is to implement a data type
for the histogram-based representation of probability distributions and to
efficiently implement operations for computing the Kullback-Leibler divergence
between distributions.
Literature:
- I. Timko, C. E. Dyreson, and T.B. Pedersen. Extending
OLAP with Probabilistic Measures. Submitted to the VLDB Journal.
.pdf
- Wikipedia. Kullback-Leibler divergence.
link
SECONDO system
The common goal of the mini-projects of the MS course is to
extend a data model and a query language of
SECONDO database
management system. The extension will take the form of a collection of
related data types with associated operations. In SECONDO, this collection is
called algebra module.
Secondo documentation
This is a list of important documentation about SECONDO (you
can find more on their homepage):
Secondo newsgroup
There exists also a
newsgroup discussing questions related to Secondo.
-
Group: secondo
-
Server: agnesi.fernuni-hagen.de
This newsgroup requires authentication (just for spam protection):
-
User: secuser
-
Password: Concatenate 'sec' with '4891'
Secondo installation
It is recommended to install SECONDO on your computer. Useful
links:
How to run Secondo in the computer lab:
- Start SECONDO server:
- Start a new shell.
- Type: source /opt/secondo/.secondorc
- Type: cd /opt/secondo/bin
- Type: ./SecondoMonitor
- After some messages, type: startup
- Start query optimizer:
- Start a new shell.
- Type: source /opt/secondo/.secondorc
- Type: cd /opt/secondo/Optimizer/
- Type: ./StartOptServer
- Start Java GUI:
- Start a new shell.
- Type: source /opt/secondo/.secondorc
- Type: cd /opt/secondo/Javagui/
- Type: ./sgui
Deadlines:
>October 2 The course starts.
October 16 The students form
the groups and choose a mini-project. The groups should consists of 2-3
people. The group sends an email to the professor of the course (timko@inf.unibz.it)
with the list of the members (together with their email addresses) and the
chosen topic.
January 16 The students
present implementations of their mini-projects.
January 30
Mini-project report deadline. The students must send a PDF version of the
report to the professor of the course. Report size: 7-10 pages.
- February 13, 10:00-12:30, D003
Exam.
Exam
This web page describes the organization of the exam of the
ATDB course.
Organization of the exam:
- The exam is oral.
- Each student will be examined individually for 30 minutes.
- The exam consists of two parts:
- the mini-project part = a presentation of the
mini-project + a question from the mini-project and
- the theory part = a question from the course
The students should include examples into their
explanations.
- This is an open book exam (i.e., the students are allowed
to use any material at the exam). However, the students should freely operate
with the concepts and know definitions from the course.
Grading of the Exam:
- Up to 30 points for the mini-project part (+ cum laude) (MPPG<=30)
- Up to 30 points for the theory part (+ cum laude) (TPG<=30)
- The total grade is the sum of the grades for the
mini-project part and the theory part, divided by 2 (+ cum laude) (TG = (MPPG
+ TPG)/2)
- Note that both parts are compulsary to pass the exam (i.e.,
if the student gets <=18 from one of the parts, then the student fails the
exam)
Exam questions:
here