Temporal and Spatial Databases (TSDB)
| Academic Year: | 2012/13, 2nd semester |
| Lecturer: | Johann Gamper |
| Lectures: | WE 16:00-18:00, Room E412 |
| Labs: | WE 18:00-19:00, Room E412 |
| Office hours: | WE 15:00-16:00 or email arrangement |
Objectives: This course will introduce principles and foundations of temporal and spatial databases, including data models, query languages, algebras, and algorithms for selected operators.
Prerequisites: Students should be familiar with basic concepts in databases (including relational databases, SQL, and relational algebra) and algorithms, as well as having good pogramming skills. This material is taught in the following courses: Introduction to Databases, Introduction to Programming, and Data Structures and Algorithms.
Syllabus:
- Introduction and motivation
- Time ontology, structure, and granularity
- Temporal data models
- Temporal relational algebras
- Temporal query lanaguages
- Algorithms for temporal join and aggregation
- Spatial databases and data models
- Shortest path, k-NN, and isochrone queries
References: There is no single textbook that covers the entire course. The course material is collected from the following books and papers:
- Zaniolo et al.: Advanced Database Systems, (chapter 5, 6, and 7), Morgan Kaufmann Publishers, Inc., 1997
- R.T. Snodgrass (ed.): The TSQL2 temporal query language, Kluwer Academic Publishers, 1995.
- R.H. Güting and M. Schneider: Moving objects databases (chapter 1 and 2),Morgan Kaufmann Publishers, Inc., 2005
- M. Böhlen, J. Gamper, and C.S. Jensen. Multi-dimensional aggregation for temporal data. In Proc. of EDBT-2006, pp. 257-275, 2006.
- D. Gao, C.S. Jensen, R.T. Snodgrass, and M.D. Soo: Join operations in temporal databases. VLDB Journal, vol. 14, pp. 2-29, 2005.
- C.S. Jensen, M.D. Soo, and R.T. Snodgrass: Unifying temporal data models via a conceptual model. Information Systems, vol. 19, no. 7, pp. 513-547, 1994.
Lectures and Lecture Notes
| 1. | WE 06.03.13 | Introduction: syllabus, administration and organisation, the DB field, short overview of RDMS, motivation for temporal DBMS, a case study | |
| 2. | WE 13.03.13 | Conceptual temporal data models I: time domain, time dimensions and timestamps, point-based data models, interval-based data models, data models with temporal elements, attribute value timestamping, the bitemporal conceptual data model (BCDM) | |
| 3. | WE 20.03.13 | Representational temporal data models: interaction of BCDM and representational models, Snodgrass' tuple timestamping model, Jensen's backlog model, Ben-Zvi's tuple timestamped model, Gadia's attribute value timestamped model, McKenzie's attribute value timestamped model | |
| 4. | WE 27.03.13 | Relational algebra for bitemporal relations: algebra for BCDM, algebra for Sondgrass' tuple timestamped data model, reducibility of operators, semantic equivalence | |
| 5. | WE 03.04.13 | Temporal query languages: Language design criteria, comparison of timestamps, comparative analysis of temporal query languages (SQL + ADT, IXSQL, SQL/TP, TSQL2, ATSQL) | |
| 6. | WE 10.04.13 | Temporal join: Query processing, definition of temporal Cartesian product and temporal joins, evaluation algorithms for temporal joins (nested-loop, sort-merge, partition-based, index-based) | |
| 7. | WE 17.04.13 | Temporal aggregation: snapshot vs. temporal aggregation, aggregation tree algorithm, balanced tree algorithm, merge-sort algorithm, bucket algorithm | |
| 8. | WE 24.04.13 | Temporal aggregation: a general temporal aggregation framework, parsimounious temporal aggregation | |
| 9. | WE 08.05.13 | Spatial databases: introduction, modeling spatial concepts, organizing the underlying space, spatial data types and algebras, spatial relationships | |
| 10. | WE 15.05.13 | Spatial databases: integrating geometry into DBMS data model, querying spatial databases, spatial indexes | |
| 11. | WE 22.05.13 | Spatial network databases: definition, shortest path queries, Dijkstra's algorithm, landmark-based approximation algorithms, k-NN search | |
| 12. | WE 29.05.13 | Spatial extensions of Oracle and Postgres |
Project
The exercise part of the course consists in the elaboration of a mini-project, which can be done alone or in groups of 2 students. The project consists in the implemenation and evaluation of one of the following research papers:
- M. H. Böhlen, J. Gamper, and C. S. Jensen. Multi-dimensional Aggregation for Temporal Data. In Proc. EDBT, pp. 257-275, 2006. (pdf)
- N. Kline and R. T. Snodgrass. Computing Temporal Aggregates. In Proc. ICDE, pp. 222-231, 1995. (pdf)
- B. Moon, I. F. Vega Lopez, and V. Immanuel. Efficient Algorithms for Large-Scale Temporal Aggregation. IEEE TKDE, 15(3): 744-759, 2003. (pdf)
- J. Yang and J. Widom. Incremental Computation and Maintenance of Temporal Aggregates. The VLDB Journal, 12(3): 262-283, 2003. (pdf)
- J. Gordevicius, J. Gamper, M. Böhlen. Parsimonious temporal aggregation. The VLDB Journal. Appears in 2012. (pdf)
- D. Gao, C.S. Jensen, R.T. Snodgrass, and M.D. Soo: Join operations in temporal databases. VLDB Journal, vol. 14, pp. 2-29, 2005. (pdf)
- D. Papadias, J. Zhang, N. Mamoulis, Y. Tao. Query processing in spatial network databases. VLDB-03. (pdf)
- K. Tretyakov, A. Armas-Cervantes, L. Garcia-Banuelos, J. Vilo, M. Dumas. Fast fully dynamic landmark-based estimation of shortest path distances in very large graphs. CIKM-11. (pdf)
- J. Gamper, M. Böhlen, M. Innerebner. Scalable computation of isochrones with network expiration. SSDBM-12. (pdf)
The project concludes with a final report of 5-10 pages which clearly states the problem and describes the solution, implementation and evaluation.
For the implementation we recommend Java or C as programming language and Oracle, Postgres, or MySQL as database system, but you are free to choose any other programming language and relational database system. For the evaluation, the following data sets of different size (200K, 400K, 600K, 800K, 1000K of tuples) are available here ("nn" indicates the size of the data set, e.g., 02 = 200000 tuples):
- data-seq-nn: Sequential tuples, i.e., the timestamps of the tuples are disjoint.
- data-equal-nn: All tuples have the same timestamp.
- data-random-nn: Start time and duration of the tuples are uniformly distributed.
- data-worst-nn: All start and end points are different, and there is a time interval (in the middle) where all tuples hold.
Milestones
- WE, 06.03.13: Presentation of the project and the research papers.
- WE, 13.03.13: Read the research papers and choose one for the implementation.
- WE, 27.03.13: Study the research paper in detail.
- WE, 17.04.13: Develop a first running prototype.
- WE, 08.05.13: Finalize the implementation.
- WE, 15.05.13: Run experiments.
- WE, 22.05.13: Write project report.
- WE, 29.05.13: Short praesentation of projects.
Exam
- The exam is oral and consists of two parts:
- discussion about the project and the evaluation of the project report (50% of the total grade);
- questions from the TSDB course (50% of the total grade).
- Both parts must be positive to pass the exam.
- Each student will be examined individually for approx. 20-30 minutes.
- This is an open book exam, i.e. the student is allowed to use the lecture notes and additional readings in the exam.