Advanced Data Management Technologies (ADMT)

Academic Year:2018/19, 1st semester Timetable: here
Lecturer:Johann Gamper Lab instructor:Anton Dignös
Lectures:TU 10:00-12:00, FR 10:00-12:00, Room D0.02
Labs:FR 12:00-14:00, Room Laboratory
Office hours:WE 13:00-14:00 or email arrangement Lab office hours:WE 13:00-14:00, FR 11:00-12:00 or email arrangement

Objectives: Introduce advanced data management technologies that go beyond traditional (relational) database management systems and train students to evaluate the advantages and disadvantages of such technologies in different application contexts.

Prerequisites: Students should be familiar with basic concepts in databases (including relational databases, SQL, and relational algebra) and algorithms, as well as having good pogramming skills.

Syllabus:

Learning Outcomes:

References: There is no single textbook that covers the entire course. The course material is collected from various textbooks and research papers. They are indicated on the slides.

More details are in the course presentation form.

Schedule & Lecture Notes

Useful links: timetable, course material in OLE, lecture notes 2017/18

Week Day Date Topic Slides
1 TU 09.10.2018 Introduction: administration and organisation, the DB field, new trends in computing, motivation and need for advanced ADMT pdf
FR 12.10.2018 Basic concepts of BI and DW: business intelligence, from data to information, BI pyramid, BI key problems, DW definition, DW architecture, data integration, multidimensional model, OLAP vs. OLTP pdf
2 No lectures this week
3 TU 23.10.2018 Building a DW: methodological framework, DW design, top-down vs. bottom-up design, DW life-cycle, DW project management, BI competence center
DW requirements analysis and conceptual design: user needs, conceptual design, facts, dimensions, measures, dimensional fact model, building dimensional fact schemata
pdf
pdf
FR 26.10.2018 DW logical design: logical design, multidimensional model, star schema, snowflake schema, fact/event types, additivity of measures, different types of measures, DW applications, DW implementation pdf
4 TU 30.10.2018 DW case studies: grocery store example, 4-step DW design process, surrogate keys, degenerate dimensions, junk dimension, inventory management, order management, OncoNet, and MEDAN case studies, DW bus architecture, DW bus matrix, conformed dimensions, role-playing, fact normalization, multi-valued dimensions, bridge tables pdf
5 TU 06.11.2018 DW changing dimensions: slowly changing dimensions, overwrite old values, versioning of rows, timestamping of rows, rapidly changing dimensions pdf
FR 09.11.2018 Extract-transform-load: ETL process, ETL subsystems, building dimensions and fact tables, extract data, computing deltas, data transformation, normalization, type conversion, loading data, data quality, data cleansing. pdf
6 TU 13.11.2018 SQL OLAP extensions: SQL query specification; processing SQL queries; GROUP BY extensions: ROLLUP; CUBE; GROUPING SETS; GROUPING_ID. pdf, pdf, sql
FR 16.11.2018 We visit SFSCON 2018!
7 No lectures this week
8 No lectures this week; lab on Friday will take place
9 TU 04.12.2018 SQL OLAP extensions: analytic/window functions, moving window aggregates, ranking, percentiles, nested aggregates, densification pdf, sql
TU 07.12.2018 DW pre-aggregation and view maintenance: selecting and using pre-aggregation, lattice framework, hierarchies and the lattice framework, cost model, greedy algorithm, optimal cases of the greedy algorithm, incremental view maintenance pdf
10 TU 11.12.2018 Bitmap indexes: bitmap indexing, bitmap compression, run-length encoding, bit-sliced index, bitmap-encoded index, bitmapped join index pdf
11 TU 18.12.2018 NoSQL: motivation, new requirements, CAP theorem, BASE properties, NoSQL taxonomy, key-values stores, column stores, document stores, graph databases pdf
WE 19.12.2018 MapReduce: programming model, architecture, map and reduce functions, MR extensions and optimizations, execution, limitations, MR implementations, Hadoop pdf, [Li et al. 2014]
FR 21.12.2018 MR Execution and error handling: task scheduling in MR, job tracker, task tracker, locality, fault tolerance in MR, stragglers, speculative execution pdf
12 TU 08.01.2019 MR Design Patterns: common pitfalls in designing MR algorithms, design patterns, categorization of design patterns, numerical summarizations, filtering patterns (filtering, Bloom filtering, top ten, distinct), join patterns (reduce side join, replicated join, composite join, Cartesian Product) pdf
FR 11.01.2019 Distributed Systems: data independence, advantages of distributed systems, networking infrastructure, client-server, peer-to-peer networks, structured and unstructured P2P systems, data replication and consistency, replication policies, failure management pdf
13 TU 15.01.2019 (08:00-12:00)Distributed hash tables: motivation, linear hashing, centralized linear hashing, distributed linear hashing, consistent hashing, finger tables, Chord pdf
Main memory databases: definition, concurrency control and commit, data representation and compression, t-tree index, SAP HANA, Oracle TimesTen pdf
FR 18.01.2019 Project presentation

Project

The exercise part of the course consists in the elaboration of a project, which consists in the development and implementation of a Data Warehouse. The project is documented by a project report and concluded by a final presentation and demo.

The project shall be done in groups of 2-3 students. The individual group members might get different grades for the project, if it is evident that there were significant differences in the contribution to the project.

Sample project reports: sample project 1, sample project 2.

Assessment

The assessment of the course consists of two parts:

Both parts must be positive to pass the exam. The final grade is the weighted average between the two parts. A positive project is a pre-requisite to do the exam.

Here are some of the old exams: 30.01.2019, 19.09.2018, 30.01.2018, 10.07.2017, 31.01.2017, 02.02.2016, 15.09.2015, 06.07.2015, 28.01.2015, 10.06.2014, 10.02.2014.