Advanced Data Management Technologies (ADMT)
Academic Year: | 2018/19, 1st semester | Timetable: | here |
Lecturer: | Johann Gamper | Lab instructor: | Anton Dignös |
Lectures: | TU 10:00-12:00, FR 10:00-12:00, Room D0.02 |
Labs: | FR 12:00-14:00, Room Laboratory |
Office hours: | WE 13:00-14:00 or email arrangement | Lab office hours: | WE 13:00-14:00, FR 11:00-12:00 or email arrangement |
Objectives: Introduce advanced data management technologies that go beyond traditional (relational) database management systems and train students to evaluate the advantages and disadvantages of such technologies in different application contexts.
Prerequisites: Students should be familiar with basic concepts in databases (including relational databases, SQL, and relational algebra) and algorithms, as well as having good pogramming skills.
Syllabus:
- Data warehousing and business intelligence
- Multidimensional modelling and OLAP
- NoSQL and map-reduce
- Distributed databases and peer-to-peer systems
- Distributed access structures
- Main memory database systems
Learning Outcomes:
- Basic knowledge of storing, querying and managing large amounts of data and the associated languages, tools and systems
- Practical application and evaluation of tools and techniques in the field of data science
- Ability to autonomously select the documentation needed to keep up to date in a given sector
- Ability to use English at an advanced level with particular reference to disciplinary terminology
- Ability to autonomously extend the knowledge acquired during the study course
References: There is no single textbook that covers the entire course. The course material is collected from various textbooks and research papers. They are indicated on the slides.
More details are in the course presentation form.
Schedule & Lecture Notes
Useful links: timetable, course material in OLE, lecture notes 2017/18
Week | Day | Date | Topic | Slides |
1 | TU | 09.10.2018 | Introduction: administration and organisation, the DB field, new trends in computing, motivation and need for advanced ADMT | |
FR | 12.10.2018 | Basic concepts of BI and DW: business intelligence, from data to information, BI pyramid, BI key problems, DW definition, DW architecture, data integration, multidimensional model, OLAP vs. OLTP | ||
2 | No lectures this week | |||
3 | TU | 23.10.2018 | Building a DW: methodological framework, DW design, top-down vs. bottom-up design, DW life-cycle, DW project management, BI competence center DW requirements analysis and conceptual design: user needs, conceptual design, facts, dimensions, measures, dimensional fact model, building dimensional fact schemata |
pdf |
FR | 26.10.2018 | DW logical design: logical design, multidimensional model, star schema, snowflake schema, fact/event types, additivity of measures, different types of measures, DW applications, DW implementation | ||
4 | TU | 30.10.2018 | DW case studies: grocery store example, 4-step DW design process, surrogate keys, degenerate dimensions, junk dimension, inventory management, order management, OncoNet, and MEDAN case studies, DW bus architecture, DW bus matrix, conformed dimensions, role-playing, fact normalization, multi-valued dimensions, bridge tables | |
5 | TU | 06.11.2018 | DW changing dimensions: slowly changing dimensions, overwrite old values, versioning of rows, timestamping of rows, rapidly changing dimensions | |
FR | 09.11.2018 | Extract-transform-load: ETL process, ETL subsystems, building dimensions and fact tables, extract data, computing deltas, data transformation, normalization, type conversion, loading data, data quality, data cleansing. | ||
6 | TU | 13.11.2018 | SQL OLAP extensions: SQL query specification; processing SQL queries; GROUP BY extensions: ROLLUP; CUBE; GROUPING SETS; GROUPING_ID. | pdf, pdf, sql |
FR | 16.11.2018 | We visit SFSCON 2018! | ||
7 | No lectures this week | |||
8 | No lectures this week; lab on Friday will take place | |||
9 | TU | 04.12.2018 | SQL OLAP extensions: analytic/window functions, moving window aggregates, ranking, percentiles, nested aggregates, densification | pdf, sql |
TU | 07.12.2018 | DW pre-aggregation and view maintenance: selecting and using pre-aggregation, lattice framework, hierarchies and the lattice framework, cost model, greedy algorithm, optimal cases of the greedy algorithm, incremental view maintenance | ||
10 | TU | 11.12.2018 | Bitmap indexes: bitmap indexing, bitmap compression, run-length encoding, bit-sliced index, bitmap-encoded index, bitmapped join index | |
11 | TU | 18.12.2018 | NoSQL: motivation, new requirements, CAP theorem, BASE properties, NoSQL taxonomy, key-values stores, column stores, document stores, graph databases | |
WE | 19.12.2018 | MapReduce: programming model, architecture, map and reduce functions, MR extensions and optimizations, execution, limitations, MR implementations, Hadoop | pdf, [Li et al. 2014] | |
FR | 21.12.2018 | MR Execution and error handling: task scheduling in MR, job tracker, task tracker, locality, fault tolerance in MR, stragglers, speculative execution | ||
12 | TU | 08.01.2019 | MR Design Patterns: common pitfalls in designing MR algorithms, design patterns, categorization of design patterns, numerical summarizations, filtering patterns (filtering, Bloom filtering, top ten, distinct), join patterns (reduce side join, replicated join, composite join, Cartesian Product) | |
FR | 11.01.2019 | Distributed Systems: data independence, advantages of distributed systems, networking infrastructure, client-server, peer-to-peer networks, structured and unstructured P2P systems, data replication and consistency, replication policies, failure management | ||
13 | TU | 15.01.2019 | (08:00-12:00)Distributed hash tables: motivation, linear hashing, centralized linear hashing, distributed linear hashing, consistent hashing, finger tables, Chord | |
Main memory databases: definition, concurrency control and commit, data representation and compression, t-tree index, SAP HANA, Oracle TimesTen | ||||
FR | 18.01.2019 | Project presentation | ||
Project
The exercise part of the course consists in the elaboration of a project, which consists in the development and implementation of a Data Warehouse. The project is documented by a project report and concluded by a final presentation and demo.
The project shall be done in groups of 2-3 students. The individual group members might get different grades for the project, if it is evident that there were significant differences in the contribution to the project.
Sample project reports: sample project 1, sample project 2.
Assessment
The assessment of the course consists of two parts:
- theory (60%): assessed with a single written exam at the end;
- project (40%): assessed through a presentation, demo and final report about the project.
Here are some of the old exams: 30.01.2019, 19.09.2018, 30.01.2018, 10.07.2017, 31.01.2017, 02.02.2016, 15.09.2015, 06.07.2015, 28.01.2015, 10.06.2014, 10.02.2014.