Advanced Data Management Technologies (ADMT)
|Academic Year:||2017/18, 1st semester|
|Lecturer:||Johann Gamper||Lab instructor:||Anton Dignös|
|Lectures:||WE 14:00-16:00, FR 08:30-10:30, Room E220
||Labs:||TU 14:00-16:00, Room E220|
|Office hours:||WE 13:00-14:00, FR 11:00-12:00 or email arrangement||Timetable:||here|
Objectives: Introduce advanced data management technologies that go beyond traditional (relational) database management systems and train students to evaluate the advantages and disadvantages of such technologies in different application contexts.
Prerequisites: Students should be familiar with basic concepts in databases (including relational databases, SQL, and relational algebra) and algorithms, as well as having good pogramming skills.
- Data Warehousing and Business Intelligence
- Extract, Transform, Load
- Column-oriented Databases
- Distributed Databases
- Distributed Data Structures
- Main-memory Databases
References: There is no single textbook that covers the entire course. The course material is collected from various textbooks and research papers. They are indicated on the slides.
More details are in the course presentation form.
|1||WE||04.10.2017||Introduction: administration and organisation, the DB field, new trends in computing, motivation and need for advanced ADMT|
|FR||06.10.2017||Basic concepts of BI and DW: business intelligence, from data to information, BI pyramid, BI key problems, DW definition, DW architecture, data integration, multidimensional model, OLAP vs. OLTP|
|2||WE||11.10.2017||Building a DW: methodological framework, DW design, top-down vs. bottom-up design, DW life-cycle, DW project management, BI competence center|
|FR||13.10.2017||DW requirements analysis and conceptual design: user needs, conceptual design, facts, dimensions, measures, dimensional fact model, building dimensional fact schemata|
|3||WE||18.10.2017||DW logical design: logical design, multidimensional model, star schema, snowflake schema, fact/event types, additivity of measures, different types of measures, DW applications, DW implementation|
|FR||20.10.2017||DW case studies: grocery store example, 4-step DW design process, surrogate keys, degenerate dimensions, junk dimension|
|4||WE||25.10.2017||DW case studies (contd.): inventory management, order management, OncoNet, and MEDAN case studies, DW bus architecture, DW bus matrix, conformed dimensions, role-playing, fact normalization, multi-valued dimensions, bridge tables|
|6||WE||08.11.2017||DW changing dimensions: slowly changing dimensions, overwrite old values, versioning of rows, timestamping of rows, rapidly changing dimensions|
|FR||10.11.2017||We visit the SFScon 2017: https://www.sfscon.it|
|7||WE||15.11.2017||Extract-transform-load: ETL process, ETL subsystems, building dimensions and fact tables, extract data, computing deltas, data transformation, normalization, type conversion, loading data, data quality, data cleansing.|
|FR||17.11.2017||SQL OLAP extensions: SQL query specification; processing SQL queries; GROUP BY extensions: ROLLUP; CUBE; GROUPING SETS; GROUPING_ID.||pdf, pdf, sql|
|8||WE||22.11.2017||Research talk by Natalija Kozmina, University of Latvia: Research projects in Data Warehousing and Business Intelligence|
|9||WE||29.11.2017||SQL OLAP extensions: analytic/window functions, moving window aggregates, ranking, percentiles, nested aggregates, densification||pdf, sql|
|FR||01.12.2017||DW pre-aggregation and view maintenance: selecting and using pre-aggregation, lattice framework, hierarchies and the lattice framework, cost model, greedy algorithm, optimal cases of the greedy algorithm, incremental view maintenance|
|10||WE||06.12.2017||Bitmap indexes: bitmap indexing, bitmap compression, run-length encoding, bit-sliced index, bitmap-encoded index, bitmapped join index|
|11||WE||13.12.2017||NoSQL: motivation, new requirements, CAP theorem, BASE properties, NoSQL taxonomy, key-values stores, column stores, document stores, graph databases|
|FR||15.12.2017||Excursion to GKN Sinter Metals|
|12||WE||20.12.2017||MapReduce: programming model, architecture, map and reduce functions, MR extensions and optimizations, execution, limitations, MR implementations, Hadoop||pdf, [Li et al. 2014]|
|13||WE||10.01.2018||MR Execution and error handling: task scheduling in MR, job tracker, task tracker, locality, fault tolerance in MR, stragglers, speculative execution; MR Design Patterns: common pitfalls in designing MR algorithms, design patterns, categorization of design patterns, numerical summarizations, filtering patterns (filtering, Bloom filtering, top ten, distinct), join patterns (reduce side join, replicated join, composite join, Cartesian Product)|
|FR||12.01.2018||MR Design Patterns: common pitfalls in designing MR algorithms, design patterns, categorization of design patterns, numerical summarizations, filtering patterns (filtering, Bloom filtering, top ten, distinct), join patterns (reduce side join, replicated join, composite join, Cartesian Product)|
|14||MO||15.01.2018||Distributed Systems: data independence, advantages of distributed systems, networking infrastructure, client-server, peer-to-peer networks, structured and unstructured P2P systems, data replication and consistency, replication policies, failure management|
|TU||16.01.2018||Distributed hash tables: motivation, linear hashing, centralized linear hashing, distributed linear hashing, consistent hashing, finger tables, Chord|
|WE||17.01.2018||Main memory databases: definition, concurrency control and commit, data representation and compression, t-tree index, SAP HANA, Oracle TimesTen|
The exercise part of the course consists in the elaboration of a project, which shall be done in groups of 2 students. The project consists in the development and implementation of a Data Warehouse.
The project is documented by a project report and concluded by a final presentation and demo.
- WE, 11.10.2017: Presentation of the project and group building.
The assessment of the course consists of two parts:
- theory (60%): assessed with a single written exam at the end;
- project (40%): assessed through a presentation, demo and final report about the project.