Data Warehousing and Data Mining (DWDM)

Academic Year:

2012/13, 1st semester

Lecturer:

Johann Gamper and Mouna Kacimi

Teaching assistant:

Matteusz Pawlik

Lectures:

TU 10:30-12:30, WE 08:30-10:30, Room E411

Exercises:

Office hours:

Gamper: WE 10:30-12:30 or email arrangement

 

Pawlik: MO and WE 13:00-14:00 or email arrangement




Objectives: Students will learn to understand and implement classical models and algorithms in data warehousing and data mining. They will learn how to analyze data, identify problems, and choose the relevant models and algorithms to apply. They will also be able to assess the strengths and weaknesses of various methods and algorithms and to analyze their behavior.

Syllabus

Organization

The course is divided into two parts that are taught in parallel: a data warehousing part and a data mining part. The exercises consist of a project done alone or in groups of 2-3 students (more details below).

Textbooks

Data Warehousing

Data Mining

Lectures and Lecture Notes

The lecture notes for this course will be updated as we progress through the semester. The lecture notes of the DM part can be found in the syllabus of the data mining web page, while the lecture notes of the DW part can be found in the syllabus of the data warehousing web page.

Data Warehousing Part

1. WE, 03.10.2012 Data warehousing: introduction, business intelligence, data integration, OLTP vs. OLAP, methodological framework, DW definition [slides]
2. WE, 10.10.2012 Data warehousing: multidimensional modeling, cubes, facts, dimensions, DW design [slides]
3. WE, 17.10.2012 Data warehousing: more about dimensions, star schema, snowflake schema, DW implementation, DW applications [previous lecture]
4. WE, 24.10.2012 Data warehousing: case studies [slides]
5. WE, 31.10.2012 SQL OLAP extensions: SQL query expression, crosstabs, group by extensions, rollup, cube, grouping sets [slides] [sql]
6. WE, 07.11.2012 SQL OLAP extensions: analytic/window functions, ranking, moving window aggregates, densification [slides]
7. WE, 14.11.2012 Generalized multi-dimensional join: GMDJ definition, evaluation algorithms [slides] [Akinde et al. 11] [Chatziantoniou et al. 01] [Akinde et al. 02] [sql]
8. WE, 21.11.2012 Generalized multi-dimensional join: subqueries, optimization rules, reducing range to point queries, late initialization of result table, distributed evaluation [slides]
9. WE, 28.11.2012 DW performance: pre-aggregation, lattice framework, view selection [slides] [Harinarayan et al. 96] [Wu and Buchmann 98]
10. WE, 05.12.2012 DW performance: view selection, view maintenance, bitmap indexing [previous lecture]
11. WE, 12.12.2012 Extract-Transform-Load: ETL process, building dimensions and fact tables, extract, transform, load [slides]
12. WE, 19.12.2012 Advanced modeling: changing dimensions, large-scale dimensional modeling, project management [slides]

Data Mining Part

1. Tuesday, 09.10.2012 Data Mining: Introduction [slides]
2. Tuesday, 16.10.2012 Data Mining: Getting to know your data [slides]
3. Tuesday, 23.10.2012 Data Mining: Statistics [slides]
4-5. Tuesday, 06.11.2012 and 13.11.2012 Data Mining: Pattern Mining [slides]
6. Tuesday, 20.11.2012 Data Mining: Clustering: Partitioning Methods [slides]
7. Tuesday, 27.11.2012 Data Mining: Clustering: Hierarchical Methods [slides]
8. Tuesday, 04.12.2012 Data Mining: Density-based Methods and High Dimensional Clustering [slides]
9. Tuesday, 11.12.2012 Data Mining: Classification: Decision Trees [slides]
10. Tuesday, 08.01.2013 Data Mining: Classification: Bayes Classifier [slides]
11-12. Tuesday, 15.01.2013 Data Mining: Classification: Rule-based Classification, Lazy Learners, Prediction, Evaluation (to be updated next week) [slides]

Projects

Description: During the semester, students do a project that is divided into two modules. Each module lasts six weeks and can be done in the area of either DW or DM. The following options exist:

The project can be done alone or in groups of 2-3 students.

More details will be explained during the first exercise on Tuesday, October 9, 2012.

Deadline for the decision about the project and the groups: October 19 (send an email to both Mouna Kacimi and Matteusz Pawlik)

Data Warehousing Part

Introduction

Module 1

[NEW] The deadline for Task 6 has been extended until Friday, 30.11.2012, 23:59.

Module 1 Task 1 | Module 1 Task 2 | Module 1 Task 3 | Module 1 Task 4 | Module 1 Task 5 | Module 1 Task 6

Module 2

Project requirements and guidelines

Data Mining Part

Part1: Task1

Organization: you can work alone or team up with other students (groups of 2 students are preferred)

Deadline for the deliverable: November 2, 2012 at 23:59

Part1: Task2

Additional references: Apriori Algorithm, FP-growth Algorithm

Organization: you can work alone or team up with other students (groups of 2 students are preferred)

Deadline for the deliverable: November 26, 2012 at 23:59

Part2: Task1 & Task2

Additional references: KMEANS, DBScan, Birch

Organization: you can work alone or team up with other students (groups of 2 students are preferred)

Deadline for the first task: December 21, 2012 at 23:59

Deadline for the second task: January 15, 2013 at 23:59

Exam

The assessment of the course consists of two parts:

Both parts must receive a passing grade to pass the exam. A successful project is required to be admitted to the theoretical exam.

A successful project remains valid even if the student fails the theoretical exam. A student who fails the project part has to do a new project for the next exam session. In this case, the teaching assistant does not guarantee support or supervision.

The written exam is a closed-book exam. The only resources allowed are blank paper, pens, and your head.

Here is an example of an exam [pdf].

Here are the solutions to the questions of the DM part: Data Mining