Data Warehousing and Data Mining (DWDM)
Academic Year: 
2012/13, 1st semester 
Lecturer: 

Teaching assistant: 

Lectures: 
TU 10:30-12:30, WE 08:30-10:30, Room E411
Exercises: 

Office hours: 
Gamper: WE 10:30-12:30 or by email arrangement

Pawlik: MO and WE 13:00-14:00 or by email arrangement
Objectives: Students will learn to understand and implement classical models and algorithms in data warehousing and data mining. They will learn how to analyze data, identify problems, and choose the relevant models and algorithms to apply. They will also be able to assess the strengths and weaknesses of various methods and algorithms and to analyze their behavior.
Syllabus
 Data warehousing
 SQL OLAP extensions
 Multidimensional Join
 Data warehouse performance
 Data Analysis and Uncertainty
 Classification and Prediction
 Cluster Analysis
 Association rules
Organization
The course is divided into two parts that are taught in parallel: a data warehousing part and a data mining part. The exercises consist of a project done alone or in groups of 2-3 students (more details below).
Textbooks
Data Warehousing
 M. Golfarelli, S. Rizzi. Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill, 2009. (recommended!)
 R. Kimball, "The Data Warehouse Toolkit", 2nd edition.
 W. H. Inmon, "Building the Data Warehouse", 3rd edition.
 Selected papers
Data Mining
 Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Second Edition, 2006
 Margaret H. Dunham, "Data Mining: Introductory and Advanced Topics", Prentice Hall, 2003, ISBN: 0130888923
 Simon Haykin, "Neural Networks: A Comprehensive Foundation", Prentice Hall, 2005, ISBN: 0131471392
 Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, "Introduction to Data Mining", Pearson Addison Wesley, 2005, ISBN: 0321321367
Lectures and Lecture Notes
The lecture notes for this course will be updated as we progress through the semester. The lecture notes of the DM part can be found in the syllabus of the data mining web page, while the lecture notes of the DW part can be found in the syllabus of the data warehousing web page.
Data Warehousing Part
1.  WE, 03.10.2012  Data warehousing: introduction, business intelligence, data integration, OLTP vs. OLAP, methodological framework, DW definition [slides] 
2.  WE, 10.10.2012  Data warehousing: multidimensional modeling, cubes, facts, dimensions, DW design [slides] 
3.  WE, 17.10.2012  Data warehousing: more about dimensions, star schema, snowflake schema, DW implementation, DW applications [previous lecture] 
4.  WE, 24.10.2012  Data warehousing: case studies [slides] 
5.  WE, 31.10.2012  SQL OLAP extensions: SQL query expression, crosstabs, group by extensions, rollup, cube, grouping sets [slides] [sql] 
6.  WE, 07.11.2012  SQL OLAP extensions: analytic/window functions, ranking, moving window aggregates, densification [slides] 
7.  WE, 14.11.2012  Generalized multidimensional join: GMDJ definition, evaluation algorithms [slides] [Akinde et al. 11] [Chatziantoniou et al. 01] [Akinde et al. 02] [sql] 
8.  WE, 21.11.2012  Generalized multidimensional join: subqueries, optimization rules, reducing range to point queries, late initialization of result table, distributed evaluation [slides] 
9.  WE, 28.11.2012  DW performance: pre-aggregation, lattice framework, view selection [slides] [Harinarayan et al. 96] [Wu and Buchmann 98] 
10.  WE, 05.12.2012  DW performance: view selection, view maintenance, bitmap indexing [previous lecture] 
11.  WE, 12.12.2012  Extract-Transform-Load: ETL process, building dimensions and fact tables, extract, transform, load. [slides] 
12.  WE, 19.12.2012  Advanced modeling: changing dimensions, large-scale dimensional modeling, project management. [slides] 
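As a small illustration of the group-by extensions listed for lecture 5, the following Python sketch emulates what SQL's ROLLUP computes: a sum for every prefix of the grouping columns, down to the grand total. The sales rows, the column meanings, and the `rollup` helper are hypothetical and are not taken from the course material; `None` stands in for the NULL that SQL reports in subtotal rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, amount). Not from the course material.
SALES = [
    ("north", "apples", 10),
    ("north", "pears", 5),
    ("south", "apples", 7),
    ("south", "apples", 3),
]


def rollup(rows, n_dims):
    """Emulate GROUP BY ROLLUP(d1, ..., dn) with SUM over the measure column.

    For each prefix length k = n, n-1, ..., 0 the rows are grouped by the
    first k dimensions; the rolled-up dimensions are reported as None.
    """
    totals = defaultdict(int)
    for row in rows:
        dims, measure = row[:n_dims], row[n_dims]
        for k in range(n_dims, -1, -1):
            key = dims[:k] + (None,) * (n_dims - k)
            totals[key] += measure
    return dict(totals)


if __name__ == "__main__":
    for key, total in rollup(SALES, 2).items():
        print(key, total)
```

A CUBE would differ only in that it aggregates over every subset of the dimensions rather than every prefix.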
Data Mining Part
1.  Tuesday, 09.10.2012  Data Mining: Introduction [slides] 
2.  Tuesday, 16.10.2012  Data Mining: Getting to know your data [slides] 
3.  Tuesday, 23.10.2012  Data Mining: Statistics [slides] 
4-5.  Tuesday, 06.11.2012 and 13.11.2012  Data Mining: Pattern Mining [slides] 
6.  Tuesday, 20.11.2012  Data Mining: Clustering: Partitioning Methods [slides] 
7.  Tuesday, 27.11.2012  Data Mining: Clustering: Hierarchical Methods [slides] 
8.  Tuesday, 04.12.2012  Data Mining: Density-based Methods and High-Dimensional Clustering [slides] 
9.  Tuesday, 11.12.2012  Data Mining: Classification: Decision Trees [slides] 
10.  Tuesday, 08.01.2013  Data Mining: Classification: Bayes Classifier [slides] 
11-12.  Tuesday, 15.01.2013  Data Mining: Classification: Rule-based Classification, Lazy Learners, Prediction, Evaluation (to be updated next week) [slides] 
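For the pattern mining lectures, a minimal sketch of level-wise frequent itemset mining in the style of Apriori may help fix the idea: candidates of size k+1 are built from frequent k-itemsets and discarded if any of their k-subsets is infrequent. The transactions and the `apriori` function below are hypothetical illustrations, not the course's reference implementation, and `min_support` is assumed to be an absolute count.

```python
from itertools import combinations

# Hypothetical transactions; not from the course material.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def apriori(transactions, min_support):
    """Return {frozenset: support_count} for all frequent itemsets.

    min_support is an absolute count. Candidates of size k+1 are built by
    joining frequent k-itemsets and pruned if any k-subset is infrequent.
    """
    def count(candidates):
        # Count supporting transactions and keep only frequent candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {frozenset([i]) for t in transactions for i in t}
    frequent = count(items)
    result = dict(frequent)
    k = 1
    while frequent:
        prev = set(frequent)
        candidates = {
            a | b
            for a in prev for b in prev
            if len(a | b) == k + 1
            and all(frozenset(s) in prev for s in combinations(a | b, k))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    for itemset, support in apriori(TRANSACTIONS, 3).items():
        print(sorted(itemset), support)
```

The subset check is what distinguishes this from brute-force enumeration: an itemset is only counted against the data if all of its smaller subsets were already found to be frequent.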
Projects
Description: During the semester, students do a project that is divided into two modules. Each module lasts six weeks and can be done in the area of either DW or DM. The following options exist:
 2 modules in DW;
 2 modules in DM;
 1 module in DW and 1 module in DM.
The project can be done alone or in groups of 2-3 students.
More details will be explained during the first exercise on Tuesday, October 9, 2012.
Deadline for the decision about the project and the groups: October 19 (send an email to both Mouna Kacimi and Matteusz Pawlik)
Data Warehousing Part
Introduction
Module 1
[NEW] The deadline for Task 6 has been extended till Friday, 30.11.2012, 23:59.
 Module 1 Task 1
 Module 1 Task 2
 Module 1 Task 3
 Module 1 Task 4
 Module 1 Task 5
 Module 1 Task 6
Module 2
Project requirements and guidelines
Data Mining Part
Part 1: Task 1
Organization: you can work alone or team up with other students (2-student groups are preferred)
Deadline for the deliverable: November 2, 2012 at 23:59
Part 1: Task 2
Additional references: Apriori Algorithm, FP-growth Algorithm
Organization: you can work alone or team up with other students (2-student groups are preferred)
Deadline for the deliverable: November 26, 2012 at 23:59
Part 2: Task 1 & Task 2
Additional references: K-means, DBSCAN, BIRCH
Organization: you can work alone or team up with other students (2-student groups are preferred)
Deadline for the first task: December 21, 2012 at 23:59
Deadline for the second task: January 15, 2013 at 23:59
Exam
The assessment of the course consists of two parts:
 project part (40%): assessed through a presentation, a demo, and a final report;
 theory part (60%): assessed with a written exam (multiple choice).
Both parts must be positive to pass the exam. A successful project is required to be admitted to the theoretical exam.
A successful project remains valid even if the student fails the theoretical exam. A student who fails the project part has to do a new project for the next exam session. In this case, the teaching assistant does not guarantee supervision support.
The written exam is a closed-book exam. The only resources allowed are blank paper, pens, and your head.
Here is an example of an exam [pdf].
Here are the solutions to the DM part questions: [Data Mining].