MSc Thesis
These web pages give you access to a number of thesis proposals in the DIS group.
The DIS group puts a strong emphasis on applications of database theory. All our internships are strongly application oriented and must verify the investigated methods on both synthetic (a must) and real-world datasets (whenever possible). We also encourage students to formalize their experimental findings and provide analytical results.
You are also welcome to suggest additional topics that fall within the general area of databases. The Database group is conducting research in the following areas:
- Temporal Databases: Michael H. Boehlen, Johann Gamper, Romans Kasperovics
- Visual Data Mining: Arturas Mazeika, Michael H. Boehlen, Andrej Taliun
- Approximate string and tree matching: Nikolaus Augsten, Johann Gamper
- Approximate string selectivity: Arturas Mazeika
- OLAP, Data Warehousing, Data Mining: Arturas Mazeika, Michael H. Boehlen
- Web ontologies, e-government: Johann Gamper, Nikolaus Augsten
- Recommender Systems and Personalization: Francesco Ricci
- Temporal OLAP, temporal multidimensional databases: Igor Timko
- Probabilistic OLAP, probabilistic multidimensional databases: Igor Timko
Proposals
- Experimental Evaluation of Tree Edit Distance Algorithms (N. Augsten) NEW!
The tree edit distance measures the similarity between ordered, labeled trees. Intuitively, it counts the minimum number of node edit operations (rename, delete, insert) required to transform one tree into the other. The goal is to implement recent algorithms for the tree edit distance and compare them experimentally.
- Probabilistic Temporal Multidimensional Data Model and Algebra (I. Timko)
Logical multidimensional models and algebras are alternatives to the relational model and algebra and are used in On-Line Analytical Processing (OLAP). The goal of this thesis is to develop a model and algebra for temporal data that is also probabilistic, e.g., future, predicted, time-varying properties of traffic jams (size, speed, location).
- Local Similarity Metrics for Collaborative Filtering (F. Ricci)
The core component of a collaborative filtering recommender system is the user-to-user similarity metric. The goal of this project is to investigate the advantages of a local definition of the similarity metric in terms of prediction accuracy.
- Recommending Generalized Products in Collaborative Filtering (F. Ricci)
Collaborative filtering systems cannot provide reliable recommendations when the rating matrix is sparse. The goal of this thesis is to design an extension of the CF methodology that supports the recommendation of categories of products rather than single products and thereby overcomes the sparsity problem.
- Recommendation by Proposing and Complex Products Clustering (F. Ricci)
Recommendation by proposing is a conversational methodology that still suffers from major usability and computational limitations. The goal of this thesis is to design, implement and test, in a real operational recommender system integrated in a major tourism portal, a novel approach based on hierarchical clustering of complex products.
- Stable Marriage Algorithm with Ties (N. Augsten)
The stable marriage algorithm is used in data integration to match corresponding objects in a database. The goal is to implement the algorithm and to evaluate its efficiency and effectiveness. The effectiveness is evaluated in a given data integration setting.
- Efficient Implementation of the TMDA Operator (J. Gamper)
TMDA is an expressive aggregation operator for temporal database systems that generalizes a variety of previously proposed aggregation operators. The aim of this thesis is to develop an efficient implementation of this operator.
- Compression of the VSol Summary Structure (A. Mazeika)
VSol computes a small summary structure for a text database and uses this structure to answer selectivity queries for a given string. The aim of the thesis is to compress the summary structure by identifying blocks of information that are repeated multiple times in the structure.
- APDF Method with Subspace Clustering (A. Mazeika)
The APDF method is a scalable generalization of histograms. The aim of the thesis is to combine the APDF method with subspace clustering techniques and further reduce the computational complexity in low-dimensional subspaces of the data.
- Shape Invariant APDF Method (A. Mazeika)
The APDF method is a scalable generalization of histograms. The aim of the thesis is to come up with a variant of the APDF tree that is invariant to the shape of the structures in the dataset.
- Visual Data Analysis of the Sepia Summary Structures (A. Mazeika)
Sepia computes PPD and local histograms for a text database, and uses the histograms to estimate the selectivity for a given query string. The aim of the thesis is to evaluate the histograms visually and experimentally.
- Visualization of Text Clusters (A. Mazeika)
Extensions of VSol allow clusters to be computed in text databases. The aim of the thesis is to visualize the clusters with the help of HITMD or any other technique and to analyze them visually.
- Separation of Intersecting Clusters with Density Based Clustering (A. Mazeika)
Density based clustering clusters databases efficiently, independent of the shape of the clusters in the database, but fails on overlapping clusters. The aim of the thesis is to modify density based clustering so that it handles overlapping clusters correctly.
- Ordering of Categorical Domains with the Help of the IB Method (A. Mazeika)
The IB method is a robust categorical clustering technique. The aim of the thesis is to apply the IB method to define an ordering for categorical attributes.
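
To give a feel for the first proposal in the list, the following is a minimal Python sketch of the tree edit distance for ordered, labeled trees. It is a plain memoized recursion over forests with unit cost for rename, delete and insert, and is not one of the recent algorithms the proposal asks to implement. The tree encoding (a label paired with a tuple of children) and the function names forest_size, forest_distance and tree_edit_distance are assumptions made for this illustration only.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), children is a tuple of trees.
# A forest is a tuple of trees. Nested tuples are hashable, so results can be cached.


def forest_size(forest):
    """Return the number of nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return forest_size(g)  # insert every node of g
    if not g:
        return forest_size(f)  # delete every node of f
    (f_label, f_children) = f[-1]  # rightmost tree of f
    (g_label, g_children) = g[-1]  # rightmost tree of g
    # Delete the rightmost root of f: its children take its place.
    delete = forest_distance(f[:-1] + f_children, g) + 1
    # Insert the rightmost root of g: its children take its place.
    insert = forest_distance(f, g[:-1] + g_children) + 1
    # Match the two rightmost roots, renaming if the labels differ.
    rename = (
        forest_distance(f_children, g_children)
        + forest_distance(f[:-1], g[:-1])
        + (f_label != g_label)
    )
    return min(delete, insert, rename)


def tree_edit_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))


if __name__ == "__main__":
    chain = ("a", (("b", (("c", ()),)),))        # a -> b -> c
    short = ("a", (("c", ()),))                  # a -> c
    print(tree_edit_distance(chain, short))      # deleting node b suffices
```

The recursion always decomposes the rightmost tree of each forest, so the three cases correspond directly to the delete, insert and rename operations named in the proposal. The algorithms to be compared in the thesis differ mainly in how they order and reuse these subproblems.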
