# BSc Thesis Proposals

Supervisor: Nikolaus Augsten

### Stable Marriage Algorithm with Ties(assigned — Alexander Gruber)

Given a set of man and a set of women. Each person has a preference list of the folks of the opposite gender. A pair of people of opposite genders that like each other better than their respective spouses is an instability. How can we pair up all the folks so there are no such instabilities?

This problem was first studied by Gale and Shapley. They showed that there always exists at least one stable solution and they gave an O(n2) algorithm to find one. The algorithms assumes that there are no ties in the preference list. If we allow ties in the preference list, the following solutions can be distinguished:

• Weakly Stable Matching: The matching M is weakly stable if no couple (x,y) can be formed such that both, x an y, strictly prefer each other to the partner in M.
• Strongly Stable Matching: The matching M is strongly stable if no couple (x,y) can be formed such that x strictly prefers y to his/her partner in M and y either strictly prefers x to his/her partner in M or is indifferent between them.
• Super-Stable Matching: The matching M is super-stable if no couple (x,y) can be formed such that both, x an y, either strictly prefer each other to the partner in M or are indifferent.

Algorithms for each of the problems are given by Gale and Shapley and Irving. The algorithms are used in data integration to match data items in different databases that correspond to the same real world object (e.g., a residential address stored in the databases of two different companies). Exact matches fail, because the data items are not equal (but only similar). For each item we assume a preference list of items in the other database. The preference list may contain ties.

The goal is to implement the algorithms and evaluate efficiency and effectiveness. The effectiveness is evaluated in a given data integration setting.

### Approximate XML Joins Using Reference Sets (finished — Andreas Villscheider)

While exact XML joins match equal documents, approximate XML joins match similar documents. XML can be represented as ordered, labeled trees. Distance measures for trees are applied to compute the similarity between documents.

A well known distance function for trees is the tree edit distance, which is defined as the minimum cost sequence of edit operation (node insertion, node deletion, and label change) that transforms one tree into another. Unfortunately the best known algorithms have runtime at least O(n2).

Guha et al. propose a framework for approximate XML joins based on the tree edit distance. They give upper and lower bounds for the tree edit distance and use reference sets to take advantage of the fact that the tree edit distance is a metric. This way they can reduce the number of distance computations in a join and thus improve the efficiency.

The goal of the thesis is to implement the framework proposed by Guha et al. and evaluate it on synthetic and real data.

## References

[Gale and Shapley]
D. Gale and L. S. Shapley. College admissions and the stability of marriage. Amer. Math. Monthly, Vol 69, pp. 9-15, 1962.
[Guha et al.]
S. Guha, H. V. Jagadish, N. Koudas, D. Srivastava, and T. Yu. Approximate XML joins. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 287-298, Wisconsin, 2002. ACM Press.
[Irving]
R. W. Irving. Stable marriage and indifference. Discrete Applied Mathematics, 48:261-272, 1994.