
My main research interests are Knowledge Representation and Databases. My PhD topic is Data Exchange. I’m also very interested in Machine Learning and its applications.

Unique Solutions in Data Exchange

Data exchange is the problem of transforming data structured according to a source schema into data structured according to a target schema, via a mapping specified by rules in the form of source-to-target tuple generating dependencies. In this context, given a source instance and a mapping, there might be more than one valid target instance that satisfies the mapping. This non-uniqueness undermines the main goal of exchanging data, namely to have a materialised target instance that can be used to answer queries over the target schema without reference to the original source instance. We introduce and solve the novel problem of definability abduction, which aims at finding extensions to the initial schema mappings that guarantee the uniqueness of the materialised target instance. We consider several semantic criteria for selecting reasonable extensions and provide provably sound and complete algorithms to generate them. We also analyse the complexity of the problem in different data exchange settings, including settings with source and target dependencies.
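To illustrate why a mapping may admit more than one valid target instance, here is a minimal Python sketch. The relations Emp and WorksIn, the rule Emp(x) -> exists d. WorksIn(x, d), and the sample data are hypothetical and chosen only for illustration; they are not taken from the work summarised above.

```python
# Toy source-to-target dependency (hypothetical):
#   Emp(x) -> exists d . WorksIn(x, d)
# Every employee in the source must work in SOME department in the target,
# but the rule does not say which one.

def satisfies(source_emp, target_worksin):
    """Return True if every source employee has at least one WorksIn tuple."""
    employees_with_dept = {name for (name, _dept) in target_worksin}
    return all(name in employees_with_dept for name in source_emp)

source = {"alice", "bob"}

# Two different target instances that both satisfy the mapping.
solution_a = {("alice", "sales"), ("bob", "sales")}
solution_b = {("alice", "hr"), ("bob", "it")}

# A target instance that violates the mapping: bob has no department.
not_a_solution = {("alice", "sales")}

print(satisfies(source, solution_a))      # both solutions are valid ...
print(satisfies(source, solution_b))
print(solution_a != solution_b)           # ... yet they differ
print(satisfies(source, not_a_solution))
```

Because the existential variable d is unconstrained, the mapping alone cannot single out one target instance. Extending the mapping so that the department is determined by the source is the kind of addition that would restore uniqueness.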

Query Reformulation under Ontologies and DBoxes

We study a general framework for query rewriting in the presence of an arbitrary first-order logic ontology over a database signature. The framework supports deciding the existence of a safe-range first-order equivalent reformulation of a query in terms of the database signature, and if so, it provides an effective approach to construct the reformulation based on interpolation using standard theorem proving techniques (e.g., tableau). Since the reformulation is a safe-range formula, it is effectively executable as an SQL query.

Predicting gene function using similarity learning

Computational methods that make use of heterogeneous biological datasets to predict gene function provide a cost-effective and rapid way of annotating genomes. A common framework shared by many such methods is to construct a combined functional association network from multiple networks representing different sources of data, and to use this combined network as input to network-based or kernel-based learning algorithms. In these methods, a key factor contributing to prediction accuracy is the network quality, that is, the ability of the network to reflect the functional relatedness of gene pairs. To improve network quality, considerable effort has been devoted to developing methods for network integration. However, the networks these methods produce then remain unchanged, and almost no effort has been made to optimize them after their construction.
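As a rough illustration of what constructing a combined functional association network can look like, the sketch below forms a weighted sum of adjacency matrices. The weighted sum is only one simple integration scheme, assumed here for illustration; the function name combine_networks, the two toy networks, and the weights are all hypothetical.

```python
def combine_networks(networks, weights):
    """Weighted sum of equally sized adjacency matrices (lists of lists)."""
    n = len(networks[0])
    combined = [[0.0] * n for _ in range(n)]
    for net, w in zip(networks, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * net[i][j]
    return combined

# Two toy networks over three genes (hypothetical data sources).
coexpression = [[0, 1, 0],
                [1, 0, 0],
                [0, 0, 0]]
interaction = [[0, 0, 1],
               [0, 0, 0],
               [1, 0, 0]]

combined = combine_networks([coexpression, interaction], [0.5, 0.5])
for row in combined:
    print(row)
```

In this sketch the weights are fixed once and the resulting network is never revisited, which is exactly the limitation noted above: the combined network is not optimized after it has been built.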

Filtering Image-Based Spam using SVMs and edge-based features

Spam e-mail with advertisement text embedded in images presents a great challenge to anti-spam filters. We describe a fast method to detect image-based spam e-mail. Using simple edge-based features, the method computes a vector of similarity scores between an image and a set of templates. This similarity vector is then used with support vector machines to separate spam images from other common categories of images. Our method does not require computationally expensive OCR or even text extraction from images. Empirical results show that the method is fast and has good classification accuracy.
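The similarity-vector step can be sketched as follows. The summary above does not specify the edge features or the similarity measure, so this sketch assumes a simple intensity-jump edge detector and Jaccard overlap; the function names, the threshold, and the tiny grayscale images are hypothetical. The resulting vector is what would be passed to a support vector machine, which is omitted here.

```python
def edge_map(image, threshold=50):
    """Set of pixels whose jump to the right or lower neighbour exceeds threshold."""
    rows, cols = len(image), len(image[0])
    edges = set()
    for r in range(rows):
        for c in range(cols):
            right = abs(image[r][c] - image[r][c + 1]) if c + 1 < cols else 0
            down = abs(image[r][c] - image[r + 1][c]) if r + 1 < rows else 0
            if max(right, down) > threshold:
                edges.add((r, c))
    return edges

def similarity(edges_a, edges_b):
    """Jaccard overlap of two edge sets (assumed measure, for illustration)."""
    union = edges_a | edges_b
    if not union:
        return 1.0  # two edge-free images are treated as identical
    return len(edges_a & edges_b) / len(union)

def similarity_vector(image, templates):
    """One similarity score per template; this vector is the classifier input."""
    edges = edge_map(image)
    return [similarity(edges, edge_map(t)) for t in templates]

# Tiny grayscale images: a vertical bar and a flat background.
vertical_bar = [[0, 255, 0],
                [0, 255, 0],
                [0, 255, 0]]
flat = [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]

print(similarity_vector(vertical_bar, [vertical_bar, flat]))
```

Nothing in this pipeline reads or recognises text, which reflects the point above that OCR and text extraction are not required.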

Publications [on DBLP or Google Scholar]