Language and Communication Technologies Colloquia - Student Day

The "LCT Student Day" is a get-together for catching up on the research projects carried out by young researchers who work or previously studied in Bolzano, Trento or Rovereto.


As part of this event, the "KRDB & IBM Best Thesis Awards" ceremony will also take place. Join us in congratulating the winners!

Where and When: Bolzano, Free University of Bozen-Bolzano, Via Sernesi 1, 15th of May 2008,


The winners of the Awards are Barbara Plank and Evgeny Kharlamov. Congratulations! The UniNews article is available in German, along with an English translation by Barbara.

Barbara Plank. European Masters Program in LCT, Free University of Bozen-Bolzano (first year), Universiteit van Amsterdam (second year). Sub-domain driven parsing. Thesis main supervisor: Khalil Sima'an (Universiteit van Amsterdam).

Evgeny Kharlamov. European Masters Program in CL, Dresden University of Technology (first year), Free University of Bozen-Bolzano (second year). Thesis supervisors: Diego Calvanese, Werner Nutt.




| Speaker | Affiliation | Title | Abstract | Poster |
| --- | --- | --- | --- | --- |
| Andrea Abel and Stefanie Anstein | EURAC | Approaches to Computational Lexicography for German Varieties | Abstract | Poster |
| Luciana Benotti | LORIA/INRIA | Tacit sensing in a non-traditional conversational system | Abstract | Poster |
| Raffaella Bernardi, Paolo Dongilli and Daniele Gobbetti | KRDB-FUB | MIVaS | Abstract | |
| Raffaella Bernardi, Paolo Buoso and Daniele Gobbetti | KRDB and Library (FUB) | CACAO | Abstract | |
| Katja Ignatova | Technische Universität Darmstadt, Ubiquitous Knowledge Processing Lab | Question Answering for E-Learning | Abstract | Poster |
| Gerhard Kremer | CIMeC | Cognitively Salient Semantic Relations in and for Concept Description | Abstract | Poster |
| Manuel Kirschner and Raffaella Bernardi | KRDB-FUB | A Task/Entity-Based Context Model for Answering Follow-up Questions | Abstract | |
| Barbara Plank | Alfa informatica, Faculty of Arts, University of Groningen | Domain Adaptation of Syntactic Disambiguation Models | Abstract | Poster |
| Marija Slavkovik | Computer Science and Communications, University of Luxembourg | Logic Reasoning in Question Answering | Abstract | Poster |
| Sara Tonelli | Ca' Foscari University/FBK | Building Italian FrameNet through frame information transfer from English to Italian | Abstract | |
| Camilo Thorne | KRDB-FUB | Expressing Formal Queries over DL-Lite Ontologies with Controlled English | Abstract | |





Approaches to Computational Lexicography for German Varieties
Andrea Abel and Stefanie Anstein, EURAC


Corpora built for linguistic varieties of a pluricentric language such as German are an indispensable resource for detailed and systematic variety comparison and dictionary development. We present desiderata and suggestions, as well as methods from computational linguistics, for systematically applying variety corpora to the enrichment (i.e. confirmation, extension and generation) of lexical entries in distinctive variant dictionaries for German. Examples are the variant dictionaries developed by Ammon et al. (2004) and Abfalterer (2007); we focus on South Tyrolean German. First, we conducted a systematic frequency analysis in newspaper variety corpora for approved lists of South Tyrolean special vocabulary in order to refine the corresponding dictionary entries with corpus evidence where possible. Second, we filtered from our South Tyrolean corpus the list of words that could not be lemmatised by a tool developed for the variety in Germany. After removing special vocabulary collected for the South Tyrolean variety in other projects (e.g. legal terms), the remaining list was manually checked for possible new variant dictionary entries. As an innovative approach to variety corpus lexicography, this automatically filters a huge amount of data so that only the relevant data need to be investigated in detail. Third, we semi-automatically extracted lexical co-occurrences from our two newspaper corpora and compared their frequencies, on the assumption that co-occurrences with high frequency in the South Tyrolean corpus and very low frequency in the corpus from Germany are worth closer investigation. With these three methods we were able not only to refine dictionary entries for South Tyrolean German, but also to add new ones.
The findings on variants can be re-used for further corpus annotation, which in turn yields better resources for computational variant lexicography of the kind described; the approach is also to be extended to more complex linguistic levels.
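The co-occurrence comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' tool: the use of adjacent word pairs as co-occurrences, the function names, the thresholds and the toy sentences are all assumptions made for illustration, and real corpora of different sizes would call for relative rather than absolute frequencies.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs as a simple stand-in for lexical co-occurrences."""
    return Counter(zip(tokens, tokens[1:]))


def variety_candidates(variety_tokens, reference_tokens, min_count=2, max_ref_count=0):
    """Return pairs that are frequent in the variety corpus and rare in the reference corpus.

    min_count and max_ref_count are hypothetical thresholds, not values from the abstract.
    """
    variety = bigram_counts(variety_tokens)
    reference = bigram_counts(reference_tokens)
    return sorted(
        pair
        for pair, count in variety.items()
        if count >= min_count and reference[pair] <= max_ref_count
    )


# Toy token lists, invented for illustration (not taken from the newspaper corpora).
south_tyrol = "die weiße woche beginnt bald und die weiße woche endet spät".split()
germany = "die woche beginnt bald und die woche endet spät".split()

candidates = variety_candidates(south_tyrol, germany)
print(candidates)
```

Pairs returned by such a filter would only be candidates; as in the abstract, they would still need manual inspection before entering a dictionary.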



Tacit sensing in a non-traditional conversational system
Luciana Benotti, INRIA/LORIA


When interlocutors are engaged in situated dialogue their informational states evolve not only through dialogue acts but also through physical and sensing acts. All these acts can be performed either explicitly or tacitly during the interaction. In most dialogues, even when acts are performed tacitly, their execution can be inferred from the subsequent acts. I am studying the interaction among dialogue, physical and sensing acts in the framework of a non-traditional conversational system using a non-traditional automated planner. The planner I am using is able to find plans in the presence of incomplete knowledge and sensing, the most common setup in situated conversation.



CACAO
Raffaella Bernardi, Paolo Buoso and Daniele Gobbetti, KRDB and Library (FUB)



CACAO is a 24-month targeted project supported by the eContentplus Programme of the European Commission; it started on 1 December 2007. The aim of CACAO is to provide end users with an infrastructure that lets them type queries in their own language and retrieve documents and objects in any available language. CACAO's infrastructure, designed for multilingual purposes, will be integrated with current digital library and catalogue systems by coupling natural language processing techniques with existing information retrieval systems. CACAO will also deliver tools for the maintenance of multilingual resources in the retrieval context.



MIVaS
Raffaella Bernardi, Paolo Dongilli, Daniele Gobbetti, KRDB-FUB



MIVaS is a project carried out by the “Comitato provinciale di Valutazione per la Qualità del Sistema scolastico” and the KRDB research group (Faculty of Computer Science) in collaboration with DiPED (University of Roma III). It aims to answer the following questions: 1. How has the educational system in the Italian schools of Alto Adige changed from 1966 until now? 2. Have the linguistic competences of the students changed from 1966 until now? 3. What are the linguistic competences of the students of class A vs. class C, and have they changed over the years? 4. How are the answers to the above questions connected? To this end, essays from secondary schools in Alto Adige have been collected, digitized and annotated. We will report on the statistical analysis of lexical variation over the years.



Question Answering for E-Learning
Katja Ignatova, Technische Universität Darmstadt, Ubiquitous Knowledge Processing Lab


Information overload is a well-known problem which also affects learning, since huge amounts of learning material are nowadays available in different formats and from different sources. This makes it all the harder for the learner to access information in a fast and direct way. In the QA-EL project we investigate new applications of dynamic lexical-semantic resources for information search in eLearning. Our goal is to provide uniform access to both institutional and informal knowledge resources, whereby precise and short aggregated answers are supplied to the learner. Our system architecture focuses on the integration of information extracted from different knowledge repositories for the targeted needs of Question Answering in eLearning 2.0. Classical linguistically motivated resources such as GermaNet are coupled with lexical-semantic information extracted from collaborative resources like Wikipedia, and put into service for processing heterogeneous institutional and other Web 2.0 eLearning content.



Cognitively Salient Semantic Relations in and for Concept Description
Gerhard Kremer, CIMeC


Presenting words semantically related to a lexical entry in an electronic dictionary is particularly useful in a language learning environment. However, there is little evidence as to which semantic relation types are cognitively salient and could therefore guide the (semi-)automatic extraction of word field candidates from corpora. As a first step within this project, semantic relation types are collected from a concept description experiment with German and Italian participants.



A Task/Entity-Based Context Model for Answering Follow-up Questions
Manuel Kirschner and Raffaella Bernardi, KRDB-FUB


In Interactive Question Answering (IQA), users frequently pose Follow-Up questions. In fact, giving the user the ability to ask follow-ups is often considered the key advantage over classical QA. An IQA system needs a suitable representation of the dialogue context to help resolve context-dependent Follow-Up questions. Computational linguists have tried to identify salient relations holding between the previous IQA dialogue and the Follow-Up. To this end, we have been experimenting with a Task/Entity-based model of the dialogue context. In this talk, I will give an overview of the current implementation of our IQA application, focusing on how follow-up questions are processed. I will describe our current experiments with learning parameters of our question answering algorithm from actual dialogue data that we have collected.



Domain Adaptation of Syntactic Disambiguation Models
Barbara Plank, Alfa informatica, Faculty of Arts, University of Groningen


A major challenge in Natural Language Processing (NLP) is the inherent ambiguity of natural language. In parsing, a specific area of NLP, the ambiguity is characterized by multiple alternative syntactic analyses for a given input sentence. A parser has to cope with this difficulty and choose among the various alternatives. Usually, the framework of probability theory and statistics is employed as a modeling tool, leading to statistical parsing. Modern statistical parsers are trained on large annotated corpora (treebanks), and their parameters are estimated to reflect properties of the training data. Thus, a disambiguation component bases its decisions on the treebank, and it will be successful as long as the treebank it was trained on is representative of the input the model gets. Hence, as might be expected, performance degrades when a model trained on one domain is applied to another domain. For example, a parser trained on newspaper text can be expected to perform reasonably well on newspaper sentences, but the model will be less adequate for analyzing, say, spoken data. A well-known problem arising from the domain dependence of parsing systems is therefore their portability. A simple solution to improve performance on a new domain is to construct a parser specifically for that domain. However, this amounts to hand-labeling a reasonable amount of training data, which is clearly very expensive and leads to an unsatisfactory solution. Alternatively, domain adaptation techniques try to leverage a small amount of already existing annotated data, or to use unlabeled data from one domain to parse data from a different domain. The current project explores techniques for domain adaptation for Alpino, a wide-coverage analyzer for Dutch. In an initial experiment, a supervised approach to domain adaptation has been examined, in which a small amount of in-domain data was assumed to be available.
In future research we will study techniques to gauge to what extent we can leverage existing labeled data or unlabeled data. This includes more thorough research into the precise extent of the notion of domain, and the study of domain characteristics that might be exploited for the task of parsing.



Logic Reasoning in Question Answering
Marija Slavkovik, Computer Science and Communications, University of Luxembourg


Current Question Answering (QA) systems may be improved by supporting the traditional statistical approach to QA with logic reasoning. In my thesis I suggest one way of supporting an Interactive Question Answering (IQA) system with logic reasoning. As a case study we give an overview of BoB, a chatter-bot which interactively answers questions over the library domain. I suggest an architecture of a Logic Support Unit (LSU) for BoB. The LSU will support BoB's work by verifying or refuting the retrieved answers and extracting the specific answer(s) from the verified answers. This problem is represented in terms of Answer Set Programming (ASP). The logic programs (representing the question and the tested answer) are built from the first-order representations of the questions and answers. The thesis describes how to analyze the answer sets of these programs to verify or refute an answer and to extract a specific answer. To build semantic representations for natural language, I use Boxer, which builds first-order logic formulas from parsed natural language sentences. To allow for efficient reasoning, natural language fragments are defined whose sentences have decidable first-order representations. As a general conclusion, we find that the ASP framework has many features which can be used for the task of supporting IQA with deep analysis. The most promising of these features are presented as future work in the thesis.



Building Italian FrameNet through frame information transfer from English to Italian
Sara Tonelli, Ca' Foscari University/FBK


The creation of English FrameNet started in Berkeley 10 years ago and aimed at developing an on-line lexical resource for English, based on frame semantics and supported by corpus evidence. The project is still ongoing and tries to document the range of semantic and syntactic combinatory possibilities (valences) of each word in each of its senses, through computer-assisted annotation of example sentences and automatic display of the annotation results. In recent years, other research projects have been seeking to produce comparable frame-semantic lexicons for other languages and to devise means of automatically labeling running text with semantic frame information. Since manual annotation is expensive and time-consuming, I am developing Italian FrameNet using automatic labelling techniques as much as possible. In particular, I implemented a projection algorithm for transferring frame-semantic information from English to Italian texts and tested it on a portion of the Europarl corpus. My first experiments highlighted typical features of the Italian language with regard to frame-semantic annotation. In particular, I had to deal with peculiarities of Italian that at the moment make the projection task more difficult than for German or Swedish. In general, the approach seems very promising, and I plan to exploit the advantages of frame information transfer with other parallel corpora and with automatically translated texts.
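The idea of projecting annotations across a parallel sentence pair can be sketched as follows. This is a simplified illustration, not the projection algorithm from the abstract: it assumes that a word alignment is already given as index pairs, and the label names (FRAME_TARGET, ROLE_A, ROLE_B), the function name and the example sentence pair are hypothetical.

```python
def project_labels(source_labels, alignment):
    """Copy labels from source token indices to the target tokens aligned with them.

    source_labels: dict mapping a source token index to its label
    alignment: iterable of (source_index, target_index) pairs
    Unaligned target tokens receive no label; the first label projected wins.
    """
    projected = {}
    for src, tgt in alignment:
        if src in source_labels:
            projected.setdefault(tgt, source_labels[src])
    return projected


# Invented sentence pair, labels and alignment, for illustration only.
english = ["The", "parliament", "approved", "the", "proposal"]
italian = ["Il", "parlamento", "ha", "approvato", "la", "proposta"]
labels = {1: "ROLE_A", 2: "FRAME_TARGET", 4: "ROLE_B"}
alignment = [(0, 0), (1, 1), (2, 3), (3, 4), (4, 5)]

projected = project_labels(labels, alignment)
for index, label in sorted(projected.items()):
    print(italian[index], label)
```

In this toy pair the Italian auxiliary "ha" has no aligned English token and stays unlabeled, a small example of the kind of language-specific mismatch a real projection step has to handle.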



Expressing Formal Queries over DL-Lite Ontologies with Controlled English
Camilo Thorne, KRDB-FUB


We propose to characterize the computational complexity of answering questions in ontology-mediated controlled language interfaces to structured data sources by expressing ontology-based data access in controlled English. This means: compositionally mapping a controlled subset of English to knowledge bases and formal queries for which the computational complexity is well-known. In the present paper, we extend this approach to conjunctive queries and to conjunctive queries with aggregate functions.
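As a rough illustration of mapping a controlled subset of English to a formal query, the sketch below translates one fixed question pattern into a conjunctive query written in rule notation. The pattern, the function name and the predicate naming are assumptions made for this example; the controlled fragment actually studied in the abstract is not specified here, and a real interface would use a compositional grammar rather than a single regular expression.

```python
import re

# Hypothetical toy fragment: "Which <noun> <verb> some <noun>?"
PATTERN = re.compile(r"Which (\w+) (\w+) some (\w+)\?")


def to_conjunctive_query(question):
    """Translate a question of the toy fragment into a conjunctive query string."""
    match = PATTERN.fullmatch(question)
    if match is None:
        raise ValueError("question is outside the toy controlled fragment")
    subject, verb, obj = match.groups()
    # Nouns become unary predicates (concepts), the verb a binary predicate (role).
    return f"q(x) :- {subject.capitalize()}(x), {verb}(x, y), {obj.capitalize()}(y)"


query = to_conjunctive_query("Which student attends some course?")
print(query)
```

Restricting the input to a fixed fragment is what makes the target query class predictable: every accepted question maps to a query of known shape, while anything else is rejected.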