Language and Communication Technologies
Colloquia - Student Day
The "LCT Student Day" is a get-together event for catching up on
the research projects carried out by young researchers working or
previously studying in Bolzano, Trento or Rovereto.
As part of this event there will also be the "KRDB & IBM Best Thesis
Awards" Ceremony. Join us to congratulate the winners!
Where and When: Bolzano, Free University of Bozen-Bolzano, Via Sernesi 1, 15th of May 2008:
- Oral presentations and "KRDB & IBM Best Thesis Awards" Ceremony: Room D003 (15:30-16:30)
- Coffee Break: UniBar (16:30-17:00)
- Poster presentations: In front of Room D102 (17:00-18:00)
The winners of the Awards are Barbara Plank and Evgeny Kharlamov. Congratulations! The UniNews article is available in German, together with an English translation by Barbara.
Barbara Plank. European Masters Program in LCT, Free University of
Bozen-Bolzano (first year), Universiteit van Amsterdam (second
year). Sub-domain driven parsing. Thesis main supervisor: Khalil
Sima'an (Universiteit van Amsterdam).
Evgeny Kharlamov. European Masters Program in CL, Dresden University of Technology
(first year), Free University of Bozen-Bolzano (second year). Thesis supervisors: Diego Calvanese, Werner Nutt.
| Speaker | Affiliation | Title | Abstract | Posters |
| --- | --- | --- | --- | --- |
| Andrea Abel and Stefanie Anstein | EURAC | Approaches to Computational Lexicography for German Varieties | Abstract | Poster |
| Luciana Benotti | LORIA/INRIA | Tacit sensing in a non-traditional conversational system | Abstract | Poster |
| Raffaella Bernardi, Paolo Dongilli and Daniele Gobbetti | KRDB-FUB | MIVaS | Abstract | |
| Raffaella Bernardi, Paolo Buoso and Daniele Gobbetti | KRDB and Library (FUB) | CACAO | Abstract | |
| Katja Ignatova | Technische Universität Darmstadt, Ubiquitous Knowledge Processing Lab | Question Answering for E-Learning | Abstract | Poster |
| Gerhard Kremer | CIMeC | Cognitively Salient Semantic Relations in and for Concept Description | Abstract | Poster |
| Manuel Kirschner and Raffaella Bernardi | KRDB-FUB | A Task/Entity-Based Context Model for Answering Follow-up Questions | Abstract | |
| Barbara Plank | Alfa informatica, Faculty of Arts, University of Groningen | Domain Adaptation of Syntactic Disambiguation Models | Abstract | Poster |
| Marija Slavkovik | Computer Science and Communications, University of Luxembourg | Logic Reasoning in Question Answering | Abstract | Poster |
| Sara Tonelli | Ca' Foscari University/FBK | Building Italian FrameNet through frame information transfer from English to Italian | Abstract | |
| Camilo Thorne | KRDB-FUB | Expressing Formal Queries over DL-Lite Ontologies with Controlled English | Abstract | |

Approaches to Computational Lexicography for German Varieties
Andrea Abel and Stefanie Anstein, EURAC

Corpora built for linguistic varieties of a pluricentric language such
as German are an indispensable resource for a detailed and systematic
variety comparison and dictionary development. We present desiderata
and suggestions as well as methods from computational linguistics to
systematically apply variety corpora for the enrichment,
i.e. confirmation, extension and generation, of lexical entries in
distinctive variant dictionaries for German. Examples are those
variant dictionaries developed by Ammon et al. (2004) and Abfalterer
(2007), where we focus on the South Tyrolean German language. On the
one hand, we conducted a systematic frequency analysis in newspaper
variety corpora for approved lists of South Tyrolean special
vocabulary in order to possibly refine corresponding dictionary
entries with corpus evidence. On the other hand, we filtered the list
of words of our South Tyrolean corpus which could not be lemmatised by
a tool developed for the variety in Germany. After removing special
vocabulary collected for the South Tyrolean variety in other projects
(e.g. legal terms), the remaining list was manually checked for
possible new variant dictionary entries. As an innovative approach to
variety corpus lexicography, this automatically filters a huge amount
of data so that only the relevant data has to be investigated in
detail. In addition, we semi-automatically extracted lexical
cooccurrences from our two newspaper corpora and compared their
frequencies, on the assumption that cooccurrences with high frequency
in the South Tyrolean corpus and very low frequency in the corpus from
Germany are worth investigating more closely. With these three methods
we were not only able to refine dictionary entries for South Tyrolean
German, but also to add new ones. The findings on variants can be
re-used for further corpus annotation, resulting in yet better
resources for computational variant lexicography of the kind
described, which is also to be extended to more complex linguistic
levels.
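
The cooccurrence comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' tool: the function names, the window size, the thresholds and the toy sentences are all assumptions made for illustration, and raw counts are used where real corpora of different sizes would need normalised frequencies.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count unordered word pairs occurring within `window` tokens of each other."""
    counts = Counter()
    for tokens in sentences:
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    counts[tuple(sorted((left, right)))] += 1
    return counts


def variety_candidates(target, reference, min_target=2, max_reference=0):
    """Return pairs that are frequent in `target` and rare in `reference`."""
    return sorted(
        pair
        for pair, n in target.items()
        if n >= min_target and reference[pair] <= max_reference
    )


# Invented toy sentences standing in for the two newspaper corpora (assumption).
south_tyrol = [
    ["die", "gemeinde", "plant", "neue", "wohnbauzone"],
    ["neue", "wohnbauzone", "in", "der", "gemeinde"],
]
germany = [
    ["die", "gemeinde", "plant", "neue", "wohnungen"],
    ["neue", "wohnungen", "in", "der", "gemeinde"],
]

candidates = variety_candidates(
    cooccurrence_counts(south_tyrol), cooccurrence_counts(germany)
)
print(candidates)
```

In this sketch the output is only a list of candidate pairs; as in the abstract, such candidates would still have to be checked manually before any dictionary entry is refined or added.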

Tacit sensing in a non-traditional conversational system
Luciana Benotti, INRIA/LORIA

When interlocutors are engaged in situated dialogue their
informational states evolve not only through dialogue acts but also
through physical and sensing acts. All these acts can be performed
either explicitly or tacitly during the interaction. In most
dialogues, even when acts are performed tacitly, their execution can
be inferred from the subsequent acts.
I am studying the interaction among dialogue, physical and sensing
acts in the framework of a non-traditional conversational system using
a non-traditional automated planner. The planner I am using is able to
find plans in the presence of incomplete knowledge and sensing, the
most common setup in situated conversation.

CACAO
Raffaella Bernardi, Paolo Buoso and Daniele Gobbetti, KRDB and Library (FUB)

CACAO is a 24-month targeted project supported by the eContentplus
Programme of the European Commission, which started on the 1st of
December 2007.
The aim of CACAO is to provide an infrastructure to the end-user that
enables him/her to type queries in his/her own language and retrieve
documents and objects in any available language. The sound integration
of CACAO's infrastructure designed for multilingual purposes with
current digital library and catalogue systems will be reached by
coupling natural language processing techniques with existing
information retrieval systems. CACAO will deliver tools for the
maintenance of multilingual resources in the retrieval context.

MIVaS
Raffaella Bernardi, Paolo Dongilli and Daniele Gobbetti, KRDB-FUB

MIVaS is a project carried out by the “Comitato provinciale di
Valutazione per la Qualita' del Sistema scolastico” and the KRDB
research group (Faculty of Computer Science) in collaboration with
DiPED (University of Roma III). It aims at answering the following
questions: 1. How has the educational system in the Italian schools in
Alto Adige changed from 1966 until now? 2. Are there changes in the
linguistic competences of the students from 1966 until now? 3. What
are the linguistic competences of the students of class A vs. class C?
Have they changed over the years? 4. How are the answers to the above
questions connected?
To this end, essays from secondary schools in Alto Adige have been
collected, digitized and annotated. We will report on the
statistical analysis of lexical variation over the years.

Question Answering for E-Learning
Katja Ignatova, Technische Universität Darmstadt, Ubiquitous Knowledge Processing Lab

Information overload is a well-known problem which also affects
learning, since huge amounts of learning material are nowadays
available in different formats and from different sources. This makes
it all the harder for the learner to access information in a fast and
direct way. In the QA-EL project we investigate new applications of
dynamic lexical-semantic resources for information search in
eLearning. Our goal is to provide uniform access to both institutional
and informal knowledge resources, whereby precise and short aggregated
answers are supplied to the learner. Our system architecture focuses
on the integration of information extracted from different knowledge
repositories for the targeted needs of Question Answering in eLearning
2.0. Classical linguistically motivated resources such as GermaNet are
coupled with lexical-semantic information extracted from collaborative
resources like Wikipedia, and put into service for processing
heterogeneous institutional and other Web 2.0 eLearning content.

Cognitively Salient Semantic Relations in and for Concept Description
Gerhard Kremer, CIMeC

Presenting words semantically related to a lexical entry in an
electronic dictionary is particularly useful in a language learning
environment. However, there is a lack of evidence as to which semantic
relation types are cognitively salient and could therefore be used for
the (semi-)automatic extraction of word field candidates from corpora.
As a first step within this project, semantic relation types are
collected from a concept description experiment with German and
Italian participants.

A Task/Entity-Based Context Model for Answering Follow-up Questions
Manuel Kirschner and Raffaella Bernardi, KRDB-FUB

In Interactive Question Answering (IQA), users frequently pose
Follow-Up questions. In fact, giving the user the ability to ask
follow-ups is often considered the key advantage over classical QA. An
IQA system needs a suitable representation of the dialogue context to
help resolve context-dependent Follow-Up questions. Computational
linguists have tried to identify salient relations holding between the
previous IQA dialogue and the Follow-Up. To this end, we have been
experimenting with a Task/Entity-based model of the dialogue
context. In this talk, I will give an overview of the current
implementation of our IQA application, focusing on how follow-up
questions are processed. I will describe our current experiments with
learning parameters of our question answering algorithm from actual
dialogue data that we have collected.

Domain Adaptation of Syntactic Disambiguation Models
Barbara Plank, Alfa informatica, Faculty of Arts, University of Groningen

A major challenge in Natural Language Processing (NLP) is the inherent
ambiguity of Natural Language. In parsing, a specific area of NLP, the
ambiguity is characterized by multiple alternative syntactic analyses
for a given input sentence. A parser has to cope with this difficulty,
and has to choose among the various alternatives. Usually, the
framework of probability theory and statistics is employed as a modeling
tool, leading to statistical parsing.
Modern statistical parsers are trained on large annotated corpora
(treebanks) and their parameters are estimated to reflect properties
of the training data. Thus, a disambiguation component bases its
decisions on the treebank and it will be successful as long as the
treebank it was trained on is representative of the input the model
gets. Hence, as might be expected, performance degrades when a model
trained on one domain is applied to another domain. For example, a
parser trained on newspaper text can be expected to have reasonable
performance if it is applied to newspaper sentences. However, the
model will be less adequate for analyzing, say, spoken data. Hence, a
well-known problem arising from the domain dependence of parsing
systems is their limited portability.
A simple solution to improve performance on a new domain is to
construct a parser specifically for that domain. However, this amounts
to hand-labeling a reasonable amount of training data, which is clearly
very expensive and makes for an unsatisfactory solution.
Alternatively, domain adaptation techniques try to leverage a small
amount of already existing annotated data or to use unlabeled data
from one domain to parse data from a different domain.
The current project explores techniques for domain adaptation for
Alpino, a wide-coverage analyzer for Dutch. In an initial experiment,
a supervised approach to domain-adaptation has been examined, where a
small amount of available in-domain data was assumed. In future
research we will study techniques to gauge to what extent we can
leverage either existing labeled data or unlabeled data. This
includes more thorough research into the precise extent of the notion
of domain, comprising the study of domain characteristics that
might be exploited for the task of parsing.

Logic Reasoning in Question Answering
Marija Slavkovik, Computer Science and Communications, University of Luxembourg

Current Question Answering (QA) systems can be improved by
finding ways to support the traditional statistical approach to QA with
logic reasoning. In my thesis I suggest one way of supporting an
Interactive Question Answering system with logic reasoning.
As a case study we give an overview of BoB, a chatter-bot which
interactively answers questions over the library domain. I suggest an
architecture of a Logic Support Unit (LSU) for BoB. The LSU will
support BoB's work by performing verification or refutation of the
retrieved answers and extraction of the specific answer(s) from the
verified answers.
This problem is represented in terms of Answer Set Programming. The
logic programs (representing the question and the tested answer) are
built from the first-order representations of the questions and
answers. The thesis contains a description of how to analyze the
answer sets of these programs to verify or refute an answer and to
extract a specific answer.
For the purpose of building semantic representations for natural
language, I use Boxer which builds first-order logic formulas from
parsed natural language sentences. To allow for efficient reasoning,
natural language fragments are defined whose sentences have decidable
first-order representations.
As a general conclusion, we find that the ASP framework has many features
which can be used for the task of supporting IQA with deep analysis. The most
promising of these features are presented as future work in the thesis.

Building Italian FrameNet through frame information transfer from English to Italian
Sara Tonelli, Ca' Foscari University/FBK

The creation of English FrameNet started in Berkeley 10 years ago and
aimed at developing an on-line lexical resource for English, based on
frame semantics and supported by corpus evidence. The project is still
ongoing and tries to document the range of semantic and syntactic
combinatory possibilities (valences) of each word in each of its
senses, through computer-assisted annotation of example sentences and
automatic display of the annotation results. In recent years, other
research projects have been seeking to produce comparable
frame-semantic lexicons for other languages and to devise means of
automatically labeling running text with semantic frame
information. Since manual annotation is expensive and time-consuming,
I am developing Italian FrameNet using automatic labelling techniques
as much as possible. In particular, I implemented a projection
algorithm for transferring frame-semantic information from English to
Italian texts and tested it on a portion of the Europarl corpus. In my
first experiments, I identified typical features of the Italian
language with regard to frame-semantic annotation. In particular, I had
to deal with the peculiarities of Italian that at the moment make the
projection task more difficult than for German or Swedish. In general,
the approach seems to be very promising and I plan to exploit the
advantages of frame information transfer with other parallel corpora
and with automatically translated texts.

Expressing Formal Queries over DL-Lite Ontologies with Controlled English
Camilo Thorne, KRDB-FUB

We propose to characterize the computational complexity of answering
questions in ontology-mediated controlled language interfaces to
structured data sources by expressing ontology-based data access in
controlled English. This means: compositionally mapping a controlled
subset of English to knowledge bases and formal queries for which
the computational complexity is well-known. In the present paper, we
extend this approach to conjunctive queries and to conjunctive queries
with aggregate functions.