European Masters Program in Language and Communication Technologies


Language and Communication Technologies Colloquia

This year the talks will focus on corpora, dialogue and discourse, natural language generation, parsing, and natural language interfaces to databases.

The Colloquia will take place every other Thursday from January to May 2006, according to the calendar below. All seminars are held in English unless otherwise noted.

A printable PDF Poster with the complete schedule of the LCT Colloquia is available for download.

If you are not on the LCT Colloquia mailing list and would like to receive a reminder before every seminar, we invite you to subscribe at the following link: https://www.inf.unibz.it/mailman/listinfo/lct-colloquia.

We are also organizing reading groups on the above-mentioned topics. They will take place some time before each Colloquium. A mailing list has also been set up: https://www.inf.unibz.it/mailman/listinfo/lct-reading-group.

For more information, please contact Raffaella Bernardi or Paolo Dongilli.



| Date | Time | Speaker | Affiliation | Title | Slides |
|---|---|---|---|---|---|
| January 19th | 16:00-17:00 | Marco Baroni | SITLEC Department, University of Bologna, Italy | Building Large Corpora from the Web | baroni.pdf |
| February 2nd | 16:00-17:00 | Verena Lyding and Isabella Ties | EURAC.Research, Bozen, Italy | Creation of Parallel Corpora for Special Purposes | lyding-ties.pdf |
| February 16th | 16:00-17:00 | Rodolfo Delmonte | University Ca' Foscari, Venice, Italy | VIT - Venice Italian Treebank | delmonte.pdf |
| March 2nd | 16:00-17:00 | Daniela Veronesi and Alessandro Vietti | Centre for Language Studies, Free University of Bozen-Bolzano, Italy | Qualitative and Quantitative Approaches in Linguistics: Some Examples from Research on Multilingualism | veronesi.pdf, vietti.pdf |
| March 9th | 16:00-17:00 | Giorgio Satta | Department of Information Engineering, University of Padua, Italy | Introduction to Lexicalized Context-Free Grammars and Lexicalized Tree Adjoining Grammars | satta.pdf |
| March 16th | 16:00-17:00 | Michael Minock | Department of Computing Science, University of Umeå, Umeå, Sweden | Revisiting Natural Language Interfaces to Databases | minock.pdf |
| March 30th | 16:00-17:00 | Giuseppe Riccardi | Department of Information and Communication Technology, Faculty of Engineering, University of Trento, Italy | Learning Language Structure: a Statistical Perspective | |
| April 13th | 16:00-17:00 | Massimo Poesio | University of Trento, Italy / University of Essex, UK | Completions and Coordination in Dialogue | |
| May 11th | 16:00-17:00 | Johan Bos | Linguistic Computing Laboratory, Department of Computer Science, University of Rome "La Sapienza", Italy | Recognising Textual Entailment with Logical Inference | |
| May 18th | 16:00-17:00 | Bonnie Webber | Division of Informatics, University of Edinburgh, UK | Discourse Grammar from a Lexical Perspective | webber.pdf |
| May 25th | 16:00-17:00 | John Bateman | English Department, University of Bremen, Germany | Ontology and Natural Language Processing: Modularity and Design in Spatial Communication | bateman.pdf |
| June 22nd | 12:30-13:30 | Carlo Meghini | Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Italy | Digital Libraries: Moving from Library Automation to Digital Curation | |
| June 26th | 11:30-12:30 | Katja Ignatova | Cognitive Systems @ DFKI Language Technology Lab, Saarland University, Germany | Understanding an environment through the eyes of a robot | |



January


January 19th, 2006, 16:00-17:00 - Faculty of Computer Science, FUB, Seminar Room (first floor left)

Building Large Corpora from the Web
Marco Baroni, SITLEC Department, University of Bologna, Italy


In the last 20 years, corpora, i.e., collections of language samples produced in natural contexts and without experimental interference, have played an increasingly central role in linguistics and related disciplines, such as lexicography and natural language processing. Given that the World Wide Web is, among other things, a huge database of textual documents, it has become increasingly common for linguists to turn to the Web as a source of language corpora.

In this talk, I will discuss several ways to access the linguistic data available online, arguing that the only viable long-term approach is to construct corpora by conducting large crawls of the Web, and cleaning, annotating and indexing the resulting document collections. I will describe the procedure necessary to build and manage large Web corpora (one billion words and more), illustrating common methodologies and discussing outstanding problems. I will conclude by presenting some concrete examples of linguistic analysis conducted on Web corpora of Italian, German and Japanese.




February


February 2nd, 16:00-17:00 - Faculty of Computer Science, FUB, Seminar Room (first floor left)

Creation of Parallel Corpora for Special Purposes
Verena Lyding and Isabella Ties, EURAC.Research, Bozen, Italy


Parallel corpora are collections of text documents aligned with their translations. Parallel corpora of bi- or multilingual documents are widely employed in all areas of language studies, including linguistic analyses, applications in translation studies and machine translation approaches.

In this seminar we will talk about the creation and representation of the trilingual corpus CLE (German - Italian - Ladin), designed for use by translators and terminologists. Describing the different steps involved in the creation of a parallel corpus, we will discuss the decisions we took and the problems we encountered.


February 16th, 16:00-17:00 - Faculty of Computer Science, FUB, Seminar Room (first floor left)

VIT - Venice Italian Treebank
Rodolfo Delmonte, University Ca' Foscari, Venice, Italy


In this talk we will present the Italian treebank called VIT. The treebank was finally released last July and contains syntactic structural representations for 10,000 sentences, corresponding to approximately 270,000 words of written Italian, plus 50,000 words of spoken Italian, corresponding to 4,400 dialogue moves in four regional varieties: Neapolitan, Barese, Roman, and Pisan Italian. It contains over 300,000 constituent labels, all of which have been manually validated.

Besides syntactic constituency, the treebank also contains functional dependency structures for some 80,000 words, corresponding to approximately 3,800 sentences.

We will describe the main linguistic features of the treebank, as well as its peculiarities with respect to the structural features of the Penn Treebank. We will also show its accompanying browsing facilities.




March


March 2nd, 16:00-17:00 - Faculty of Computer Science, FUB, Seminar Room (first floor left)

Qualitative and Quantitative Approaches in Linguistics: Some Examples from Research on Multilingualism
Daniela Veronesi and Alessandro Vietti, Centre for Language Studies, Free University of Bozen-Bolzano, Italy


Within linguistics, specifically sociolinguistics and conversation analysis, research on multilingualism has grown rapidly over the past decades. It investigates language contact and code-switching phenomena, second language acquisition, and written and oral communication, not only in the context of traditional bilingual communities and language minorities (especially in Europe), but also from the perspective of new immigrants, intercultural interaction at the workplace, and the like. Language structures and discursive practices have thus been approached from both a qualitative and a quantitative perspective, with methodologies ranging from in-depth interviews and participant observation to multivariate analysis and implicational scales.

As an example of a qualitative approach in linguistics, in the first part of the talk Daniela Veronesi will elaborate on "language biographies", a topic which has emerged in the last decade as a new field of research and which has proved very fruitful for investigating the relationship speakers establish with languages during their lives. Working on language biographies, collected through narrative interviews and examined by means of discourse and conversation analysis, makes it possible to explore issues such as beliefs, attitudes, motivation, experiences of language learning and intercultural communication, and thus not only to complement quantitative data on the same topics, but also to provide input to language planning and language didactics.

In the second part of the talk, Alessandro Vietti will provide a short overview of quantitative methods in sociolinguistics, focusing on a specific statistical technique (logistic regression) which has proved very fruitful in investigating language variation. Quantitative methods have been developed to account for grammatical variants whose occurrence is not categorical but probabilistic. To illustrate this issue, two case studies based on empirical data will be introduced and discussed: the first concerns contact-induced language variation in Italian as spoken by Spanish-speaking Peruvian immigrants; the second deals with code choice (German-Italian) in anonymous street interactions in South Tyrol.


March 9th, 16:00-17:00 - Faculty of Computer Science, FUB, Seminar Room (first floor left)

Introduction to Lexicalized Context-Free Grammars and Lexicalized Tree Adjoining Grammars
Giorgio Satta, Department of Information Engineering, University of Padua, Italy


In recent years, much of the parsing literature has focused on so-called lexicalized grammars, that is, grammars in which each individual rule is specialized for one or more lexical items. Formalisms of this sort include dependency grammar, lexicalized context-free grammars and lexicalized tree-adjoining grammars. We will briefly review these models and discuss their use in natural language parsing.


March 16th, 16:00-17:00 - Faculty of Computer Science, FUB, Seminar Room (first floor left)

Revisiting Natural Language Interfaces to Databases
Michael Minock, Department of Computing Science, University of Umeå, Umeå, Sweden


Although interest in natural language interfaces to databases has waned in recent years, a new emphasis on semantics and closed-domain question answering presages a revival. This talk addresses this long-standing problem. We shall start with a critical review of prior work, mainly carried out in the 1980s. Next we shall identify recent advances in database and language technology that might enable better solutions. Finally we shall review several state-of-the-art systems to characterize current approaches.


March 30th, 16:00-17:00 - Faculty of Computer Science, FUB, Seminar Room (first floor left)

Learning Language Structure: a Statistical Perspective
Giuseppe Riccardi, Department of Information and Communication Technology, Faculty of Engineering, University of Trento, Italy


Spontaneous speech poses many challenges to the speech recognition and understanding problem. In this talk we will address the language modeling issues for speech recognition and understanding for large vocabulary spoken dialog systems. We will show how stochastic modeling provides rich tools to acquire different language structures: from n-grams, to word phrases, phrase grammars and head-dependency structures. Phrase-based models are powerful in that they enhance the traditional n-gram model and allow for a tight integration with the understanding features ("Recognition for Understanding"). We will also review algorithms to automatically learn head-dependency grammars and speech disfluency-based language models. We show that people's responses to computer prompts vary over time and (dialog) state, and we propose a framework to track time-varying statistical parameters of a spoken dialog system. In the final part of the talk we will discuss the research challenges of the statistical approach to modeling language structure.




April


April 13th, 16:00-17:00 - Faculty of Computer Science, FUB, Seminar Room (first floor left)

Completions and Coordination in Dialogue
Massimo Poesio, University of Trento, Italy / University of Essex, UK


COMPLETIONS and CONTINUATIONS are two fundamental strategies for coordination in dialogue. An example of completion is 1.2 in the following example from the Bielefeld Toy Airplane Corpus (Skuplik 1999, Rieser and Skuplik 2000), in which Cnst completes the instruction begun by Inst.

1.1 Inst So, jetzt nimmst Du
    (OK, now you take)
1.2 Cnst eine Schraube
    (a screw)
1.3 Inst eine orangene mit einem Schlitz
    (an orange one, with a slit)

After reviewing the characteristics of completions and continuations, I will briefly introduce the PTT framework for the analysis of dialogues (Poesio and Traum 1997, 1998; Matheson et al. 2000). I'll then discuss an 'intentional' account of how completions and continuations may be produced, building upon work by Clark (1996), Bratman (1993), Tuomela (2000), and Grosz and Kraus (1996). Finally, I will consider a non-intentional explanation of completions, taking up Pickering and Garrod's suggestions concerning dialogue description (Pickering and Garrod 2003).




May



May 11th, 16:00-17:00 - Faculty of Computer Science, FUB, Seminar Room (first floor left)

Recognising Textual Entailment with Logical Inference
Johan Bos, Linguistic Computing Laboratory, Department of Computer Science, University of Rome "La Sapienza", Italy


In this talk I present a system that automatically builds semantic representations from English texts and, moreover, is capable of reasoning with the result. The system achieves high coverage (ca. 95%).

From a theoretical point of view, the system is supported by two linguistic frameworks: Combinatory Categorial Grammar (CCG), to determine the syntactic structure of sentences, and Discourse Representation Theory (DRT), to specify the semantics.

After introducing the system I will discuss its application in recognising textual entailment (RTE), showing how one can use first-order inference tools such as theorem provers and model builders to reason with semantic representations. I will also show the difficulties involved in making such a system successful.


May 18th, 16:00-17:00 - Faculty of Computer Science, FUB, Seminar Room (first floor left)

Discourse Grammar from a Lexical Perspective
Bonnie Webber, Division of Informatics, University of Edinburgh, UK


To date, Language Technology has derived its greatest success from words and word-level techniques. Since discourse is so much more than words, will it prove to be beyond the promises of this technology? This talk suggests that the answer is "no", arguing that the lexicon provides a robust basis for low-level discourse grammar.

I start by reviewing some previous proposals regarding discourse structure and discourse grammar, and then describe a lexicalised discourse grammar modelled on Lexicalised Tree-Adjoining Grammar. What is attractive about this approach from a linguistic perspective is the range of examples it is able to explain.

On the other hand, interesting examples are not necessarily common examples. So to provide empirical grounding for such work on discourse, I am working with colleagues at the University of Pennsylvania on what we call the "Penn Discourse TreeBank" (http://www.ircs.upenn.edu/~pdtb/). I will conclude the talk by describing features of this resource and its current state.


May 25th, 16:00-17:00 - Faculty of Computer Science, FUB, Seminar Room (first floor left)

Ontology and Natural Language Processing: Modularity and Design in Spatial Communication
John Bateman, English Department, University of Bremen, Germany


In this talk I focus on the role of linguistically motivated ontologies for constructing modular architectures for natural language processing. The talk considers results from ongoing projects in natural language dialogue and natural language generation for a range of spatially-embedded tasks. I describe the generation architectures used, their embedding into a dialogue system, and the use made of linguistic and non-linguistic ontologies for managing and mediating between system components. Particular attention will be paid to the requirements and design criteria for the knowledge sources involved, and to the high degree of flexibility demanded of the relationships between ontology and language when dealing with naturally occurring communicative strategies.




June



June 22nd, 12:30-13:30 - Faculty of Computer Science, FUB, Seminar Room (first floor left)

Digital Libraries: Moving from Library Automation to Digital Curation
Carlo Meghini, Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Italy


Digital libraries are at the center of a large research and development effort that started with the first U.S. initiative back in 1994. The recent Google project to digitize one million books has drawn enormous attention to this area, especially in Europe. The talk will briefly review the basic notions of Digital Libraries, exemplified through an overview of the BRICKS Project, a major Integrated Project within the Cultural Heritage sector of the 6FP. Then the main theme of digital curation will be introduced and exemplified through a description of the work programme of the CASPAR IP project, which has just started with the goal of developing a methodology and associated tools for the preservation of digital objects.



June 26th, 11:30-12:30 - Faculty of Computer Science, FUB, Seminar Room (first floor left)

Understanding an environment through the eyes of a robot
Katja Ignatova, Cognitive Systems @ DFKI Language Technology Lab, Saarland University, Germany


Despite the progress achieved in different fields of AI and Cognitive Science, the road to creating a robot with human-like performance remains long. Part of the reason for this is fragmented research that concentrates, for example, on vision, language processing or mobile robotics in isolation. For a robot capable of performing a diverse collection of tasks, including various combinations of visual and other forms of perception, learning, reasoning, communication and goal formation, several disciplines have to be considered together. Producing a body of theory and implementations addressing this problem is the main goal of the CoSy (Cognitive Systems for Cognitive Assistants) project. The talk will be mostly based on ideas and results from this project.

Understanding the environment has two sides. On the one hand, the robot needs to communicate with other agents in order to experience the environment and to learn continuously. On the other hand, the robot has to relate its understanding to what is being talked about, for example objects or the spatial organization of the environment. In this respect, several issues concerning spatial exploration, situated dialogue in spatial exploration and scene recognition will be addressed.