European Masters Program in Language and Communication Technologies
Internships
Besides internships at the KRDB, students can apply for internships at FBK. Further information can be found here.
Projects and Theses in LCT at FUB
Below is a list of the projects and theses that can be carried out in the LCT field at FUB. Note that some of them are meant for Bachelor students; however, if a Master student is interested in one of these topics, the project can be extended.
Keywords: Corpora; PoS Tagging; NLP; FST; FG; QA; Chatbot.
A glossary explaining some of these keywords is provided below.
Supervisors
Students will be supervised either by FUB staff or by staff from ITC-irst, CELI or CELCT. Students of the European Masters Program in Language and Communication Technologies or the European Masters Program in Computational Logic can also be co-supervised by staff from their second university.
FUB Supervisors
- Alessandro Artale: Databases
- Raffaella Bernardi: Computational Linguistics
- Diego Calvanese: Theory of Computing
- Enrico Franconi: Knowledge Representation
- Rosella Gennari: Constraint Programming
- Sergio Tessaris: Artificial Intelligence
External Co-supervisors
- CELCT Staff: Evaluation of Language and Communication Technologies
- FBK-irst Staff: Question Answering, Speech Recognition, Intelligent Interfaces
- CELI Staff: Information Extraction, Text Mining, Data Mining, NLP, Multilingual Frameworks
- Staff at the partner universities of the EM in LCT (for students of the European Master).
| Title | Details | Type |
|---|---|---|
| Transformation Based Error-Driven Part-of-Speech Tagging with an English Corpus | Description (Assigned) | BSc internship |
| Transformation Based Error-Driven Part-of-Speech Tagging with a Multilingual Corpus of Questions | Description (Completed by Anna Mari, AA: '04/'05) | BSc final thesis |
| Reduction-Based Error-Driven Part-of-Speech Tagging applied to a Swedish Corpus | Description (Assigned) | BSc internship |
| Enhancement of a Reduction-Based Part of Speech Tagger applied to English and Swedish corpora | Description (Assigned) | BSc final thesis |
| A Morphotactic for English based on Finite State Transducers | Description | MSc internship |
| Application of the Morphological Analyzer based on Finite State Transducers | Description | MSc final thesis |
| Transformation-Based Error-Driven Parsing with an English Corpus | Description | MSc internship |
| Transformation-Based Error-Driven Parsing for Prepositional Phrase Attachment Disambiguation | Description | MSc final thesis |
| Evaluation of existing Chatbots | Description | MSc internship |
| Implementation of a Chatbot using domain specific FAQs | Description | MSc final thesis |
| The TREC Track on Question Answering | Description | MSc internship |
| Application of Annotated Corpora to Question Answering | Description | MSc final thesis |
| Temporal annotation tools and constraint programming | Description | European MSc in Comp. Logic internship |
| Temporal annotation with uncertainty | Description | European MSc in Comp. Logic final thesis |
| The 'Saarbruecken Text Adventure': codifying actions for a planner fully based on LTL | Description (Assigned) | European MSc in Comp. Logic internship |
| Action Planning in a Dialog System | Description (Assigned) | European MSc in Comp. Logic final thesis |
| Projects at IRST | To be announced | MSc internship |
| Projects at IRST | To be announced | MSc final thesis |
| Projects at CELI | To be announced | MSc internship |
| Projects at CELI | To be announced | MSc final thesis |
Descriptions
- Supervisors:
- Raffaella Bernardi and Paolo Dongilli.
- TITLE:
- Transformation Based Error-Driven Part-of-Speech Tagging with an English Corpus
- DESCRIPTION:
- During this internship the student will implement the base version of Eric Brill's algorithm for POS tagging and test its performance on an English corpus.
- LEVEL:
- BSc internship
- REFERENCE:
- Eric Brill. Transformation-Based Error-Driven Learning and NLP: A case study in POS Tagging. 1995.
- Supervisors:
- Raffaella Bernardi and Paolo Dongilli
- TITLE:
- Transformation Based Error-Driven Part-of-Speech Tagging with a Multilingual Corpus of Questions
- DESCRIPTION:
- The candidate will extend the base version of Eric Brill's POS tagger by adding the module for tagging unknown words and by implementing a k-best tagger. The resulting tagger will then be trained on an annotated multilingual parallel corpus of questions (English, German, Italian). Finally, its performance on the different languages will be tested and compared.
- LEVEL:
- BSc final thesis
- REFERENCE:
- Eric Brill. Transformation-Based Error-Driven Learning and NLP: A case study in POS Tagging. 1995.
- Supervisors:
- Paolo Dongilli or Raffaella Bernardi.
- TITLE:
- Reduction-Based Error-Driven Part-of-Speech Tagging applied to a Swedish Corpus
- DESCRIPTION:
- The aim of this internship is to learn the idea behind the reduction-based error-driven part-of-speech tagging algorithm, which will then be trained and tested on a corpus of Swedish texts, the Stockholm-Umeå Corpus.
- LEVEL:
- BSc internship
- REFERENCE:
- Brill, E. 1995; Megyesi, B. 2001. Comparing Data-Driven Learning Algorithms for PoS Tagging of Swedish. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2001), pp. 151-158, Carnegie Mellon University, Pittsburgh, PA, USA, June 3-4, 2001.
- Supervisors:
- Paolo Dongilli or Raffaella Bernardi.
- TITLE:
- Enhancement of a Reduction-Based Part of Speech Tagger applied to English and Swedish corpora
- DESCRIPTION:
- This thesis is the natural extension of the internship described above. The student is asked to extend the implementation of the base algorithm for reduction-based error-driven POS tagging by adding a module for tagging unknown words under an open-world assumption. After a training and test phase on both an English and a Swedish corpus, the results will be compared with those obtained on the same corpora by other state-of-the-art taggers.
- LEVEL:
- BSc final thesis
- REFERENCE:
- Brill, E. 1995; Megyesi, B. 2001. Comparing Data-Driven Learning Algorithms for PoS Tagging of Swedish. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2001), pp. 151-158, Carnegie Mellon University, Pittsburgh, PA, USA, June 3-4, 2001.
- Supervisors:
- Paolo Dongilli or Raffaella Bernardi.
- TITLE:
- A Morphotactic for English based on Finite State Transducers
- DESCRIPTION:
- During the internship the student will study the FST tools developed at Xerox (Palo Alto) and become familiar with them by taking a word class of English as a case study.
- LEVEL:
- MSc internship
- REFERENCE:
- Lauri Karttunen. Applications of Finite-State Transducers in NLP
- Supervisors:
- Paolo Dongilli or Raffaella Bernardi.
- TITLE:
- Morphological Analyzers based on Finite State Transducers
- DESCRIPTION:
- The student will integrate the morphotactic component developed during the internship above with a spell checker and extend the fragment of the lexicon studied.
- LEVEL:
- MSc final thesis
- REFERENCE:
- Lauri Karttunen. Applications of Finite-State Transducers in NLP
- Supervisors:
- Paolo Dongilli or Raffaella Bernardi.
- TITLE:
- Transformation-Based Error-Driven Parsing with an English Corpus
- DESCRIPTION:
- During the internship the student will implement a natural language parser based on Eric Brill's transformation-based error-driven learning approach and apply it to an English corpus.
- LEVEL:
- MSc internship
- REFERENCE:
- Eric Brill. Transformation-Based Error-Driven Learning and NLP: A case study in POS Tagging. 1995.
- Supervisors:
- Paolo Dongilli or Raffaella Bernardi.
- TITLE:
- Transformation-Based Error-Driven Parsing for Prepositional Phrase Attachment Disambiguation
- DESCRIPTION:
- In this thesis the candidate will reuse the code of the above-mentioned MSc internship, experimenting with different approaches to prepositional phrase attachment disambiguation on an English corpus.
- LEVEL:
- MSc final thesis
- REFERENCE:
- Eric Brill. Transformation-Based Error-Driven Learning and NLP: A case study in POS Tagging. 1995.
- Supervisors:
- Paolo Dongilli or Raffaella Bernardi.
- TITLE:
- Evaluation of existing Chatbots
- DESCRIPTION:
- The student will compare existing chatbots such as ELIZA and ALICE and evaluate their performance.
- LEVEL:
- European MSc Computational Logic, 3rd year project
- REFERENCE:
- ALICE
- Supervisors:
- Paolo Dongilli or Raffaella Bernardi.
- TITLE:
- Implementation of a Chatbot using domain specific FAQs
- DESCRIPTION:
- The student will implement one of the chatbots evaluated during the internship above. This chatbot will be used to answer the questions of a given FAQ. The result of the thesis will be a FAQchat.
- LEVEL:
- European MSc Computational Logic, final thesis
- REFERENCE:
- ALICE
- Supervisors:
- Paolo Dongilli or Raffaella Bernardi.
- TITLE:
- The TREC Track on Question Answering
- DESCRIPTION:
- During the internship the student will become familiar with the QA research field by studying the most efficient QA systems participating in TREC.
- LEVEL:
- European MSc Computational Logic, 3rd year project
- REFERENCE:
- TREC
- Supervisors:
- Paolo Dongilli or Raffaella Bernardi.
- TITLE:
- Application of Annotated Corpora to Question Answering
- DESCRIPTION:
- The student will study how to improve the chosen QA system by means of linguistic knowledge. More specifically, he/she will use PoS-annotated questions or parsed questions to retrieve answers.
- LEVEL:
- European MSc Computational Logic, final thesis
- REFERENCE:
- De Boni
- Supervisors:
- Raffaella Bernardi and Rosella Gennari
- TITLE:
- Temporal annotation tools and temporal constraints
- DESCRIPTION:
- Temporal annotation languages, most notably TimeML, are used to mark up temporal expressions and events in free text. Current annotation tools have limited inference capabilities. By contrast, the literature on temporal constraint satisfaction problems (TCSPs) abounds with algorithms for temporal reasoning. In this part of the project the student will become acquainted with temporal annotation languages and the TCSP literature, and should then deliver a proposal on which TCSP frameworks and algorithms could be most useful in this context.
- LEVEL:
- EMSc internship
- REFERENCE:
- Supervisors:
- Raffaella Bernardi and Rosella Gennari
- TITLE:
- Temporal annotation tools and temporal constraints
- DESCRIPTION:
- The student should adapt the TCSP frameworks and algorithms and implement them in the temporal annotation tool chosen in EMScLCT01a.
- LEVEL:
- EMSc final thesis
- REFERENCE:
- Supervisors:
- Raffaella Bernardi and Rosella Gennari
- TITLE:
- Temporal annotation with uncertainty
- DESCRIPTION:
- Temporal annotation languages, most notably TimeML, are used to mark up temporal expressions and events in free text. Current approaches do not allow annotators to express any degree of uncertainty, i.e., the annotator must either mark up a temporal expression with exactly one annotation or leave it unannotated. During his/her internship, the student should study the relevant literature and design a framework that allows annotators to express their uncertainty and to reason with it.
- LEVEL:
- EMSc internship
- REFERENCE:
- TimeML
- Supervisors:
- Raffaella Bernardi and Rosella Gennari
- TITLE:
- Temporal annotation with uncertainty
- DESCRIPTION:
- In the second part of this project, the student should work on the implementation of the framework described above.
- LEVEL:
- EMSc final thesis
- REFERENCE:
- Supervisors:
- Carlos Areces and Raffaella Bernardi
- TITLE:
- The 'Saarbruecken Text Adventure': codifying actions for a planner fully based on LTL
- DESCRIPTION:
- The project involves two previously implemented applications: the computer game 'FrOZ' and a DL reasoning system used to update, maintain, and query the knowledge bases. The student will analyse how 'PaDoK' (Planning with Domain Knowledge) can be used to find plans in the context of the game. To achieve this goal, a utility will be implemented.
- LEVEL:
- MSc internship
- Supervisors:
- Carlos Areces and Raffaella Bernardi
- TITLE:
- Action Planning in a Dialog System
- DESCRIPTION:
- The student will integrate the planning step developed during the internship into the text adventure in order to enhance the flexibility of the execution of actions during the game.
- LEVEL:
- MSc final thesis
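The temporal annotation projects above refer to algorithms from the TCSP literature. As one hedged illustration, the simplest fragment, the Simple Temporal Problem (STP), allows a single interval constraint low <= t_j - t_i <= high between two time points; a standard result is that its consistency can be decided by running the Floyd-Warshall all-pairs shortest-path algorithm on the associated distance graph and checking for negative cycles. The time points, constraints, and the function name `stp_minimal_network` are hypothetical; the sketch handles only metric constraints between time points, not the qualitative interval relations that TimeML annotations typically express, and it does not parse TimeML.

```python
import math

def stp_minimal_network(n, constraints):
    """Decide consistency of a Simple Temporal Problem (illustrative sketch).

    constraints: dict (i, j) -> (low, high), meaning low <= t_j - t_i <= high.
    Returns a matrix d where d[i][j] is the tightest upper bound on t_j - t_i,
    or None if the constraints are inconsistent.
    """
    d = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), (low, high) in constraints.items():
        # Edge i -> j bounds t_j - t_i from above; edge j -> i bounds it from below.
        d[i][j] = min(d[i][j], high)
        d[j][i] = min(d[j][i], -low)
    # Floyd-Warshall: propagate bounds along all paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    if any(d[i][i] < 0 for i in range(n)):
        return None  # negative cycle: no assignment satisfies every constraint
    return d

if __name__ == "__main__":
    # Hypothetical time points: 0 = start of event A, 1 = end of A, 2 = start of B.
    # A lasts 10 to 20 units; B starts 5 to 10 units after A ends.
    d = stp_minimal_network(3, {(0, 1): (10, 20), (1, 2): (5, 10)})
    print("B starts between", -d[2][0], "and", d[0][2], "units after A starts")
```

Here the derived bounds say that B starts between 15 and 30 units after A starts; adding a constraint that B starts within 10 units of A's start would create a negative cycle, and the function would return `None`.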
Glossary
Part-of-speech Tagging: (abbr. POS Tagging) the process of marking up the words in a text with their corresponding parts of speech. People commonly learn a simplified form of this in their early years of school, identifying nouns, verbs, and so on. In NLP, however, the term generally refers to computer algorithms that do much the same thing automatically.
POS Tagging is a very practical application, with uses in many areas, including speech recognition and generation, machine translation, parsing, information retrieval and lexicography.
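Several of the projects listed above build on Eric Brill's transformation-based error-driven learning. The following is a minimal sketch of that idea, not Brill's original implementation: an initial-state annotator gives each word its most frequent tag, and a greedy loop then learns rules from a single rule template ("change tag A to B when the previous tag is C"), each time keeping the rule with the largest net error reduction. The toy corpus, the default tag for unknown words, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training corpus: lists of (word, gold_tag) pairs.
TRAINING = [
    [("the", "DT"), ("can", "NN"), ("rusted", "VBD")],
    [("they", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("we", "PRP"), ("can", "MD"), ("go", "VB")],
    [("a", "DT"), ("can", "NN"), ("fell", "VBD")],
    [("the", "DT"), ("can", "NN"), ("broke", "VBD")],
]

def most_frequent_tags(corpus):
    """Initial-state annotator: map each word to its most frequent gold tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def initial_tags(words, lexicon, default="NN"):
    # Unknown words simply receive a default tag in this sketch.
    return [lexicon.get(w, default) for w in words]

def apply_rule(tags, rule):
    """Rule (a, b, c): change tag a to b when the previous tag is c.
    Conditions are checked on the tags as they were before the rule ran."""
    from_tag, to_tag, prev_tag = rule
    new = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            new[i] = to_tag
    return new

def score_rule(rule, tagged, gold):
    """Net gain: errors the rule fixes minus errors it introduces."""
    score = 0
    for tags, gold_tags in zip(tagged, gold):
        for old, new, g in zip(tags, apply_rule(tags, rule), gold_tags):
            if old != new:
                score += (new == g) - (old == g)
    return score

def learn_rules(corpus, max_rules=10):
    lexicon = most_frequent_tags(corpus)
    gold = [[t for _, t in s] for s in corpus]
    tagged = [initial_tags([w for w, _ in s], lexicon) for s in corpus]
    rules = []
    for _ in range(max_rules):
        # Propose candidate rules only at positions that are currently wrong.
        candidates = {
            (tags[i], gold_tags[i], tags[i - 1])
            for tags, gold_tags in zip(tagged, gold)
            for i in range(1, len(tags))
            if tags[i] != gold_tags[i]
        }
        if not candidates:
            break
        best = max(sorted(candidates), key=lambda r: score_rule(r, tagged, gold))
        if score_rule(best, tagged, gold) <= 0:
            break  # stop when no rule improves the current tagging
        rules.append(best)
        tagged = [apply_rule(tags, best) for tags in tagged]
    return lexicon, rules

def tag_sentence(words, lexicon, rules):
    tags = initial_tags(words, lexicon)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return list(zip(words, tags))

if __name__ == "__main__":
    lexicon, rules = learn_rules(TRAINING)
    print(rules)
    print(tag_sentence(["they", "can", "swim"], lexicon, rules))
```

On this toy corpus "can" is initially tagged NN everywhere, and the loop learns one rule, changing NN to MD after a personal pronoun, so "they can swim" comes out as PRP MD VB. Brill's actual tagger uses many more rule templates and a separate module for unknown words.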
Corpus: (plural: corpora) a large and structured set of texts (now usually electronically stored and processed). A corpus may contain texts in a single language (monolingual corpus) or text data in multiple languages (multilingual corpus). Multilingual corpora that have been specially formatted for side-by-side comparison are called aligned parallel corpora.
To make corpora more useful for linguistic research, they are often annotated. Part-of-speech tagging is one example of corpus annotation.
Corpora are the main knowledge base in corpus linguistics.
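As a small illustration of what a POS-annotated corpus can look like in practice, the sketch below reads tagged text in the widely used word/TAG format and counts tag frequencies. The format, the Penn Treebank-style tags, and the two sample sentences are assumptions for illustration; real corpora differ in format and tagset.

```python
from collections import Counter

# Hypothetical annotated corpus in "word/TAG" format, one sentence per line.
RAW = """The/DT board/NN met/VBD ./.
Pierre/NNP Vinken/NNP joined/VBD the/DT board/NN ./."""

def read_tagged(text):
    """Return a list of sentences, each a list of (word, tag) pairs."""
    sentences = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # rsplit keeps any slash inside the word itself (e.g. "1/2/CD").
        pairs = [tok.rsplit("/", 1) for tok in line.split()]
        sentences.append([(word, tag) for word, tag in pairs])
    return sentences

def tag_frequencies(sentences):
    return Counter(tag for sentence in sentences for _, tag in sentence)

if __name__ == "__main__":
    corpus = read_tagged(RAW)
    print(corpus[1])
    print(tag_frequencies(corpus).most_common())
```

Counts like these are also what a most-frequent-tag baseline tagger is built from.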
Parsing: Natural Language Parsing is the syntactic analysis of natural languages, whose objective is to determine the higher-level components of sentences (verb phrases, noun phrases, prepositional phrases, etc.) and the relationships between them.
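One classic way to find such higher-level components is chart parsing. The sketch below is a CKY recognizer over a tiny hypothetical grammar in Chomsky Normal Form: it fills a chart with the categories that can span each substring and then lists the multi-word constituents found. It illustrates the general idea only; it is not the transformation-based parser proposed in the projects above.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy grammar in Chomsky Normal Form.
BINARY = {
    ("NP", "VP"): {"S"},
    ("DT", "NN"): {"NP"},
    ("VBD", "NP"): {"VP"},
}
# Proper names are mapped straight to NP to keep the grammar in CNF.
LEXICAL = {
    "the": {"DT"},
    "board": {"NN"},
    "Vinken": {"NP"},
    "joined": {"VBD"},
}

def cky(words):
    """Return a chart mapping (i, j) to the categories that span words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):
        chart[i, i + 1] |= LEXICAL.get(w, set())
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # every split point
                for left, right in product(chart[i, k], chart[k, j]):
                    chart[i, j] |= BINARY.get((left, right), set())
    return chart

def constituents(words):
    """List (label, text) for every multi-word constituent in the chart."""
    chart = cky(words)
    return sorted(
        (label, " ".join(words[i:j]))
        for (i, j), labels in chart.items()
        for label in labels
        if j - i > 1
    )

if __name__ == "__main__":
    words = ["Vinken", "joined", "the", "board"]
    print("sentence recognized:", "S" in cky(words)[0, len(words)])
    print(constituents(words))
```

For "Vinken joined the board" the chart contains an NP ("the board"), a VP ("joined the board") and an S spanning the whole sentence.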
Prepositional Phrase Attachment: (abbr. PP attachment) a common cause of structural ambiguity in natural language. For example, take the following sentence:
"Pierre Vinken joined the board as a nonexecutive director".
The PP 'as a nonexecutive director' can either attach to the NP (Noun Phrase) 'the board' or to the VP (Verb Phrase) 'joined', giving two alternative structures. (In this case the VP attachment is correct):
NP-attach: (joined ((the board) (as a nonexecutive director)))
VP-attach: ((joined (the board)) (as a nonexecutive director))
The attempt to resolve these ambiguities is called 'PP attachment disambiguation'.
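The two analyses can be generated mechanically, and a very simple disambiguator can compare how often each attachment was observed in annotated data. The sketch below assumes a hypothetical toy training set of (verb, noun, preposition, attachment) quadruples; the counting scheme, with a back-off to the preposition alone and ties resolved as noun attachment, is a simplified illustration in the spirit of corpus-based approaches, not a method prescribed by the projects above.

```python
def bracketings(verb, np, pp):
    """Return (NP-attach, VP-attach) structures in the notation used above."""
    np_attach = f"({verb} (({np}) ({pp})))"
    vp_attach = f"(({verb} ({np})) ({pp}))"
    return np_attach, vp_attach

# Hypothetical training quadruples (verb, noun, preposition, attachment):
# "V" = the PP attaches to the verb, "N" = it attaches to the noun.
TRAINING = [
    ("joined", "board", "as", "V"),
    ("named", "chairman", "of", "N"),
    ("joined", "company", "as", "V"),
    ("bought", "shares", "of", "N"),
    ("ate", "pizza", "with", "V"),
]

def attach(verb, noun, prep, data=TRAINING):
    """Guess "V" or "N" from co-occurrence counts in the training data."""
    v_score = sum(1 for v, _, p, a in data if (v, p, a) == (verb, prep, "V"))
    n_score = sum(1 for _, n, p, a in data if (n, p, a) == (noun, prep, "N"))
    if v_score == n_score:
        # Back off to counts for the preposition alone.
        v_score = sum(1 for _, _, p, a in data if (p, a) == (prep, "V"))
        n_score = sum(1 for _, _, p, a in data if (p, a) == (prep, "N"))
    return "V" if v_score > n_score else "N"  # ties default to noun attachment

if __name__ == "__main__":
    for structure in bracketings("joined", "the board", "as a nonexecutive director"):
        print(structure)
    print(attach("joined", "board", "as"))
```

With this toy data the example sentence is resolved as VP attachment, matching the correct analysis given above; a real system would be trained on a large annotated corpus.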

