Title: The Structure of Real User-System Dialogues in Interactive Question Answering
When real users engage in written conversations with an Interactive Question Answering (IQA) system, they typically do so in a sort of dialogue rather than by asking single-shot questions. The context of a Follow-Up Question (FU Q), i.e., its preceding dialogue, can play an important role in helping the system determine the question's meaning, and thus find a correct answer. Exactly how much context the system requires, and specifically which information should be extracted from the context, are open research questions in IQA. This thesis addresses both questions by studying the structure of real IQA dialogues, using logistic regression as a statistical modeling framework. Unlike much of the related work on FU Qs in IQA, this work is based on real user-system dialogues, collected via a chatbot-inspired help-desk IQA system that answers questions from the university library domain. We are releasing the collected dialogue log files of this system to the research community as the BoB (Bolzano library Bot) dialogue corpus.
The central part of this thesis is our empirical study of the structure of real IQA dialogues, based on our collected log data. The logistic regression modeling framework we adopt handily addresses both a theoretical and a practical motivation underlying our work, setting it apart from many other statistical modeling and machine learning frameworks:
Firstly, our theoretical motivation is that we want to describe properties of the structure of real IQA dialogues for which there is empirical evidence. We define structure in terms of the attributes and relations pertaining to the utterances (i.e., the questions and answers) inside a dialogue snippet. Dialogue snippets serve as a convenient abstraction over four consecutive utterances in an IQA dialogue: the FU Q, its corresponding system answer, and the immediately preceding user question and system answer pair from the dialogue context. In regression modeling, the attributes and relations we observe in a dialogue snippet become features that describe this snippet as a data instance for learning. Once the model is estimated on the training instances, we can simply “read off” from it which of the features play a significant role in describing empirically evident properties of dialogue structure.
Secondly, the practical motivation behind using regression modeling is that it is an instance of a supervised machine learning method. In our case, the learning task consists in choosing the answer candidate that most likely represents a correct answer to the FU Q, considering any information from the dialogue snippet that is deemed useful. We use the same estimated regression models as before, but now apply them in prediction mode on unseen dialogue snippets where the answer to the FU Q is yet to be determined. The practical goal is now to maximize the average answer correctness of a system that has to pick one answer from a fixed set of candidate answers. This makes our approach a viable candidate for implementation as an answer selection module in an actual IQA system.
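The two uses of the regression model described above (inspecting the estimated weights, then picking one answer from a fixed candidate set) can be illustrated with a minimal sketch. This is not the implementation used in the thesis: the two-feature vectors, the toy training data, the plain stochastic gradient descent fit, and the function names `fit_logistic` and `select_answer` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights: Sequence[float], bias: float,
                  features: Sequence[float]) -> float:
    """Estimated probability that a candidate answer is correct."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 learning_rate: float = 0.5,
                 epochs: int = 200) -> Tuple[List[float], float]:
    """Estimate weights and bias by stochastic gradient descent on log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def select_answer(weights: Sequence[float], bias: float,
                  candidates: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate with the highest estimated probability."""
    scores = [predict_proba(weights, bias, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data, invented for illustration only. Each row describes one
# (dialogue snippet, candidate answer) pair with two hypothetical features,
# e.g. similarity of the candidate to the FU Q and to the previous question.
TRAIN_X = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.2], [0.2, 0.1]]
TRAIN_Y = [1, 1, 0, 0]  # 1 = candidate was a correct answer, 0 = it was not

weights, bias = fit_logistic(TRAIN_X, TRAIN_Y)

# Prediction mode: feature vectors for the candidates of one unseen snippet.
candidates = [[0.1, 0.2], [0.85, 0.2], [0.3, 0.1]]
best = select_answer(weights, bias, candidates)
print("estimated weights:", weights)
print("selected candidate index:", best)
```

Printing `weights` corresponds, very loosely, to reading the relevant features off the estimated model; a real analysis would also look at significance tests for each coefficient, which this sketch omits.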
The statistical modeling experiments integrate a wide array of linguistic measures, each of which describes dialogue structure using a specific attribute of a user question or system answer, or a specific relation holding between the two. We employ different versions of string similarity, as used for shallow question-answer matching in QA research, and features based on different linguistic theories of dialogue and discourse coherence. These theories speak of coherence between two utterances when there is a continuity of the things—objects or actions—that are in the focus of the particular pair of dialogue utterances. All our features have a common aim of describing inter-utterance patterns that are associated with coherent dialogue.
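As a rough illustration of the shallow, string-similarity side of these measures, the sketch below represents a dialogue snippet as four utterances and computes a few lexical overlap features between them. The specific measures (token-level Jaccard, character-trigram Dice), the feature names, and the example utterances are assumptions chosen for illustration; they are not the exact feature set of the thesis.

```python
import re
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class DialogueSnippet:
    """Four consecutive utterances: the previous Q/A pair, the FU Q, and a candidate answer."""
    prev_question: str
    prev_answer: str
    fu_question: str
    fu_answer: str


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens of an utterance."""
    return set(re.findall(r"\w+", text.lower()))


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams over the whitespace-normalized, lower-cased text."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |A & B| / |A | B|; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|); 0.0 if both sets are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def similarity_features(snippet: DialogueSnippet) -> Dict[str, float]:
    """Hypothetical shallow features relating the FU Q to the other utterances."""
    fu_q = snippet.fu_question
    return {
        "fuq_answer_token_jaccard": jaccard(tokens(fu_q), tokens(snippet.fu_answer)),
        "fuq_prevq_token_jaccard": jaccard(tokens(fu_q), tokens(snippet.prev_question)),
        "fuq_preva_trigram_dice": dice(char_ngrams(fu_q), char_ngrams(snippet.prev_answer)),
    }


# Invented example utterances (not taken from the BoB corpus).
example = DialogueSnippet(
    prev_question="When does the library open on Saturdays?",
    prev_answer="On Saturdays the library is open in the morning.",
    fu_question="And on Sundays?",
    fu_answer="The library is closed on Sundays.",
)
features = similarity_features(example)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

Feature dictionaries like this one could then be turned into the numeric vectors that a regression model takes as input.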
The first contribution of our work is the new empirical framework we propose to study the structure of real IQA dialogues: the creation and use of a corpus of real IQA dialogue data, and the rich set of utterance-related measures we encode as features and use as predictors in our statistical models. Our second contribution is the set of experimental results, which demonstrate which of the proposed measures hold up against empirical evidence from real IQA dialogue data. Briefly, the best statistical models in this thesis combine information from certain shallow (lexical similarity) and deep (dialogue and discourse theories) measures. Also, the best models distinguish different FU Q types. This distinction has typically been based on a manually devised typology of FU Qs. Instead, this thesis explores a new, empirically motivated FU Q typology, and compares it to FU Q types proposed in previous literature. This data-driven typology not only provides new theoretical insights about IQA dialogue structure, but also improves a system’s performance in pinpointing the correct answer to a FU Q.
The implications of this work are two-fold. For the dialogue and discourse research community, concerned with theories of text coherence, it provides clues as to which automatically implementable theories of inter-utterance coherence hold up empirically in real IQA dialogues. On the other hand, the IQA research community could benefit from the results for learning how to automatically distinguish different types of FU Qs, and how to formulate answer pinpointing strategies for the particular FU Q types. More specifically, this work is a practical study of how a real IQA system can tackle the problem of context fusion, and as a result, improve the accuracy of finding the correct answer to FU Qs using dialogue context.
My thesis is available for download.
Please pay a visit to the BoB dialogue corpus web-site, providing more information on the dialogue data that was used as a basis for this research.
Manuel Kirschner and Raffaella Bernardi. Towards an Empirically Motivated Typology of Follow-Up Questions: The Role of Dialogue Context. In Proc. of SIGdial’10, Tokyo, Japan, 2010.
Manuel Kirschner. The BoB Dialogue Corpus. Technical Report KRDB10-4, KRDB Research Centre, Free University of Bozen-Bolzano, Bolzano, Italy, 2010.
Raffaella Bernardi, Manuel Kirschner and Zorana Ratkovic. Context Fusion: The Role of Discourse Structure and Centering Theory. In Proc. of the Seventh conference on International Language Resources and Evaluation (LREC’10), Valletta, Malta. 2010.
Raffaella Bernardi and Manuel Kirschner. From artificial questions to real user interaction logs: Real challenges for Interactive Question Answering systems. In Proc. of Workshop on Web Logs and Question Answering (WLQA’10), Valletta, Malta. 2010.
Manuel Kirschner, Raffaella Bernardi, Marco Baroni and Le Thanh Dinh. Analyzing Interactive QA Dialogues using Logistic Regression Models. In Proc. of XIth International Conference of the Italian Association for Artificial Intelligence (AI*IA’09), Reggio Emilia, Italy. 2009.
Manuel Kirschner and Raffaella Bernardi. Exploring Topic Continuation Follow-up Questions using Machine Learning. In Proc. of NAACL HLT 2009: Student Research Workshop, Boulder, CO. 2009.
Raffaella Bernardi and Manuel Kirschner. Context Modeling for IQA: The Role of Tasks and Entities. In Proc. of Workshop for Knowledge and Reasoning for Answering Questions (KRAQ’08), Manchester, UK. 2008.
Manuel Kirschner and Raffaella Bernardi. Context Modeling for IQA: The Role of Tasks and Entities. Poster presented at LCT student session in Bolzano, 15. May 2008 (unpublished).
Manuel Kirschner and Raffaella Bernardi. An Empirical View on IQA Follow-up Questions. In Proc. of the 8th SIGdial Workshop on Discourse and Dialogue, Antwerp, Belgium. 2007.
Manuel Kirschner. Applying a Focus Tree Model of Dialogue Context to Interactive Question Answering. In Proc. of the ESSLLI’07 Student Session, Dublin, Ireland. 2007.
Manuel Kirschner. The BoB IQA system: A Domain Expert’s Perspective. In Proc. of the 11th Workshop on the Semantics and Pragmatics of Dialogue (SemDial’07), Rovereto, Italy. 2007.
Manuel Kirschner. Building a Multi-lingual Interactive Question-Answering System for the Library Domain. In Proc. of the 10th Workshop on the Semantics and Pragmatics of Dialogue (Brandial’06), Potsdam, Germany. 2006.
Harald Hüning, Manuel Kirschner, Fritz Class, André Berton, and Udo Haiber. Embedding Grammars into Statistical Language Models. In Proc. of Interspeech’05, pages 1313–1316, Lisbon, Portugal. 2005.
Additional conferences, talks, summer schools
- 29th Student Conference of Linguistics (StuTS), 2001 (Saarbrücken)
- 11th Student Conference of Computational Linguistics (TaCoS), 2002 (Potsdam) (member of local organizing committee)
- EACL 2006 (Trento)
- ESSLLI 2006 (Málaga)
- ESSLLI 2007 (Dublin)
- Potsdam Fall School in Computational Linguistics 2007 (organized by the “Deutsche Gesellschaft für Sprachwissenschaft”)
- Invited talk at CIMeC (Rovereto): CLIC Research Seminar, 28. Feb. 2008. Title: “A task/entity-based context model for answering Follow-Up Questions”
- ESSLLI 2008 (Hamburg): Co-chaired the ESSLLI Student Session (StuS): chair for “Language & Computation” area
The BoB dialogue corpus web-site describes the dialogue data that my PhD thesis and many papers are based on
chatterbot-bob source code: our open-source implementation of the BoB chatbot used to collect the dialogues
Internal WiKi (sorry, you’ll need a login for FUB computer science)