Book, English, 330 pages, paperback, format (W × H): 210 mm × 297 mm, weight: 492 g
Series: Berichte aus der Informatik
ISBN: 978-3-8440-2559-0
Publisher: Shaker
This book is about a very active field of research in search applications: automatic question answering (QA). A QA system tries to answer a user’s question with a concise natural language expression based on a given document collection. This work presents a deep approach, i.e. one that builds on semantic representations of questions and documents. These semantic representations are derived by a syntactico-semantic parser. Large rule bases and fact bases are then applied to these formal representations during the search. Such a QA system must find suitable trade-offs between processing depth and efficiency. The effects of many components from natural language processing and QA are investigated in an extensive ablation study.
The research project in this book aims at a better search engine. It tries to fuse the state of the art from three areas: natural language understanding (NLU), question answering, and information retrieval. It should provide more precise results for complex information needs (expressed as natural language questions) than state-of-the-art information retrieval systems and web search engines.
The presented QA approach has the following goals:
1. scale a deep QA approach to large text corpora with millions of articles or documents.
2. develop an efficient matching approach for semantic search allowing inferences with large sets of rules and facts.
3. design a constructive (in the sense of generative) solution, as opposed to an extractive one, that tries to construct answers from several source documents or source collections using techniques like question decomposition, inferential components, deixis resolution, and coreference resolution.
In detail, the approach has the following strong points compared to other state-of-the-art QA systems; most of them lie in the depth of natural language processing (NLP) and its large-scale, robust application. This is not to say that each of the following items is absent from all other QA systems. But the combination of all items is a unique characteristic of the presented approach and its implementation in the InSicht system.
1. exploiting paraphrases on many levels like syntax, lexical semantics, sentence semantics, and even text semantics
2. applying inferences in order to find not only explicit, but also implicit information from the corpus
3. working on the word sense level (semantic level or concept level) and not on the word level or even the word form level, i.e. word sense disambiguation (WSD) makes it possible to concentrate on the correct reading of ambiguous words
4. decomposing questions (into subquestions and revised questions), which allows answering complex questions based on several methods of question decomposition
5. applying a general coreference resolver to all documents
6. applying a coreference resolver to follow-up questions whose nominal or pronominal anaphors refer to earlier questions or answers
7. resolving temporal deixis in documents
The deep semantic approach in this book and its implementation InSicht try to find a useful compromise between (1) depth of analysis and (2) coverage and efficiency of processing real-world document collections.
The book is intended for people interested in deep approaches to QA. Hence it assumes some basic knowledge of natural language processing, semantics, and semantic network formalisms. But it also includes short introductions and pointers to the literature for these areas.