Kruschwitz Intelligent Document Retrieval

Exploiting Markup Structure
1st edition 2006
ISBN: 978-1-4020-3768-9
Publisher: Springer Netherland
Format: PDF
Copy protection: PDF watermark

E-book, English, Volume 17, 205 pages

Series: The Information Retrieval Series

Intelligent Document Retrieval
2005 edition, ISBN 978-1-4020-3767-2, print book

€96.29 (incl. VAT)

Free shipping
Available immediately

Collections of digital documents can nowadays be found everywhere in institutions, universities and companies; Web sites and intranets are typical examples. Searching them for information, however, can still be painful: searches often return either large numbers of matches or no suitable matches at all. Such document collections vary greatly in size and in how much structure they carry. What they have in common is that they typically do have some structure and that they cover a limited range of topics. The second point in particular sets them apart from the Web in general. The type of search system proposed in this book can suggest ways of refining or relaxing a query to assist the user in the search process. To suggest sensible query modifications, the system needs to know what the documents are about, that is, it needs explicit knowledge about the document collection encoded in some electronic form. Such knowledge is typically not available, so we construct it automatically.

Target audience

Professional/practitioner

Authors/Editors

Kruschwitz, Udo

Further information & material

Table of contents

Related Work.- Data Analysis and Domain Model Construction.- Incorporating Additional Knowledge.- A Dialogue System for Partially Structured Data.- UKSearch - Intelligent Web Search.- UKSearch - Evaluation and Discussion.- YPA - Searching Classified Directories.- Future Directions and Conclusions.

Reading sample

6 UKSearch - Intelligent Web Search (pp. 93-94)

Finding information on the Web is normally a straightforward task. For most user requests, the information can be located with a standard search engine that applies simple pattern matching techniques. However, once the search is restricted to a smaller document collection (one that is still too large to be searched without appropriate tools), it can become tedious. Examples of such collections are corporate intranets and university Web sites. Even in these smaller collections, a search will typically return large numbers of matching documents. If no document matches the query, the user is usually left either with a great number of partially matching documents or with no results at all.

These problems are well known, and more sophisticated search systems have been proposed to overcome them (see Chap. 2). However, those approaches tend to rely heavily on a given document structure or on expensively created concept hierarchies. While this is appropriate for fairly well structured domains such as product catalogues and other applications where the information is stored in database formats, it is of little help if the document collection is heterogeneous.

Perhaps surprisingly, failing to find any document in the collection for a user query (a form of "data sparsity") is not necessarily a major problem in small domains. The log files of the search engine installed on the University of Essex Web site show that the majority of queries submitted by users return a large number of matching documents, despite the fairly small size of the collection. But unlike in general Web search, where scalability issues prevent more sophisticated indexing steps, in such well-defined document collections we can build domain-specific concept hierarchies easily and rapidly using the techniques introduced in the earlier chapters. These automatically created knowledge sources reflect the relations between documents, or between terms within those documents, derived purely from the available data.
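To make the idea of term relations derived from the data alone more concrete, here is a minimal sketch in Python. It assumes, for illustration only, that each document has already been reduced to a set of index terms and that two terms count as related when they co-occur in at least `min_docs` documents. The function `related_terms` and its threshold are hypothetical; this is plain document-level co-occurrence counting, not the concept-extraction procedure described in the book.

```python
from collections import Counter
from itertools import combinations


def related_terms(docs: list[set[str]], min_docs: int = 2) -> dict[str, set[str]]:
    """Link two terms if they co-occur in at least `min_docs` documents.

    Hypothetical illustration based on document-level co-occurrence counts;
    not the book's concept-extraction procedure.
    """
    pair_counts: Counter = Counter()
    for terms in docs:
        # sorted() gives every unordered pair one canonical (a, b) order
        for a, b in combinations(sorted(terms), 2):
            pair_counts[(a, b)] += 1

    relations: dict[str, set[str]] = {}
    for (a, b), count in pair_counts.items():
        if count >= min_docs:
            # Relations are symmetric, so record both directions
            relations.setdefault(a, set()).add(b)
            relations.setdefault(b, set()).add(a)
    return relations


docs = [
    {"admissions", "undergraduate", "fees"},
    {"admissions", "postgraduate", "fees"},
    {"library", "opening hours"},
]
print(related_terms(docs))  # → {'admissions': {'fees'}, 'fees': {'admissions'}}
```

Raising `min_docs` yields fewer but more reliable relations; on a real collection the threshold would have to be tuned to its size.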

Apart from that, collections of Web pages are well suited to testing the techniques introduced in this book, as these documents are typically marked up with HTML tags. This markup mixes visual formatting with semantic representation (as found, for example, in meta tags). We turn this implicit knowledge into explicit relations.

The earlier chapters presented the conceptual framework. Here we discuss the practical steps that lead to an explicitly structured representation of a Web document collection. Frequently used HTML tags define markup contexts, the fundamental units from which concepts are extracted; these concepts are then arranged in a domain model. The structure imposed on the data collection is employed in a dialogue system that assists the user with queries that either retrieve no documents or return large numbers of matches.
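The notion of a markup context can be illustrated with Python's standard `html.parser` module. The sketch below collects the text found inside a fixed set of tags, plus the keywords of a `<meta name="keywords">` element as an example of semantic markup. The tag set `CONTEXT_TAGS` and the class `MarkupContextExtractor` are hypothetical: the text says markup contexts are defined by frequently used HTML tags, so a real system would choose them by analysing the collection rather than hard-coding them.

```python
from html.parser import HTMLParser

# Hypothetical tag set; the book derives markup contexts from frequently
# used tags in the collection instead of fixing them in advance.
CONTEXT_TAGS = {"title", "h1", "h2", "h3", "a", "b"}


class MarkupContextExtractor(HTMLParser):
    """Collect (tag, text) pairs for text enclosed by CONTEXT_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.contexts: list[tuple[str, str]] = []
        # Stack of open context tags, each with the text fragments seen so far
        self._stack: list[tuple[str, list[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            # Semantic markup: <meta name="keywords" content="a, b, c">
            if (attr.get("name") or "").lower() == "keywords":
                for keyword in (attr.get("content") or "").split(","):
                    if keyword.strip():
                        self.contexts.append(("meta", keyword.strip()))
        elif tag in CONTEXT_TAGS:
            self._stack.append((tag, []))

    def handle_endtag(self, tag):
        # Only close the innermost context; mismatched end tags are ignored
        if self._stack and self._stack[-1][0] == tag:
            name, parts = self._stack.pop()
            text = " ".join("".join(parts).split())  # normalise whitespace
            if text:
                self.contexts.append((name, text))

    def handle_data(self, data):
        # Text belongs to every enclosing context (e.g. a link inside a heading)
        for _, parts in self._stack:
            parts.append(data)


page = """<html><head><title>Department of Computer Science</title>
<meta name="keywords" content="computing, research, admissions"></head>
<body><h1>Research Groups</h1>
<p>See <a href="/nlp">Natural Language Processing</a>.</p></body></html>"""

parser = MarkupContextExtractor()
parser.feed(page)
parser.close()
print(parser.contexts)
```

For the sample page this yields the title, the three meta keywords, the heading and the link text, while the plain paragraph text outside any context tag is skipped.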

We will see how the general dialogue manager introduced earlier is set up to work with the data collections discussed in this chapter. However, we will not focus on the links between concepts and individual documents or directories. The more interesting aspect is the construction of domain models that are not closely tied to individual documents, mainly because a separable domain model is more flexible: although a collection of Web documents changes constantly, the model does not need to be updated every time. A domain model that is not linked to individual documents remains usable after the document collection has been updated and can simply be plugged into a search system.
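The following is a minimal sketch of how a separable domain model could be plugged into a search system. It assumes a toy inverted index (term to document IDs) and a domain model stored as a separate mapping from terms to related terms. The functions `search` and `suggest` and the threshold `max_hits` are hypothetical and much simpler than the dialogue manager described in the book: when a query matches nothing, the sketch offers relaxed queries with one term dropped; when it matches too many documents, it offers refinements that add a related term from the model.

```python
# Hypothetical types: an inverted index (term -> document IDs) and a
# domain model (term -> related terms) kept as two separate structures.
Index = dict[str, set[str]]
DomainModel = dict[str, set[str]]


def search(index: Index, query: list[str]) -> set[str]:
    """Boolean AND retrieval: IDs of documents containing every query term."""
    postings = [index.get(term, set()) for term in query]
    return set.intersection(*postings) if postings else set()


def suggest(index: Index, model: DomainModel, query: list[str],
            max_hits: int = 2) -> tuple[list[str], list[list[str]]]:
    """Return sorted hits plus alternative queries to offer the user."""
    hits = search(index, query)
    if not hits:
        # Relaxation: drop one term at a time, keep variants that match something
        relaxed = ([t for t in query if t != term] for term in query)
        return [], [q for q in relaxed if q and search(index, q)]
    if len(hits) > max_hits:
        # Refinement: add a related term from the domain model, keeping only
        # extensions that still retrieve at least one document
        related = set().union(*(model.get(t, set()) for t in query)) - set(query)
        refined = [query + [t] for t in sorted(related)]
        return sorted(hits), [q for q in refined if search(index, q)]
    return sorted(hits), []


index = {
    "admissions": {"d1", "d2", "d3"},
    "fees": {"d1", "d2"},
    "undergraduate": {"d1"},
    "library": {"d4"},
}
model = {"admissions": {"fees", "undergraduate"}}

print(suggest(index, model, ["admissions"]))
# → (['d1', 'd2', 'd3'], [['admissions', 'fees'], ['admissions', 'undergraduate']])
print(suggest(index, model, ["library", "fees"]))
# → ([], [['fees'], ['library']])
```

Because the model is only consulted when generating suggestions, the index can be rebuilt after the collection changes while the same model is passed in unchanged.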
