Piet de Keyser: Indexing – From Thesauri to the Semantic Web
E-book, English, 272 pages
1st edition 2012
ISBN: 978-1-78063-341-1
Publisher: Elsevier Science & Techn.
Format: EPUB
Copy protection: Adobe DRM (see system requirements)
Series: Chandos Information Professional Series
Indexing consists of both novel and more traditional techniques. Cutting-edge indexing techniques, such as automatic indexing, ontologies, and topic maps, were developed independently of older techniques such as thesauri, but it is now recognized that these older methods also hold valuable expertise. Indexing describes various traditional and novel indexing techniques, giving information professionals and students of library and information sciences a broad and comprehensible introduction to indexing. The title consists of twelve chapters: An introduction to subject headings and thesauri; Automatic indexing versus manual indexing; Techniques applied in automatic indexing of text material; Automatic indexing of images; The black art of indexing moving images; Automatic indexing of music; Taxonomies and ontologies; Metadata formats and indexing; Tagging; Topic maps; Indexing the web; and The Semantic Web.
- Makes difficult and complex techniques understandable
- Contains many links to and illustrations from websites where new indexing techniques can be experienced
- Provides references for further reading
Piet de Keyser is head librarian of the Katholieke Hogeschool Leuven, an institute for higher education in Louvain, Belgium. He has published many articles on literary history, philosophy and library sciences. He teaches indexing at a Belgian Library and Information Sciences school.
Further information & material
2 Automatic indexing versus manual indexing
Abstract:
This chapter gives an overview of the arguments used in the discussion between the supporters of manual indexing and those of automatic indexing. The arguments against manual indexing are that it is slow, expensive and not detailed enough, that it does not lead to better retrieval, that it is outdated and document-centred, and that there is no consistency between indexers. The arguments against automatic indexing are that it does not provide an overview of the index terms, that it does not solve the problem of synonyms and variants, that it does not take the context into account, that it does not allow browsing related terms, that orthography may be an impediment and, finally, that it is too complex for computers. The end of the chapter gives an overview of the six most popular misconceptions about automatic indexing.

Key words: manual indexing; automatic indexing

Machine indexing is rotten, human indexing is capricious. (Masse Bloomfield [1])

Introduction
Librarians still consider it part of their core business to maintain and apply classifications, subject headings and thesauri. They are trained in library schools to use them, they write about them in journals and discuss them at conferences. In their opinion, books or articles simply cannot be found in a library without their skilled indexing work. Increasingly, however, this is no longer self-evident. Even professionals ask whether we can afford manual indexing and whether we should not devote our time and money to other activities, although their doubts may provoke fierce reactions from colleagues. The discussion about manual indexing is also one about controlled vocabularies, because manual indexing is normally done by means of thesauri or subject headings. This chapter will give the arguments of defenders and opponents of manual indexing by means of a controlled vocabulary – and, as a consequence, those of the defenders and opponents of automatic indexing, which up to now has been the main alternative. Manual indexing by non-professionals, authors or readers, can also be a competitor to professional indexing. Some aspects of this will be discussed in this chapter, although Chapter 9, which deals with tagging, will go into more detail.

Arguments against manual indexing
Manual indexing is slow
In his classic book Everything is Miscellaneous, David Weinberger describes the challenge the cataloguing section of the Library of Congress faces every single day:

Every day, more books come into the library than the 6,487 volumes Thomas Jefferson donated in 1815 to kick-start the collection after the British burned the place down. The incoming books are quickly sorted into cardboard boxes by topic. The boxes are delivered to three to four hundred catalogers, who between them represent eighty different subject specializations. They examine each book to see which of the library’s 285,000 subject headings is most appropriate. Books can be assigned up to ten different subject headings. Keeping America’s books non-miscellaneous is a big job [2].

Backlogs in libraries are not always the result of an overwhelming amount of new arrivals in the cataloguing section; they may be due to many other factors: limited budgets, which can cause understaffing, time-consuming cataloguing practices, etc. Indexing can indeed play a role in this too. It takes at least a few minutes for a cataloguer to find out what the exact subject of a book is and which thesaurus terms are the best translation of that subject. If a cataloguer needs five minutes to index a publication and another fifteen to create a new catalogue record, he could save a quarter of his time if indexing terms were added automatically. These kinds of calculations may be appealing, especially to managers, when budgets are cut.
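The saving claimed here is simple proportional arithmetic: indexing accounts for five of the twenty minutes spent on a record. A minimal sketch in Python, using the purely illustrative five- and fifteen-minute figures from the text rather than measured values:

indexing_minutes = 5      # time spent choosing thesaurus terms for one publication
cataloguing_minutes = 15  # time spent creating the rest of the catalogue record

total_minutes = indexing_minutes + cataloguing_minutes
saving = indexing_minutes / total_minutes   # 5 / 20 = 0.25

print(f"Time saved per record if indexing were automated: {saving:.0%}")  # -> 25%, a quarter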
Manual indexing is expensive

Because of the amount of time it takes to find suitable indexing terms, indexing costs a lot of money. The only way to make it cheaper – or even affordable in some cases – is to outsource it to countries where highly trained professionals do the work for substantially lower wages. A simple search for ‘outsourcing indexing India’ in Google will reveal all kinds of firms that offer these (and other) services. Of course, this will only be an alternative in English-speaking countries. But it is also expensive to create, update and learn a controlled vocabulary. It can take years for a team of librarians and specialists to create a good thesaurus. At least one colleague must devote part of his or her time to updating the thesaurus, and he or she will have to consult other colleagues or specialists from time to time to ask for their advice. New colleagues will need up to six months to become fully acquainted with the controlled vocabulary. In a report commissioned in 2006 by the Library of Congress, Karen Calhoun made many suggestions to reduce costs. One of them was: ‘Abandon the attempt to do comprehensive subject analysis manually with LCSH in favour of subject keywords; urge LC to dismantle LCSH’ [3]. This suggestion raised many protests from the library world. Abandoning LCSH would lead to ‘scholarly catastrophe’ and ‘bibliobarbarism’ [4]. Of course, Thomas Mann (not the German writer, but the researcher at the Library of Congress), who is a defender of traditional indexing systems like the Library of Congress Subject Headings, does not agree with Calhoun’s views. In a critical review of her report [5] he argues against replacing controlled vocabulary with keywords. His arguments are that a controlled vocabulary leads to a whole variety of titles that could not be found using only keywords, and that browsing through the LCSH terms ‘brings aspects of the subject to the attention of researchers that they would not think, beforehand, even to exist’. In the end, Calhoun’s argument is that we can do with less, yet still with enough quality, if we want to reduce the costs; Mann’s point is that quality is paramount, whatever the cost may be.

Manual indexing is not detailed enough
Because indexing is expensive and slow, libraries mostly have a policy of global indexing. This means that a book containing essays on ten twentieth-century writers will only get one indexing term; it will be treated as a book on twentieth-century literature. A reader interested in one of the ten novelists must be clever enough to also go through the books indexed as ‘English literature–Twentieth century’. The Library of Congress has several rules that apply to cases where more than one subject is treated in a book. They are called the ‘rule of three’ and the ‘rule of four’. The ‘rule of three’ stipulates the following: ‘For a work covering two or three topics treated separately, a heading representing precisely each of the topics is assigned. The two or three specific headings are assigned in favor of a general heading if the latter includes in its scope more than three subtopics’ [6]. The ‘rule of four’ is applied in cases ‘when the work being cataloged deals with four topics, each of which forms only a small portion of a general topic’ [7]. For such a work four headings may be assigned, but in other cases a work with four topics may get a more general heading; five is always one too many (a schematic sketch of this decision logic follows below). If you are a secondary school pupil who has to write an essay on an author, you should not only know that you must look under a more general subject too, but also that you must consult a specialized database in order to find all (chapters of) books in the library dealing with your author. This may be asking too much of someone who is not an experienced researcher, and as a consequence the library probably does not fulfil the needs of the most interested public for this kind of literature.
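Taken together, the ‘rule of three’ and the ‘rule of four’ amount to a small decision procedure. The Python sketch below only illustrates the rules as paraphrased above; the names (assign_headings, general_heading, each_small_part_of_general) are invented for the example, and real cataloguing involves far more judgement than this.

def assign_headings(topics, general_heading, each_small_part_of_general=False):
    # topics: specific topics treated separately in the work
    # general_heading: a broader heading covering all of them
    # each_small_part_of_general: True when every topic forms only a small
    #     portion of the general topic (the condition for the 'rule of four')
    n = len(topics)
    if n <= 3:
        # Rule of three: assign one precise heading per topic
        return list(topics)
    if n == 4 and each_small_part_of_general:
        # Rule of four: four headings may still be assigned
        return list(topics)
    # Otherwise (including five or more topics) fall back to the general heading
    return [general_heading]

# The book of essays on ten twentieth-century writers from the example above:
print(assign_headings(["essay on writer %d" % i for i in range(1, 11)],
                      "English literature–Twentieth century"))
# -> ['English literature–Twentieth century']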
Manual indexing does not necessarily lead to better retrieval

Some research has been conducted in order to find out whether manual indexing improves search results. At least in library catalogues, indexing leads to better retrieval: more than 30 per cent of the records cannot be found if only keywords and no descriptors are used in the search [8]. The authors of these studies recognize that adding tables of contents to the catalogue records creates a new situation, for which their conclusions must be re-examined, but they do not say anything about the effects on retrieval when controlled vocabulary terms are added to full-text documents, where every word can be used as a search term. Other research pointed out that adding subject headings to fiction does not increase circulation, at least not in academic libraries [9]. Comparing searches based on automatic text-word indexing with searches based on manually assigned controlled descriptors, J. Savoy found that the searches in an abstract database using controlled descriptors were slightly better, but the differences were statistically insignificant. The best results were obtained using a combination of keywords and controlled...