AN ASSOCIATION-BASED METHOD FOR AUTOMATIC-INDEXING WITH A CONTROLLED VOCABULARY

Citation
C. Plaunt and B.A. Norgard, AN ASSOCIATION-BASED METHOD FOR AUTOMATIC-INDEXING WITH A CONTROLLED VOCABULARY, Journal of the American Society for Information Science, 49(10), 1998, pp. 888-902
Number of citations
25
Subject Categories
Information Science & Library Science; Computer Science, Information Systems
Journal ISSN
0002-8231
Volume
49
Issue
10
Year of publication
1998
Pages
888 - 902
Database
ISI
SICI code
0002-8231(1998)49:10<888:AAMFAW>2.0.ZU;2-5
Abstract
In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a "dictionary" of associations between the lexical items contained in the titles, authors, and abstracts, and controlled vocabulary subject headings assigned to those records by human indexers, using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect, we have cast this as a classic partial match information retrieval problem. We consider the problem to be one of "retrieving" (or assigning) the most probably "relevant" (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document.
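Illustrative sketch
The two stages described in the abstract can be sketched roughly as follows. This is not the authors' implementation. The abstract does not say which likelihood ratio statistic is used or how per-term scores are combined, so the sketch assumes a log-likelihood ratio (G^2) over a 2x2 term/heading contingency table and a simple sum of scores at deployment time. The function names (g2, build_dictionary, assign_headings) and the toy records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def g2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table (assumed statistic)."""
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    score = 0.0
    for observed, r, c in cells:
        if observed > 0:  # empty cells contribute nothing to the sum
            expected = rows[r] * cols[c] / total
            score += observed * math.log(observed / expected)
    return 2.0 * score


def build_dictionary(records):
    """Stage 1: map each term to {heading: association score}.

    records is an iterable of (terms, headings) pairs, one per indexed document.
    """
    n = 0
    term_df, heading_df, pair_df = Counter(), Counter(), Counter()
    for terms, headings in records:
        n += 1
        terms, headings = set(terms), set(headings)
        term_df.update(terms)
        heading_df.update(headings)
        pair_df.update((t, h) for t in terms for h in headings)
    dictionary = defaultdict(dict)
    for (t, h), k11 in pair_df.items():
        k12 = term_df[t] - k11       # documents with the term but not the heading
        k21 = heading_df[h] - k11    # documents with the heading but not the term
        k22 = n - k11 - k12 - k21    # documents with neither
        # Assumption: keep only pairs that co-occur more often than chance.
        if k11 * n > term_df[t] * heading_df[h]:
            dictionary[t][h] = g2(k11, k12, k21, k22)
    return dictionary


def assign_headings(dictionary, terms, top_k=3):
    """Stage 2: rank headings for a new document by summed association scores."""
    scores = Counter()
    for t in set(terms):
        for h, s in dictionary.get(t, {}).items():
            scores[h] += s
    return [h for h, _ in scores.most_common(top_k)]


# Toy training records, invented for illustration only.
records = [
    (["neural", "network", "training"], ["Neural nets"]),
    (["neural", "learning"], ["Neural nets"]),
    (["index", "retrieval", "query"], ["Information retrieval"]),
    (["retrieval", "index", "vocabulary"], ["Information retrieval", "Indexing"]),
]
dictionary = build_dictionary(records)
suggested = assign_headings(dictionary, ["index", "retrieval"], top_k=2)
```

Summing scores treats the new document's terms as a query and the headings as the items being retrieved, which mirrors the partial match framing in the abstract. A real system would also need tokenization, stopword handling, and a cutoff for how many headings to assign, none of which the abstract specifies.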