C. Plaunt and B.A. Norgard, An association-based method for automatic indexing with a controlled vocabulary, Journal of the American Society for Information Science, 49(10), 1998, pp. 888-902
Number of citations
25
Subject categories
Information Science & Library Science; Computer Science Information Systems
In this article, we describe and test a two-stage algorithm based on a
lexical collocation technique which maps from the lexical clues
contained in a document representation into a controlled vocabulary
list of subject headings. Using a collection of 4,626 INSPEC documents,
we create a "dictionary" of associations between the lexical items
contained in the titles, authors, and abstracts, and controlled
vocabulary subject headings assigned to those records by human
indexers, using a likelihood ratio statistic as the measure of
association. In the deployment stage, we use the dictionary to predict
which of the controlled vocabulary subject headings best describe new
documents when they are presented to the system. Our evaluation of this
algorithm, in which we compare the automatically assigned subject
headings to the subject headings assigned to the test documents by
human catalogers, shows that we can obtain results comparable to, and
consistent with, human cataloging. In effect, we have cast this as a
classic partial match information retrieval problem. We consider the
problem to be one of "retrieving" (or assigning) the most probably
"relevant" (or correct) controlled vocabulary subject headings to a
document based on the clues contained in that document.
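
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract names only "a likelihood ratio statistic", so the G^2 form for a 2x2 contingency table used here is an assumption, as are the summed-weight ranking rule, the function names (g2, build_dictionary, assign_headings), the top_n parameter, and the toy records.

```python
import math
from collections import Counter, defaultdict


def g2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 term/heading contingency table.

    Assumed form of the statistic; the abstract does not specify it.
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    score = 0.0
    for observed, r, c in cells:
        if observed > 0:  # empty cells contribute nothing to the sum
            expected = rows[r] * cols[c] / total
            score += observed * math.log(observed / expected)
    return 2.0 * score


def build_dictionary(records):
    """Stage 1: build term -> {heading: weight} from (terms, headings) pairs."""
    n = len(records)
    term_df, heading_df, pair_df = Counter(), Counter(), Counter()
    for terms, headings in records:
        terms, headings = set(terms), set(headings)
        term_df.update(terms)
        heading_df.update(headings)
        pair_df.update((t, h) for t in terms for h in headings)
    assoc = defaultdict(dict)
    for (t, h), k11 in pair_df.items():
        k12 = term_df[t] - k11     # records with the term but not the heading
        k21 = heading_df[h] - k11  # records with the heading but not the term
        k22 = n - k11 - k12 - k21  # records with neither
        # Keep only positive associations (co-occurrence above expectation).
        if k11 * n > term_df[t] * heading_df[h]:
            assoc[t][h] = g2(k11, k12, k21, k22)
    return assoc


def assign_headings(assoc, terms, top_n=3):
    """Stage 2: rank headings by the summed weights of the document's terms."""
    scores = Counter()
    for t in set(terms):
        for h, w in assoc.get(t, {}).items():
            scores[h] += w
    return [h for h, _ in scores.most_common(top_n)]


# Hypothetical toy records: (lexical items, human-assigned headings).
training = [
    (["neural", "network", "learning"], ["Neural nets"]),
    (["neural", "training", "backpropagation"], ["Neural nets"]),
    (["index", "retrieval", "query"], ["Information retrieval"]),
    (["retrieval", "vocabulary", "index"], ["Information retrieval"]),
]
assoc = build_dictionary(training)
print(assign_headings(assoc, ["neural", "learning"], top_n=1))
```

Ranking headings by accumulated evidence from the document's terms mirrors the "partial match" framing in the abstract: a heading can be assigned even when only some of its associated terms appear in the new document.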