Hc. Chen et al., A CONCEPT SPACE APPROACH TO ADDRESSING THE VOCABULARY PROBLEM IN SCIENTIFIC-INFORMATION RETRIEVAL - AN EXPERIMENT ON THE WORM COMMUNITY SYSTEM, Journal of the American Society for Information Science, 48(1), 1997, pp. 17-31
Number of citations
60
Subject Categories
Information Science & Library Science; Computer Science, Information Systems
This research presents an algorithmic approach to addressing the vocabulary problem in scientific information retrieval and information sharing, using the molecular biology domain as an example. We first present a literature review of cognitive studies related to the vocabulary problem and vocabulary-based search aids (thesauri) and then discuss techniques for building robust and domain-specific thesauri to assist in cross-domain scientific information retrieval. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we recently conducted an experiment in the molecular biology domain in which we created a C. elegans worm thesaurus of 7,657 worm-specific terms and a Drosophila fly thesaurus of 15,626 terms. About 30% of these terms overlapped, which created vocabulary paths from one subject domain to the other. Based on a cognitive study of term association involving four biologists, we found that a large percentage (59.6-85.6%) of the terms suggested by the subjects were identified in the conjoined fly-worm thesaurus. However, we found only a small percentage (8.4-18.1%) of the associations suggested by the subjects in the thesaurus. In a follow-up document retrieval study involving eight fly biologists, an actual worm database (Worm Community System), and the conjoined fly-worm thesaurus, subjects were able to find more relevant documents (an increase from about 9 documents to 20) and to improve the document recall level (from 32.41% to 65.28%) when using the thesaurus, although the precision level did not improve significantly. Implications of adopting the concept space approach for addressing the vocabulary problem in Internet and digital library applications are also discussed.
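
The recall and precision levels reported above follow the standard retrieval definitions: recall is the share of all relevant documents that a search returns, and precision is the share of returned documents that are relevant. The sketch below is only an illustration of those definitions, not the study's evaluation code. The function name `precision_recall` and the document IDs are hypothetical, and the toy data is invented to show how recall can rise while precision stays flat.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document IDs returned by the search
    relevant:  document IDs judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data (NOT taken from the study).
relevant = {"d1", "d2", "d3", "d4"}
without_thesaurus = ["d1", "d9"]
with_thesaurus = ["d1", "d2", "d3", "d8", "d9", "d10"]

print(precision_recall(without_thesaurus, relevant))  # (0.5, 0.25)
print(precision_recall(with_thesaurus, relevant))     # (0.5, 0.75)
```

In this toy case the thesaurus-assisted search finds more relevant documents, so recall increases, but it also returns more irrelevant ones, so precision is unchanged.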