A CONCEPT SPACE APPROACH TO ADDRESSING THE VOCABULARY PROBLEM IN SCIENTIFIC-INFORMATION RETRIEVAL - AN EXPERIMENT ON THE WORM COMMUNITY SYSTEM

Citation
Hc. Chen et al., A CONCEPT SPACE APPROACH TO ADDRESSING THE VOCABULARY PROBLEM IN SCIENTIFIC-INFORMATION RETRIEVAL - AN EXPERIMENT ON THE WORM COMMUNITY SYSTEM, Journal of the American Society for Information Science, 48(1), 1997, pp. 17-31
Number of citations
60
Subject categories
Information Science & Library Science; Computer Science, Information Systems
Journal ISSN
00028231
Volume
48
Issue
1
Year of publication
1997
Pages
17 - 31
Database
ISI
SICI code
0002-8231(1997)48:1<17:ACSATA>2.0.ZU;2-D
Abstract
This research presents an algorithmic approach to addressing the vocabulary problem in scientific information retrieval and information sharing, using the molecular biology domain as an example. We first present a literature review of cognitive studies related to the vocabulary problem and vocabulary-based search aids (thesauri) and then discuss techniques for building robust and domain-specific thesauri to assist in cross-domain scientific information retrieval. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we recently conducted an experiment in the molecular biology domain in which we created a C. elegans worm thesaurus of 7,657 worm-specific terms and a Drosophila fly thesaurus of 15,626 terms. About 30% of these terms overlapped, which created vocabulary paths from one subject domain to the other. Based on a cognitive study of term association involving four biologists, we found that a large percentage (59.6-85.6%) of the terms suggested by the subjects were identified in the conjoined fly-worm thesaurus. However, we found only a small percentage (8.4-18.1%) of the associations suggested by the subjects in the thesaurus. In a follow-up document retrieval study involving eight fly biologists, an actual worm database (Worm Community System), and the conjoined fly-worm thesaurus, subjects were able to find more relevant documents (an increase from about 9 documents to 20) and to improve the document recall level (from 32.41 to 65.28%) when using the thesaurus, although the precision level did not improve significantly. Implications of adopting the concept space approach for addressing the vocabulary problem in Internet and digital library applications are also discussed.
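The abstract names the concept space approach but does not spell out its weighting. As a rough illustration only, the sketch below builds a term-association table for each domain from term co-occurrence in documents, conjoins a fly table and a worm table, and follows two-hop links from a fly-only term through a shared term into worm-only vocabulary, which is the kind of "vocabulary path" the overlapping terms create. The conditional co-occurrence weight, the function names (`concept_space`, `conjoin`, `related`, `vocabulary_path`), and the toy documents are all hypothetical assumptions, not the authors' algorithm or data.

```python
from collections import Counter
from itertools import combinations


def concept_space(docs):
    """Build an asymmetric term-association table from tokenized documents.

    Hypothetical weighting: w(a -> b) = docs containing both a and b / docs containing a.
    This is a simplified stand-in, not the paper's own weighting.
    """
    df = Counter()      # document frequency of each term
    co_df = Counter()   # document frequency of each unordered term pair
    for doc in docs:
        terms = set(doc)
        df.update(terms)
        for a, b in combinations(sorted(terms), 2):
            co_df[(a, b)] += 1
    space = {term: {} for term in df}
    for (a, b), both in co_df.items():
        space[a][b] = both / df[a]  # asymmetric: normalized by the source term
        space[b][a] = both / df[b]
    return space


def conjoin(*spaces):
    """Merge domain thesauri; a shared term keeps its links into every domain."""
    merged = {}
    for space in spaces:
        for term, links in space.items():
            slot = merged.setdefault(term, {})
            for other, weight in links.items():
                slot[other] = max(slot.get(other, 0.0), weight)
    return merged


def related(space, term, k=5):
    """Top-k associated terms, strongest first (ties broken alphabetically)."""
    links = space.get(term, {})
    return sorted(links.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def vocabulary_path(space, term, target_vocab):
    """Two-hop suggestions from `term` into `target_vocab` via an intermediate term."""
    best = {}
    for mid, w1 in space.get(term, {}).items():
        for end, w2 in space.get(mid, {}).items():
            if end != term and end in target_vocab:
                best[end] = max(best.get(end, 0.0), w1 * w2)
    return best


# Invented toy documents; the term lists make no biological claim.
fly_docs = [["wingless", "notch"], ["notch", "apoptosis"], ["wingless", "notch"]]
worm_docs = [["notch", "lin-12"], ["lin-12", "apoptosis"], ["notch", "lin-12"]]

fly, worm = concept_space(fly_docs), concept_space(worm_docs)
merged = conjoin(fly, worm)
shared = fly.keys() & worm.keys()   # overlapping terms bridge the two domains
worm_only = worm.keys() - fly.keys()
print(sorted(shared))
print(related(merged, "notch"))
print(vocabulary_path(merged, "wingless", worm_only))
```

In this toy run, the fly-only term "wingless" reaches the worm-only term "lin-12" only because "notch" appears in both vocabularies; without that overlap the conjoined table would offer no path between the domains. Taking the maximum weight when merging is one simple choice; summing or averaging the per-domain weights would be equally plausible assumptions.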