TOWARD THE AUTOMATIC IDENTIFICATION OF SUBLANGUAGE VOCABULARY

Authors
Citation
S.W. Haas and S. He, TOWARD THE AUTOMATIC IDENTIFICATION OF SUBLANGUAGE VOCABULARY, Information Processing & Management, 29(6), 1993, pp. 721-732
Number of citations
10
Subject Categories
Information Science & Library Science; Computer Applications & Cybernetics
Journal ISSN
0306-4573
Volume
29
Issue
6
Year of publication
1993
Pages
721 - 732
Database
ISI
SICI code
0306-4573(1993)29:6<721:TTAIOS>2.0.ZU;2-Z
Abstract
A sublanguage is the language used in a restricted or specialized domain or field, such as computer science. Information about the vocabulary and structure of a sublanguage is used in any domain-related natural language processing application; however, such information is very time-consuming to gather, and much of it must be found and organized manually. Additionally, information retrieval strategies using lexical information depend on finding the appropriate dictionary entry for general and technical words. The ability to automatically identify terms belonging to a sublanguage could aid in these and other applications. In this paper, a simple but effective method is developed for automatic identification of sublanguage vocabulary words as they occur in abstracts. This procedure may significantly reduce the effort required to extract sublanguage vocabulary for sublanguage analysis and other applications, such as information retrieval. First, the sublanguage vocabulary identification procedures are described using abstracts from computer science and library and information science as the sublanguage sources. The results of these experiments are evaluated using three different criteria. Finally, the practical and theoretical significance of this research is discussed along with plans for further experiments on the vocabulary and structure of sublanguages.