LOOKING IN TEXT WINDOWS - THEIR SIZE AND COMPOSITION

Authors
Citation
S.W. Haas and R.M. Losee, LOOKING IN TEXT WINDOWS - THEIR SIZE AND COMPOSITION, Information Processing & Management, 30(5), 1994, pp. 619-629
Number of citations
10
Subject categories
Information Science & Library Science; Computer Science Information Systems
ISSN journal
03064573
Volume
30
Issue
5
Year of publication
1994
Pages
619 - 629
Database
ISI
SICI code
0306-4573(1994)30:5<619:LITW-T>2.0.ZU;2-0
Abstract
A text window is a group of words appearing in contiguous positions in text. Intuitively, words in such close proximity should have something to do with each other. We can use the text window to exploit a variety of lexical, syntactic, and semantic relationships without having to analyze the text explicitly for their structure. This research supports the previously suggested idea that natural groupings of words are best treated as a unit of size 7 to 11 words, that is, plus or minus three to five words. Our text retrieval experiments varying the size of windows, both with full text and with stopwords removed, support these size ranges. The characteristics of windows that best match terms in queries are examined in detail, revealing interesting differences between those for queries with good results and those for queries with poorer results. Queries with good results tend to contain more content word phrases and fewer terms with high frequency of use in the database. Information retrieval systems may benefit from expanding thesaurus-style relationships or incorporating statistical dependencies for terms within these windows.
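
As a rough illustration of the text-window idea in the abstract, the following Python sketch slides a fixed-size window over a word sequence and picks the window that matches the most query terms. It is only a minimal sketch under stated assumptions, not the authors' experimental method: the function names `text_windows` and `best_window`, the tiny `STOPWORDS` set, whitespace tokenization, and the match score (number of distinct query terms in a window) are all hypothetical choices made here. A window of plus or minus 3 words around a center word corresponds to size 2 * 3 + 1 = 7, and plus or minus 5 words to size 11.

```python
def text_windows(words, size):
    """Return every run of `size` contiguous words (assumed window definition)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [words[i:i + size] for i in range(len(words) - size + 1)]


def best_window(words, query_terms, size):
    """Return (score, window) for the window holding the most distinct query terms.

    The score is a hypothetical stand-in for a matching function; the abstract
    does not specify how windows were matched against queries.
    """
    query = {t.lower() for t in query_terms}
    best = (0, [])
    for window in text_windows(words, size):
        score = len(query & {t.lower() for t in window})
        if score > best[0]:
            best = (score, window)
    return best


# Hypothetical, deliberately tiny stopword list for the "stopwords removed" case.
STOPWORDS = {"a", "the", "of", "in", "is", "to"}

text = "a text window is a group of words appearing in contiguous positions in text"
words = text.split()  # naive whitespace tokenization (an assumption)
content_words = [w for w in words if w.lower() not in STOPWORDS]

radius = 3                 # plus or minus three words
size = 2 * radius + 1      # window of 7 words

full_text_windows = text_windows(words, size)
score, window = best_window(words, ["window", "group"], size)
print(len(full_text_windows), score, window)
```

Running the same functions on `content_words` instead of `words` gives the stopwords-removed variant; changing `radius` from 3 to 5 covers the 7 to 11 word range mentioned in the abstract.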