LOOKING IN TEXT WINDOWS - THEIR SIZE AND COMPOSITION

Authors

HAAS SW; LOSEE RM

Citation

S.W. Haas and R.M. Losee, LOOKING IN TEXT WINDOWS - THEIR SIZE AND COMPOSITION, Information Processing & Management, 30(5), 1994, pp. 619-629

Citations number

Subject Categories

Information Science & Library Science; Computer Science, Information Systems

Journal title

Information processing & management

ISSN journal

03064573

Volume

30

Issue

5

Year of publication

1994

Pages

619 - 629

Database

ISI

SICI code

0306-4573(1994)30:5<619:LITW-T>2.0.ZU;2-0

Abstract

A text window is a group of words appearing in contiguous positions in text. Intuitively, words in such close proximity should have something to do with each other. We can use the text window to exploit a variety of lexical, syntactic, and semantic relationships without having to analyze the text explicitly for their structure. This research supports the previously suggested idea that natural groupings of words are best treated as a unit of size 7 to 11 words, that is, plus or minus three to five words. Our text retrieval experiments varying the size of windows, both with full text and with stopwords removed, support these size ranges. The characteristics of windows that best match terms in queries are examined in detail, revealing interesting differences between those for queries with good results and those for queries with poorer results. Queries with good results tend to contain more content word phrases and fewer terms with high frequency of use in the database. Information retrieval systems may benefit from expanding thesaurus-style relationships or incorporating statistical dependencies for terms within these windows.
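
The window idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `text_windows` and `best_window`, the whitespace tokenization, and the overlap-count scoring are all assumptions made for illustration. A half-width of 3 gives windows of up to 7 words and a half-width of 5 gives up to 11, matching the range the abstract reports.

```python
def text_windows(tokens, half_width=3, stopwords=None):
    """Yield (center, window) pairs for each token position.

    Each window holds the center word plus up to `half_width` words on
    either side, so half_width=3 gives at most 7 words and half_width=5
    gives at most 11. Windows near the edges of the text are shorter.
    If `stopwords` is given, those words are dropped before windowing.
    """
    if stopwords:
        tokens = [t for t in tokens if t.lower() not in stopwords]
    for i, center in enumerate(tokens):
        start = max(0, i - half_width)
        yield center, tokens[start:i + half_width + 1]


def best_window(tokens, query_terms, half_width=3, stopwords=None):
    """Return (score, window) for the first window with the most query terms.

    The score is a plain count of distinct query terms found in the
    window. This is an illustrative stand-in, not the matching function
    used in the paper.
    """
    query = {q.lower() for q in query_terms}
    best = (0, [])
    for _, window in text_windows(tokens, half_width, stopwords):
        score = len(query & {w.lower() for w in window})
        if score > best[0]:
            best = (score, window)
    return best


if __name__ == "__main__":
    text = "the size of a text window affects retrieval of relevant documents"
    tokens = text.split()
    print(best_window(tokens, ["text", "retrieval"], half_width=3,
                      stopwords={"the", "of", "a"}))
```

Removing stopwords before windowing, as in the sketch, lets a window of the same nominal size span more of the original text, which is one way to read the abstract's comparison of full text against text with stopwords removed.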