Information retrieval addresses the problem of finding those documents
whose content matches a user's request from among a large collection of
documents. Currently, the most successful general-purpose retrieval methods
are statistical methods that treat text as little more than a bag of words.
However, attempts to improve retrieval performance through more
sophisticated linguistic processing have been largely unsuccessful. Indeed,
unless done carefully, such processing can degrade retrieval effectiveness.
Several factors contribute to the difficulty of improving on a good
statistical baseline, including the forgiving nature but broad coverage of
the typical retrieval task; the lack of good weighting schemes for compound
index terms; and the implicit linguistic processing inherent in the
statistical methods. Natural language processing techniques may be more
important for related tasks such as question answering or document
summarization.