An important problem in the indexing of natural language text is how to
identify those words and phrases that reflect the content of the text. In
general, automatic indexing has dealt with this problem by removing
instances of a few hundred common words known as stop words, and treating
the remaining words as though they were content bearing. This approach is
acceptable for some applications, such as statistical estimates of the
similarity of queries and documents for the purpose of document retrieval.
However, when the indexing terms are to be examined by a human as a means
of accessing the literature, it greatly improves efficiency if most of the
noncontent-bearing words and phrases can be eliminated from the indexing.
Here we present three statistical techniques for identifying
content-bearing phrases within a natural language database. We demonstrate
the effectiveness of the methods on test data, and show how all three
methods can be combined to produce a single improved method.