We discuss the use of probability-based natural language processing for
Chinese text retrieval, focusing on a comparison of text extraction methods
and probabilistic weighting methods. Several document processing methods
and probabilistic weighting functions are presented. Experiments were
conducted on large standard text collections. The experimental results
compare a word-based text processing method with a character-based method,
and also compare a number of term-weighting functions, including both
single-unit and compound-unit weighting functions.