Lexical collocations have particular statistical distributions. We
have developed a set of statistical techniques for retrieving and
identifying collocations from large textual corpora. These techniques
can identify collocations of arbitrary length as well as flexible
collocations. They have been implemented in a lexicographic tool,
Xtract, which automatically acquires collocations with high retrieval
performance. Xtract works in three stages. The first stage is based on
a statistical technique for identifying word pairs involved in a
syntactic relation. The words can appear in the text in any order and
can be separated by an arbitrary number of other words. The second
stage is based on a technique to extract n-word collocations (or
n-grams) in a much simpler way than related methods. These
collocations can involve closed-class words such as particles and
prepositions. The third stage is then applied to the output of stage
one: it parses the sentences involving a given word pair in order to
identify the proper syntactic relation between the two words. A
secondary effect of the third stage is to filter out a number of
candidate collocations as irrelevant and thus produce higher-quality
output. In this paper we present an overview of Xtract and describe
several uses for Xtract and the knowledge it retrieves, such as
language generation and machine translation.
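
To make the idea behind the first stage concrete, the following is a minimal sketch of window-based pair retrieval, not Xtract's actual implementation. It counts how often two words co-occur within a fixed distance of each other, in either order, and keeps the collocates of a word whose frequency stands out from that word's other collocates. The function names, the five-token window, the z-score measure, and the threshold of 1.0 are all assumptions chosen for illustration; the abstract does not specify them.

```python
from collections import Counter
from statistics import mean, pstdev


def cooccurrence_counts(sentences, window=5):
    """Count pairs (w, c) where c occurs within `window` tokens of w.

    `sentences` is an iterable of token lists. The two words may appear
    in either order and may be separated by other words, so every pair
    is counted once from each word's point of view.
    """
    counts = Counter()
    for tokens in sentences:
        for i, w in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(w, tokens[j])] += 1
    return counts


def strong_collocates(counts, word, min_z=1.0):
    """Return (collocate, z_score) pairs for `word`, strongest first.

    The z-score compares a collocate's frequency with the mean and
    standard deviation over all collocates of `word`. The threshold
    `min_z` is an illustrative assumption, not a value from the paper.
    """
    freqs = {c: n for (w, c), n in counts.items() if w == word}
    if len(freqs) < 2:
        return []
    mu = mean(freqs.values())
    sigma = pstdev(freqs.values())
    if sigma == 0:
        return []  # all collocates equally frequent: nothing stands out
    scored = [(c, (n - mu) / sigma) for c, n in freqs.items()]
    kept = [(c, z) for c, z in scored if z >= min_z]
    return sorted(kept, key=lambda pair: -pair[1])
```

Given tokenized sentences, `strong_collocates(cooccurrence_counts(sentences), "stock")` would list the words that co-occur with "stock" unusually often. A sketch like this only proposes candidate pairs; it says nothing about the syntactic relation between the two words, which is what the parsing-based third stage is described as resolving.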