A statistical model of language is described and shown to be surprisingly successful in two experiments based on the analysis of two text corpora. One experiment trained the model on the domain-specific VODIS corpus of 70,000 words; the other trained it on the Brown corpus of 1 million words, which contains text from a wide range of domains. In each experiment the model was tested on unseen phrases from the appropriate corpus, and the results show that a statistical model can perform remarkably well even though it includes no knowledge of syntax. Our results also show that the model is most effective when trained and tested on the domain-specific VODIS corpus, in spite of that corpus's small size. The VODIS corpus is a great deal smaller than the total amount of language a child hears in its first few years of life, which suggests that, within the restricted domain of interest to a child, there is more than sufficient sample language to build a successful statistical model containing no knowledge of grammar.
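To make the idea concrete, the kind of syntax-free statistical model described here can be illustrated with a minimal bigram sketch: the model records only word-to-word transition frequencies from a training corpus and scores an unseen phrase by the product of its transition probabilities. This is an illustrative sketch, not the paper's actual model; the function names and the absence of smoothing are assumptions made for brevity.

```python
from collections import defaultdict

def train_bigram(tokens):
    # Count word -> next-word transitions in the training corpus.
    counts = defaultdict(lambda: defaultdict(int))
    for w1, w2 in zip(tokens, tokens[1:]):
        counts[w1][w2] += 1
    # Normalize counts into conditional probabilities P(w2 | w1).
    model = {}
    for w1, nexts in counts.items():
        total = sum(nexts.values())
        model[w1] = {w2: c / total for w2, c in nexts.items()}
    return model

def phrase_prob(model, tokens):
    # Score a phrase as the product of its bigram probabilities.
    # Unseen transitions score zero (no smoothing in this sketch).
    p = 1.0
    for w1, w2 in zip(tokens, tokens[1:]):
        p *= model.get(w1, {}).get(w2, 0.0)
    return p
```

Training such a model requires nothing but raw text, which is why a small but domain-matched corpus can outperform a much larger general one: within a restricted domain, the transitions that matter are seen often enough to be estimated reliably.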