A STUDY OF A STATISTICAL-MODEL OF NATURAL-LANGUAGE

Citation
P. Oboyle et al., A STUDY OF A STATISTICAL-MODEL OF NATURAL-LANGUAGE, Irish journal of psychology, 14(3), 1993, pp. 382-396
Number of citations
15
Subject categories
Psychology
Journal title
Irish journal of psychology
Journal ISSN
0303-3910
Volume
14
Issue
3
Year of publication
1993
Pages
382 - 396
Database
ISI
SICI code
0303-3910(1993)14:3<382:ASOASO>2.0.ZU;2-Q
Abstract
A statistical model of language is described and shown to be surprisingly successful in two experiments based on a statistical analysis of two text corpora. One experiment trained the model on the domain-specific VODIS corpus of 70,000 words, while the other trained it on the Brown corpus of 1 million words, containing text from a wide range of domains. In each experiment the model was tested using unseen phrases from the appropriate corpus and results show that a statistical model can be remarkably successful, even though there is no knowledge of syntax included in the model. Our results also show that the model is most effective when trained and tested on the domain-specific VODIS corpus, in spite of its small size. It is noted that the VODIS corpus is a great deal smaller than the total amount of language heard by a child in its first few years of life, which suggests that in the restricted domain of interest to a child there is more than sufficient sample language to build a successful statistical model containing no knowledge of grammar.