Statistical language models estimate the distribution of natural language
for the purpose of improving various language technology applications.
Ironically, the most successful models of this type take little advantage
of the nature of language. I review the extent to which various aspects of
natural language are captured in current models. I then describe a general
framework, recently developed at our laboratory, for incorporating
arbitrary linguistic structure into a statistical framework, and present a
methodology for eliciting linguistic features currently missing from the
model. Finally, I ponder our failure heretofore to integrate linguistic
theories into a statistical framework, and suggest possible reasons for it.