K. Lund and C. Burgess, PRODUCING HIGH-DIMENSIONAL SEMANTIC SPACES FROM LEXICAL COOCCURRENCE, Behavior Research Methods, Instruments, & Computers, 28(2), 1996, pp. 203-208
A procedure is presented that processes a corpus of text and produces, for each word, a numeric vector containing information about that word's meaning. The procedure is applied to a large corpus of natural-language text taken from Usenet, and the resulting vectors are examined to determine what information they contain. These vectors provide the coordinates of a high-dimensional space in which word relationships can be analyzed. Analyses of both vector similarity and multidimensional scaling demonstrate that the vectors carry significant semantic information. A comparison of vector similarity with human reaction times in a single-word priming experiment is also presented. These vectors provide the basis for a representational model of semantic memory, the hyperspace analogue to language (HAL).
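The abstract does not spell out how the vectors are built. The following Python sketch shows one way a HAL-style procedure could work, under assumptions not stated above: a moving window of `window` words (10 is the size usually reported for HAL), a co-occurrence weight that falls linearly with distance (adjacent words get weight `window`, the farthest get 1), and a word vector formed by concatenating that word's row (preceding context) and column (following context) of the co-occurrence matrix. Similarity is measured here as Euclidean distance between length-normalized vectors. That metric and the helper names `hal_vectors`, `normalize` and `distance` are illustrative assumptions, not the authors' code.

```python
import math


def hal_vectors(tokens, window=10):
    """Sketch of HAL-style co-occurrence vectors (assumed weighting scheme).

    matrix[t][c] accumulates weights for context word c appearing up to
    `window` positions BEFORE target word t; closer words weigh more.
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    matrix = [[0.0] * n for _ in range(n)]
    for pos, target in enumerate(tokens):
        for d in range(1, window + 1):
            if pos - d < 0:
                break
            context = tokens[pos - d]
            # Linear ramp: distance 1 -> weight `window`, distance `window` -> 1.
            matrix[index[target]][index[context]] += window - d + 1
    vectors = {}
    for w, i in index.items():
        row = matrix[i]                         # words preceding w
        col = [matrix[j][i] for j in range(n)]  # words following w
        vectors[w] = row + col                  # 2N-dimensional vector
    return vectors


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v


def distance(u, v):
    """Euclidean distance between length-normalized vectors (assumed metric)."""
    u, v = normalize(u), normalize(v)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    tokens = "the cat sat on the mat the dog sat on the rug".split()
    vecs = hal_vectors(tokens, window=2)
    print("cat-dog:", distance(vecs["cat"], vecs["dog"]))
    print("cat-the:", distance(vecs["cat"], vecs["the"]))
```

On this toy corpus, "cat" and "dog" occur in nearly identical contexts, so "cat" ends up closer to "dog" than to "the". For a real corpus, an N-word vocabulary yields 2N-dimensional vectors, so the dense list-of-lists used here for readability would normally be replaced by sparse storage.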