This article describes the implementation of a system that organizes vast
document collections according to textual similarities. It is based on the
self-organizing map (SOM) algorithm. Statistical representations of the
documents' vocabularies are used as the feature vectors. The main goal of
our work has been to scale up the SOM algorithm so that it can deal with
large amounts of high-dimensional data. In a practical experiment we mapped
6 840 568 patent abstracts onto a 1 002 240-node SOM. As the feature
vectors we used 500-dimensional vectors of stochastic figures obtained as
random projections of weighted word histograms.
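
The random projection of weighted word histograms mentioned above can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme or how the random matrix is drawn, so the choices below are assumptions made only for illustration: a smoothed inverse-document-frequency weight, and unit-normalized Gaussian random vectors, one per vocabulary word. All function names are hypothetical, and the projection dimension is set to 8 to keep the toy example small, whereas the experiment used 500.

```python
import math
import random
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct word to a row index (sorted for reproducibility)."""
    words = sorted({w for doc in docs for w in doc.lower().split()})
    return {w: i for i, w in enumerate(words)}


def idf_weights(docs, vocab):
    """ASSUMED weighting: smoothed inverse document frequency per word."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc.lower().split()))
    return {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}


def random_projection_matrix(vocab_size, dim, seed=0):
    """ASSUMED construction: one unit-length Gaussian vector per word."""
    rng = random.Random(seed)
    rows = []
    for _ in range(vocab_size):
        row = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in row))
        rows.append([x / norm for x in row])
    return rows


def project(doc, vocab, weights, matrix, dim):
    """Project the weighted word histogram of `doc` down to `dim` values."""
    counts = Counter(w for w in doc.lower().split() if w in vocab)
    vec = [0.0] * dim
    for word, count in counts.items():
        row = matrix[vocab[word]]
        scale = count * weights[word]  # weighted histogram entry
        for j in range(dim):
            vec[j] += scale * row[j]
    return vec


# Toy stand-ins for patent abstracts (invented for this example).
docs = [
    "a method for coating steel",
    "a steel coating apparatus",
    "wireless signal encoding method",
]
DIM = 8
vocab = build_vocabulary(docs)
weights = idf_weights(docs, vocab)
matrix = random_projection_matrix(len(vocab), DIM)
features = [project(d, vocab, weights, matrix, DIM) for d in docs]
```

Because the projection is linear, the histogram never has to be stored in full: each word occurrence simply adds its weighted random vector to the running sum, which keeps the cost proportional to the document length rather than to the vocabulary size.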