Self organization of a massive document collection

Citation
T. Kohonen et al., Self organization of a massive document collection, IEEE NEURAL, 11(3), 2000, pp. 574-585
Citations number
51
Subject categories
AI Robotics and Automatic Control
Journal title
IEEE TRANSACTIONS ON NEURAL NETWORKS
ISSN journal
1045-9227
Volume
11
Issue
3
Year of publication
2000
Pages
574 - 585
Database
ISI
SICI code
1045-9227(200005)11:3<574:SOOAMD>2.0.ZU;2-U
Abstract
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. As the feature vectors for the documents, statistical representations of their vocabularies are used. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6 840 568 patent abstracts onto a 1 002 240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
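The feature pipeline named in the abstract (weighted word histogram, random projection, lookup of the closest SOM node) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian projection matrix, the toy sizes, the word weights, and all function names are assumptions made for illustration, and the abstract does not specify how the projection matrix or the weights were chosen.

```python
import random


def random_projection_matrix(vocab_size, target_dim, seed=0):
    """Build a target_dim x vocab_size matrix of random entries.

    Assumption: Gaussian entries; the abstract does not state the distribution.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
            for _ in range(target_dim)]


def project(histogram, matrix):
    """Map a weighted word histogram to a lower-dimensional feature vector."""
    return [sum(r * h for r, h in zip(row, histogram)) for row in matrix]


def best_matching_unit(vector, codebook):
    """Return the index of the SOM node whose model vector is closest
    to `vector` in squared Euclidean distance."""
    def dist2(model):
        return sum((a - b) ** 2 for a, b in zip(vector, model))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))


# Toy sizes (hypothetical); the experiment in the abstract used
# 500-dimensional vectors and a 1 002 240-node map.
VOCAB, DIM = 8, 3
R = random_projection_matrix(VOCAB, DIM, seed=42)

weights = [1.0, 0.5, 2.0, 1.0, 0.1, 1.5, 1.0, 0.7]  # hypothetical word weights
counts = [2, 0, 1, 0, 5, 0, 0, 1]                   # word counts of one document
histogram = [w * c for w, c in zip(weights, counts)]

doc_vector = project(histogram, R)

# A three-node stand-in for a trained map; node 1 is a copy of the document vector.
codebook = [[0.0] * DIM, list(doc_vector), [1.0] * DIM]
bmu = best_matching_unit(doc_vector, codebook)
```

In this sketch the projection is a plain matrix-vector product, so its cost grows with the vocabulary size times the target dimension per document. Mapping a document then reduces to a nearest-neighbour search over the map's model vectors in the reduced space.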