Self organization of a massive document collection

Authors

Kohonen, T; Kaski, S; Lagus, K; Salojarvi, J; Honkela, J; Paatero, V; Saarela, A

Citation

T. Kohonen et al., Self organization of a massive document collection, IEEE NEURAL, 11(3), 2000, pp. 574-585

Subject categories

AI Robotics and Automatic Control

Journal title

IEEE TRANSACTIONS ON NEURAL NETWORKS

ISSN journal

1045-9227

Volume

11

Issue

3

Year of publication

2000

Pages

574 - 585

Database

ISI

SICI code

1045-9227(200005)11:3<574:SOOAMD>2.0.ZU;2-U

Abstract

This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. As the feature vectors for the documents, statistical representations of their vocabularies are used. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6 840 568 patent abstracts onto a 1 002 240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
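
The feature construction named in the abstract (a weighted word histogram reduced by random projection, then matched against SOM nodes) can be illustrated with a minimal sketch. The abstract does not specify the word weighting, how the projection matrix is built, or the matching rule, so everything below is an assumption made for illustration: Gaussian matrix entries, a toy four-word vocabulary with made-up weights, squared Euclidean distance, and all function names.

```python
import random


def random_projection_matrix(vocab_size, target_dim, seed=0):
    """Build a target_dim x vocab_size matrix of Gaussian entries.

    Gaussian entries are an assumption; the abstract only says
    'random projections'.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
            for _ in range(target_dim)]


def weighted_histogram(tokens, vocab, weights):
    """Count known words, adding each word's weight per occurrence."""
    hist = [0.0] * len(vocab)
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:  # words outside the vocabulary are ignored
            hist[idx] += weights[idx]
    return hist


def project(matrix, hist):
    """Multiply the projection matrix by the histogram vector."""
    return [sum(r * h for r, h in zip(row, hist)) for row in matrix]


def best_matching_unit(nodes, x):
    """Return the index of the node closest to x (squared Euclidean)."""
    return min(range(len(nodes)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(nodes[i], x)))


# Hypothetical toy data: 4-word vocabulary, 3-dimensional target space
# (the experiment in the abstract used 500 dimensions).
vocab = {"neural": 0, "map": 1, "patent": 2, "engine": 3}
weights = [1.0, 0.5, 2.0, 1.5]  # made-up word weights
R = random_projection_matrix(len(vocab), target_dim=3, seed=1)

doc = "neural map map unknown".split()
x = project(R, weighted_histogram(doc, vocab, weights))
```

The projected vector `x` has `target_dim` components regardless of vocabulary size, which is what keeps the distance computation against each map node cheap when the vocabulary is large.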