We present an efficient document clustering algorithm that represents each document by a term frequency vector instead of using a large proximity matrix. The algorithm has the following features: (1) it requires a relatively small amount of memory and runs fast; (2) it produces a hierarchy in the form of a document classification tree; and (3) the hierarchy obtained by the algorithm explicitly reveals the structure of the collection. We confirm these features, and thereby show the algorithm's feasibility, through clustering experiments on two collections of Japanese documents containing 83,099 and 14,701 documents, respectively. We also introduce an application of the algorithm to a document browser used in our Japanese-to-English translation aid system. The browsing module of the system contains a large database of Japanese news articles and their English translations. The Japanese article collection is clustered into a hierarchy by our method. Since each node in the hierarchy corresponds to a topic in the collection, the hierarchy can be used to access articles directly by topic. By browsing the Japanese articles and their English translations, a user can learn general translation knowledge for each topic. We also discuss techniques for presenting a large tree-structured hierarchy on a computer screen. (C) 1999 Elsevier Science Ltd. All rights reserved.
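The abstract does not specify how the clustering works internally. Purely as an illustration of the general idea of building a topic tree from term-frequency vectors without an n x n proximity matrix, the sketch below uses a different, well-known technique, top-down bisecting 2-means, in which each document is compared only with cluster centroids. It is not the authors' algorithm. The whitespace tokenizer, the helper names (`tf_vector`, `bisect`, `build_tree`, `show`), the parameters `max_leaf` and `top_k`, and the toy English documents are all hypothetical; Japanese text would first need word segmentation.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def tf_vector(text: str) -> Counter:
    # Hypothetical tokenizer: lowercase + whitespace split.
    # Japanese text would first need word segmentation.
    return Counter(text.lower().split())


def cosine(a: dict, b: dict) -> float:
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vecs: list) -> dict:
    # Mean term-frequency vector of a cluster; its size depends on the
    # vocabulary, not on the number of documents.
    total = Counter()
    for v in vecs:
        total.update(v)
    return {t: w / len(vecs) for t, w in total.items()}


@dataclass
class Node:
    doc_ids: list
    label: list  # top centroid terms, used as a topic label for browsing
    children: list = field(default_factory=list)


def bisect(ids, vecs, iters=10):
    # Split one cluster in two (2-means). Each document is compared only
    # with two centroids, so no n x n proximity matrix is ever built.
    seed_a = ids[0]
    seed_b = min(ids, key=lambda i: cosine(vecs[seed_a], vecs[i]))
    ca, cb = dict(vecs[seed_a]), dict(vecs[seed_b])
    for _ in range(iters):
        left = [i for i in ids if cosine(vecs[i], ca) >= cosine(vecs[i], cb)]
        left_set = set(left)
        right = [i for i in ids if i not in left_set]
        if not left or not right:
            return None  # cluster cannot be split further
        ca = centroid([vecs[i] for i in left])
        cb = centroid([vecs[i] for i in right])
    return left, right


def build_tree(ids, vecs, max_leaf=2, top_k=3):
    # Recursively bisect until clusters have at most max_leaf documents.
    c = centroid([vecs[i] for i in ids])
    ranked = sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
    node = Node(ids, [t for t, _ in ranked[:top_k]])
    if len(ids) > max_leaf:
        split = bisect(ids, vecs)
        if split:
            node.children = [build_tree(p, vecs, max_leaf, top_k) for p in split]
    return node


def show(node, depth=0):
    # Print the tree as an indented outline, one line per topic node.
    print("  " * depth + f"{node.label} {node.doc_ids}")
    for child in node.children:
        show(child, depth + 1)


docs = [
    "stock market prices fall",
    "stock market shares rise",
    "football match goal scored",
    "football team wins match",
]
vecs = [tf_vector(d) for d in docs]
tree = build_tree(list(range(len(docs))), vecs)
show(tree)
# ['football', 'market', 'match'] [0, 1, 2, 3]
#   ['market', 'stock', 'fall'] [0, 1]
#   ['football', 'match', 'goal'] [2, 3]
```

In this sketch, each bisection pass stores only the document vectors and two centroids, so memory grows with the collection rather than with its square. The centroid-term labels hint at how each tree node might serve as a topic entry point in a browser, though the actual labeling and display techniques of the described system may differ.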