INCREMENTAL CLUSTERING FOR VERY LARGE DOCUMENT DATABASES - INITIAL MARIAN EXPERIENCE

Citation
F. Can et al., INCREMENTAL CLUSTERING FOR VERY LARGE DOCUMENT DATABASES - INITIAL MARIAN EXPERIENCE, Information Sciences, 84(1-2), 1995, pp. 101-114
Number of citations
16
Subject Categories
Information Science & Library Science; Computer Science, Information Systems
Journal title
Information Sciences
Journal ISSN
0020-0255
Volume
84
Issue
1-2
Year of publication
1995
Pages
101 - 114
Database
ISI
SICI code
0020-0255(1995)84:1-2<101:ICFVLD>2.0.ZU;2-X
Abstract
Clustering of document databases is useful for both browsing and searching purposes; however, this can be a prohibitively expensive computational process for large collections. This problem is compounded when the clustering structure must reflect a constantly changing database. Therefore, efficient algorithms which maintain an existing clustering structure are desirable. This study provides the details of a large-scale implementation of the Cover-Coefficient-based Incremental Clustering Methodology (C²ICM). The experiments performed on a sample of the MARIAN database show that its resource requirements are within practical bounds for most platforms. Furthermore, C²ICM offers considerable savings over reclustering. The results of this study will lead to an additional type of browsing and/or searching facility on the Virginia Tech-based MARIAN large online public access library catalog (OPAC) project.