INCREMENTAL CLUSTERING FOR VERY LARGE DOCUMENT DATABASES - INITIAL MARIAN EXPERIENCE

Authors

Can F, Fox EA, Snavely CD, France RK

Citation

F. Can et al., INCREMENTAL CLUSTERING FOR VERY LARGE DOCUMENT DATABASES - INITIAL MARIAN EXPERIENCE, Information sciences, 84(1-2), 1995, pp. 101-114

Citations number

Subject Categories

Information Science & Library Science; Computer Science, Information Systems

Journal title

Information sciences

ISSN journal

00200255

Volume

84

Issue

1-2

Year of publication

1995

Pages

101 - 114

Database

ISI

SICI code

0020-0255(1995)84:1-2<101:ICFVLD>2.0.ZU;2-X

Abstract

Clustering of document databases is useful for both browsing and searching purposes; however, it can be a prohibitively expensive computational process for large collections. This problem is compounded when the clustering structure must reflect a constantly changing database. Therefore, efficient algorithms that maintain an existing clustering structure are desirable. This study provides the details of a large-scale implementation of the Cover-Coefficient-based Incremental Clustering Methodology (C²ICM). The experiments performed on a sample of the MARIAN database show that its resource requirements are within practical bounds for most platforms. Furthermore, C²ICM offers considerable savings over reclustering. The results of this study will lead to an additional type of browsing and/or searching facility on the Virginia Tech-based MARIAN large online public access library catalog (OPAC) project.