A distributed framework for parallel data mining using HPJava

Authors
Citation
Of. Rana and D. Fisk, A distributed framework for parallel data mining using HPJava, BT TECHNOL, 17(3), 1999, pp. 146-154
Number of citations
19
Subject categories
Information Technology & Communication Systems
Journal title
BT TECHNOLOGY JOURNAL
Journal ISSN
1358-3948
Volume
17
Issue
3
Year of publication
1999
Pages
146 - 154
Database
ISI
SICI code
1358-3948(199907)17:3<146:ADFFPD>2.0.ZU;2-V
Abstract
Java has become a language of choice for applications executing in heterogeneous environments utilising distributed objects and multithreading. To handle large data sets, scalable and efficient implementations of data mining approaches are required, generally employing computationally intensive algorithms. Conventional Java implementations do not directly provide support for the data structures often encountered in such algorithms, and they also lack repeatability in numerical precision across platforms. This paper describes a distributed framework employing task and data parallelism and implemented in high performance Java (HPJava). Issues of interest for data mining algorithms are identified, and possible solutions discussed for overcoming limitations in the Java Virtual Machine. The framework supports parallelism across workstation clusters, using the message passing interface as middleware, and can support different analysis algorithms, wrapped as Java objects, and linked to various databases using the Java database connectivity interface. Guidelines are provided for implementing parallel and distributed data mining on large data sets, and a proof-of-concept data mining application is analysed using a neural network.