A distributed framework for parallel data mining using HPJava

Authors

Rana, OF; Fisk, D

Citation

O.F. Rana and D. Fisk, A distributed framework for parallel data mining using HPJava, BT TECHNOL, 17(3), 1999, pp. 146-154

Number of citations

Subject categories

Information Technology & Communication Systems

Journal title

BT TECHNOLOGY JOURNAL

ISSN journal

1358-3948

Volume

17

Issue

3

Year of publication

1999

Pages

146 - 154

Database

ISI

SICI code

1358-3948(199907)17:3<146:ADFFPD>2.0.ZU;2-V

Abstract

Java has become a language of choice for applications executing in heterogeneous environments utilising distributed objects and multithreading. To handle large data sets, scalable and efficient implementations of data mining approaches are required, generally employing computationally intensive algorithms. Conventional Java implementations do not directly provide support for the data structures often encountered in such algorithms, and they also lack repeatability in numerical precision across platforms.

This paper describes a distributed framework employing task and data parallelism and implemented in high performance Java (HPJava). Issues of interest for data mining algorithms are identified, and possible solutions discussed for overcoming limitations in the Java Virtual Machine. The framework supports parallelism across workstation clusters, using the message passing interface as middleware, and can support different analysis algorithms, wrapped as Java objects, and linked to various databases using the Java database connectivity interface. Guidelines are provided for implementing parallel and distributed data mining on large data sets, and a proof-of-concept data mining application is analysed using a neural network.