Interfacing knowledge discovery algorithms to large database management systems

Authors

Lavington, S; Dewhurst, N; Wilkins, E; Freitas, A

Citation

S. Lavington et al., Interfacing knowledge discovery algorithms to large database management systems, INF SOFTW T, 41(9), 1999, pp. 605-617

Citations number

Subject Categories

Computer Science & Engineering

Journal title

INFORMATION AND SOFTWARE TECHNOLOGY

ISSN journal

09505849

Volume

41

Issue

9

Year of publication

1999

Pages

605 - 617

Database

ISI

SICI code

0950-5849(19990625)41:9<605:IKDATL>2.0.ZU;2-E

Abstract

The efficient mining of large, commercially credible databases requires a solution to at least two problems: (a) better integration between existing Knowledge Discovery algorithms and popular DBMS; (b) the ability to exploit opportunities for computational speedup such as data parallelism. Both problems need to be addressed in a generic manner, since the stated requirements of end-users cover a range of data mining paradigms, DBMS, and (parallel) platforms. In this paper we present a family of generic, set-based, primitive operations for Knowledge Discovery in Databases (KDD). We show how a number of well-known KDD classification metrics, drawn from paradigms such as Bayesian classifiers, Rule-Induction/Decision Tree algorithms, Instance-Based Learning methods, and Genetic Programming, can all be computed via our generic primitives. We then show how these primitives may be mapped into SQL and, where appropriate, optimised for good performance in respect of practical factors such as client-server communication overheads. We demonstrate how our primitives can support C4.5, a widely-used rule induction system. Performance evaluation figures are presented for commercially available parallel platforms, such as the IBM SP/2. (C) 1999 Elsevier Science B.V. All rights reserved.
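The abstract does not specify the interface of the set-based primitives. Purely as an illustration, the following minimal Python sketch assumes that one such primitive is a "count by group" query: a single SQL `GROUP BY` that returns (attribute value, class, count) triples, from which a decision-tree metric such as information gain (the quantity underlying C4.5's gain ratio) can be computed on the client. The function names `count_by_group` and `information_gain` and the `weather(outlook, windy, play)` table are hypothetical and do not come from the paper; `sqlite3` stands in for a large DBMS.

```python
import math
import sqlite3
from collections import defaultdict


def count_by_group(conn, table, attribute, class_attr):
    """Hypothetical primitive: one GROUP BY query returns a contingency table.

    Only the aggregated (value, class, count) rows travel from server to client,
    not the raw tuples. Identifiers are interpolated directly, so they are
    assumed to come from trusted schema metadata, not user input.
    """
    sql = (f"SELECT {attribute}, {class_attr}, COUNT(*) FROM {table} "
           f"GROUP BY {attribute}, {class_attr}")
    contingency = defaultdict(dict)
    for value, cls, n in conn.execute(sql):
        contingency[value][cls] = n
    return dict(contingency)


def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(contingency):
    """Information gain of splitting on the attribute, from counts alone."""
    class_totals = defaultdict(int)
    for per_class in contingency.values():
        for cls, n in per_class.items():
            class_totals[cls] += n
    total = sum(class_totals.values())
    before = entropy(list(class_totals.values()))
    # Weighted entropy of the class distribution within each attribute value.
    after = sum(
        (sum(per_class.values()) / total) * entropy(list(per_class.values()))
        for per_class in contingency.values()
    )
    return before - after


if __name__ == "__main__":
    # Tiny invented dataset, used only to exercise the sketch.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE weather (outlook TEXT, windy TEXT, play TEXT)")
    demo.executemany(
        "INSERT INTO weather VALUES (?, ?, ?)",
        [("sunny", "true", "no"), ("sunny", "false", "no"),
         ("overcast", "true", "yes"), ("overcast", "false", "yes")],
    )
    for attr in ("outlook", "windy"):
        ct = count_by_group(demo, "weather", attr, "play")
        print(attr, information_gain(ct))
```

The design choice this illustrates is the one the abstract names: pushing the counting into the DBMS keeps client-server traffic proportional to the number of distinct (value, class) pairs rather than to the number of rows. Other metrics (for example, Bayesian conditional probabilities) could plausibly be derived from the same contingency table.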