Interfacing knowledge discovery algorithms to large database management systems

Citation
S. Lavington et al., Interfacing knowledge discovery algorithms to large database management systems, INF SOFTW T, 41(9), 1999, pp. 605-617
Number of citations
19
Subject categories
Computer Science & Engineering
Journal title
INFORMATION AND SOFTWARE TECHNOLOGY
ISSN journal
0950-5849
Volume
41
Issue
9
Year of publication
1999
Pages
605 - 617
Database
ISI
SICI code
0950-5849(19990625)41:9<605:IKDATL>2.0.ZU;2-E
Abstract
The efficient mining of large, commercially credible, databases requires a solution to at least two problems: (a) better integration between existing Knowledge Discovery algorithms and popular DBMS; (b) ability to exploit opportunities for computational speedup such as data parallelism. Both problems need to be addressed in a generic manner, since the stated requirements of end-users cover a range of data mining paradigms, DBMS, and (parallel) platforms. In this paper we present a family of generic, set-based, primitive operations for Knowledge Discovery in Databases (KDD). We show how a number of well-known KDD classification metrics, drawn from paradigms such as Bayesian classifiers, Rule-Induction/Decision Tree algorithms, Instance-Based Learning methods, and Genetic Programming, can all be computed via our generic primitives. We then show how these primitives may be mapped into SQL and, where appropriate, optimised for good performance in respect of practical factors such as client-server communication overheads. We demonstrate how our primitives can support C4.5, a widely-used rule induction system. Performance evaluation figures are presented for commercially available parallel platforms, such as the IBM SP/2. (C) 1999 Elsevier Science B.V. All rights reserved.
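The abstract does not spell out the primitives themselves, so the following is only an illustrative sketch of the general idea: a set-based counting operation pushed into the DBMS as a single SQL GROUP BY, whose small result is then used on the client to compute the information gain that decision-tree learners such as C4.5 rely on. The function names (class_counts, entropy, information_gain), the weather table, and its columns are hypothetical and are not taken from the paper; SQLite stands in for the DBMS.

```python
import math
import sqlite3
from collections import defaultdict


def class_counts(conn, table, attribute, class_column):
    """Return {attribute_value: {class_label: count}} from one GROUP BY query.

    Hypothetical counting primitive: only the aggregated counts cross the
    client-server boundary, not the raw rows. Identifiers are interpolated
    directly, so this sketch assumes trusted table and column names.
    """
    query = (
        f"SELECT {attribute}, {class_column}, COUNT(*) "
        f"FROM {table} GROUP BY {attribute}, {class_column}"
    )
    counts = defaultdict(dict)
    for value, label, n in conn.execute(query):
        counts[value][label] = n
    return dict(counts)


def entropy(label_counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    total = sum(label_counts)
    return -sum((n / total) * math.log2(n / total) for n in label_counts if n)


def information_gain(counts):
    """Information gain of splitting on the attribute summarised by counts."""
    class_totals = defaultdict(int)
    for per_class in counts.values():
        for label, n in per_class.items():
            class_totals[label] += n
    total = sum(class_totals.values())
    # Weighted entropy of the partitions induced by each attribute value.
    remainder = sum(
        (sum(per_class.values()) / total) * entropy(per_class.values())
        for per_class in counts.values()
    )
    return entropy(class_totals.values()) - remainder


# Toy data (assumed for illustration): 14 rows of (outlook, play).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (outlook TEXT, play TEXT)")
rows = (
    [("sunny", "yes")] * 2 + [("sunny", "no")] * 3
    + [("overcast", "yes")] * 4
    + [("rain", "yes")] * 3 + [("rain", "no")] * 2
)
conn.executemany("INSERT INTO weather VALUES (?, ?)", rows)

counts = class_counts(conn, "weather", "outlook", "play")
gain = information_gain(counts)
```

In this sketch the database returns at most one row per (attribute value, class label) pair, which is one plausible way to keep the client-server communication overhead mentioned in the abstract small; the paper's actual primitives and SQL mappings may differ.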