EDM - A GENERAL FRAMEWORK FOR DATA MINING BASED ON EVIDENCE THEORY

Citation
Ss. Anand et al., EDM - A GENERAL FRAMEWORK FOR DATA MINING BASED ON EVIDENCE THEORY, Data & knowledge engineering, 18(3), 1996, pp. 189-223
Number of citations
35
Subject Categories
Computer Science, Artificial Intelligence; Computer Science, Information Systems
ISSN journal
0169-023X
Volume
18
Issue
3
Year of publication
1996
Pages
189 - 223
Database
ISI
SICI code
0169-023X(1996)18:3<189:E-AGFF>2.0.ZU;2-C
Abstract
Data Mining or Knowledge Discovery in Databases [1,15,23] is currently one of the most exciting and challenging areas where database techniques are coupled with techniques from Artificial Intelligence and mathematical sub-disciplines to great potential advantage. It has been defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from data. A lot of research effort is being directed towards building tools for discovering interesting patterns which are hidden below the surface in databases. However, most of the work being done in this field has been problem-specific and no general framework has yet been proposed for Data Mining. In this paper we seek to remedy this by proposing EDM - Evidence-based Data Mining - a general framework for Data Mining based on Evidence Theory. Having a general framework for Data Mining offers a number of advantages. It provides a common method for representing knowledge which allows prior knowledge from the user or knowledge discovered by another discovery process to be incorporated into the discovery process. A common knowledge representation also supports the discovery of meta-knowledge from knowledge discovered by different Data Mining techniques. Furthermore, a general framework can provide facilities that are common to most discovery processes, e.g. incorporating domain knowledge and dealing with missing values. The framework presented in this paper has the following additional advantages. The framework is inherently parallel. Thus, algorithms developed within this framework will also be parallel and will therefore be expected to be efficient for large data sets - a necessity as most commercial data sets, relational or otherwise, are very large. This is compounded by the fact that the algorithms are complex. Also, the parallelism within the framework allows its use in parallel, distributed and heterogeneous databases.
The framework is easily updated and new discovery methods can be readily incorporated within the framework, making it 'general' in the functional sense in addition to the representational sense considered above. The framework provides an intuitive way of dealing with missing data during the discovery process using the concept of Ignorance borrowed from Evidence Theory. The framework consists of a method for representing data and knowledge, and methods for data manipulation or knowledge discovery(1). We suggest an extension of the conventional definition of mass functions in Evidence Theory for use in Data Mining, as a means to represent evidence of the existence of rules in the database. The discovery process within EDM consists of a series of operations on the mass functions. Each operation is carried out by an EDM operator. We provide a classification for the EDM operators based on the discovery functions performed by them and discuss aspects of the induction, domain and combination operator classes. The application of EDM to two separate Data Mining tasks is also addressed, highlighting the advantages of using a general framework for Data Mining in general and, in particular, using one that is based on Evidence Theory.
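
Illustrative sketch
The abstract refers to mass functions, to Ignorance as a way of handling missing data, and to a combination operator class. The following minimal Python sketch shows only the conventional Evidence Theory (Dempster-Shafer) versions of these ideas: a mass function as a mapping from subsets of a frame to masses, a missing value represented as total ignorance (all mass on the whole frame), and Dempster's rule of combination. It is not the extended mass-function definition or any EDM operator from the paper. The names `FRAME`, `mass_from_observation`, `combine` and the `reliability` parameter are hypothetical and chosen for illustration.

```python
from itertools import product

# Hypothetical frame of discernment for a single attribute.
FRAME = frozenset({"high", "low"})


def mass_from_observation(value, reliability=0.8):
    """Build a conventional mass function from one observed value.

    A missing value (None) is modelled as total ignorance: all mass is
    placed on the whole frame. The reliability figure is an assumption
    made for this sketch, not something taken from the paper.
    """
    if value is None:
        return {FRAME: 1.0}
    return {frozenset({value}): reliability, FRAME: 1.0 - reliability}


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # Mass assigned to the empty set measures conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so the remaining masses sum to one.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


if __name__ == "__main__":
    observed = mass_from_observation("high")
    missing = mass_from_observation(None)
    print(combine(observed, missing))
    print(combine(observed, mass_from_observation("low")))
```

In this sketch, combining any mass function with the total-ignorance mass function leaves it unchanged, which is one way to read the abstract's remark that Ignorance gives an intuitive treatment of missing data: a missing value contributes no evidence rather than distorting the result.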