Data Mining or Knowledge Discovery in Databases [1,15,23] is currently
one of the most exciting and challenging areas where database
techniques are coupled with techniques from Artificial Intelligence
and mathematical sub-disciplines to great potential advantage. It has
been defined as the non-trivial extraction of implicit, previously
unknown and
potentially useful information from data. A lot of research effort is
being directed towards building tools for discovering interesting
patterns which are hidden below the surface in databases. However,
most of the work being done in this field has been problem-specific
and no general framework has yet been proposed for Data Mining. In
this paper we seek to remedy this by proposing EDM (Evidence-based
Data Mining),
a general framework for Data Mining based on Evidence Theory. Having
a general framework for Data Mining offers a number of advantages. It
provides a common method for representing knowledge which allows prior
knowledge from the user or knowledge discovered by another discovery
process to be incorporated into the discovery process. A common
knowledge representation also supports the discovery of
meta-knowledge from
knowledge discovered by different Data Mining techniques. Furthermore,
a general framework can provide facilities that are common to most
discovery processes, e.g. incorporating domain knowledge and dealing
with missing values. The framework presented in this paper has the
following additional advantages. The framework is inherently
parallel. Thus,
algorithms developed within this framework will also be parallel and
can therefore be expected to be efficient for large data sets, a
necessity since most commercial data sets, relational or otherwise,
are very large. The need for efficiency is compounded by the
complexity of the algorithms. Also, the parallelism within the
framework allows its use in parallel, distributed and heterogeneous
databases. The framework is easily updated and new discovery methods
can be readily incorporated within the
framework, making it 'general' in the functional sense in addition to
the representational sense considered above. The framework provides
an intuitive way of dealing with missing data during the discovery
process using the concept of Ignorance borrowed from Evidence Theory.
The framework consists of a method for representing data and
knowledge, and methods for data manipulation or knowledge
discovery(1). We suggest an extension of the conventional definition
of mass functions in Evidence Theory for use in Data Mining, as a
means to represent evidence of
the existence of rules in the database. The discovery process within
EDM consists of a series of operations on the mass functions. Each
operation is carried out by an EDM operator. We provide a
classification of the EDM operators based on the discovery functions
they perform and discuss aspects of the induction, domain and
combination operator classes. The application of EDM to two separate
Data Mining tasks is also addressed, highlighting the advantages of
using a general framework for Data Mining and, in particular, one
that is based on Evidence Theory.
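
The abstract does not define EDM's extended mass functions or its
operators, so the following is only a sketch of the conventional
Evidence Theory building blocks it refers to. It assumes a mass
function can be held as a mapping from subsets of a frame of
discernment to weights, that a missing value can be modelled as total
Ignorance (all mass on the whole frame), and that Dempster's rule is
used for combination. Whether EDM's combination operators use
Dempster's rule is an assumption; the names FRAME, mass_from_value,
combine and the reliability parameter are hypothetical and chosen
purely for illustration.

```python
from itertools import product

# Hypothetical frame of discernment: the possible values of one attribute.
FRAME = frozenset({"low", "medium", "high"})


def mass_from_value(value, reliability=0.9):
    """Build a conventional mass function from one (possibly missing) value.

    A missing value (None) yields the vacuous mass function, which puts
    all mass on the whole frame, i.e. total ignorance. The reliability
    weight is an illustrative assumption, not taken from the paper.
    """
    if value is None:
        return {FRAME: 1.0}
    return {frozenset({value}): reliability, FRAME: 1.0 - reliability}


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence: the product of masses supports the intersection.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Disjoint focal elements contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    # Renormalise so that the remaining masses sum to one.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


observed = mass_from_value("high")
missing = mass_from_value(None)
agreeing = combine(observed, mass_from_value("high"))
unchanged = combine(observed, missing)
```

In this sketch, combining evidence with the vacuous mass function
leaves the evidence as it was, which is one way to read the role of
Ignorance for missing data: a record with a missing value neither
supports nor contradicts a candidate rule.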