Extracting patterns and models of interest from large databases is
attracting much attention in a variety of disciplines. Knowledge discovery
in databases (KDD) and data mining (DM) are areas of common interest to
researchers
in machine learning, pattern recognition, statistics, artificial
intelligence, and high performance computing. An effective and robust
method, termed the
regression-class mixture decomposition (RCMD) method, is proposed in this
paper for the mining of regression classes in large data sets, especially
those contaminated by noise. A new concept, the "regression class", defined
as a subset of the data set that is subject to a regression model, is
proposed as the basic building block of the mining process. A large data
set is treated as a mixture population in which there are
many such regression classes, together with data not accounted for by the
regression
models. Iterative and genetic-based algorithms for the optimization of the
objective function in the RCMD method are also constructed. It is
demonstrated that the RCMD method can resist a very large proportion of
noisy data, identify each regression class, assign an inlier set of data
points supporting each identified regression class, and determine the a
priori unknown number of statistically valid models in the data set.
Although the models are extracted sequentially, the final result is almost
independent of the extraction order due to a novel dynamic classification
strategy employed in the handling of overlapping regression classes. The
effectiveness and robustness of the RCMD method are substantiated by a set
of simulation experiments
and a real-life application showing the way it can be used to fit mixed
data to linear regression classes and nonlinear structures in various
situations.
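
To make the idea of sequentially extracting regression classes from a noisy
mixture concrete, the following is a minimal sketch only. It is not the
RCMD objective function or its iterative and genetic-based optimizers,
which are not specified here. It substitutes a simple RANSAC-style
random-sampling search for each line, and a final nearest-model
reassignment that is only loosely analogous to the dynamic classification
of overlapping classes. All function names and the parameters tol,
min_support, and n_trials are assumptions introduced for illustration.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[0]), float(coef[1])


def best_support(x, y, tol, n_trials, rng):
    """Return the inlier mask of the best-supported line found by sampling.

    Each trial draws two points, forms the line through them, and counts
    the points whose vertical residual is below tol (hypothetical threshold).
    """
    best = np.zeros(x.size, dtype=bool)
    for _ in range(n_trials):
        i, j = rng.choice(x.size, size=2, replace=False)
        if x[i] == x[j]:
            continue  # vertical line, cannot be written as y = a*x + b
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        mask = np.abs(y - (a * x + b)) < tol
        if mask.sum() > best.sum():
            best = mask
    return best


def extract_classes(x, y, tol=0.3, min_support=40, n_trials=500, seed=0):
    """Extract lines one at a time until no line has enough support.

    Returns (models, labels): models is a list of (a, b) pairs, labels[k]
    is the index of the model assigned to point k, or -1 for noise.
    """
    rng = np.random.default_rng(seed)
    remaining = np.arange(x.size)
    models = []
    while remaining.size >= min_support:
        mask = best_support(x[remaining], y[remaining], tol, n_trials, rng)
        if mask.sum() < min_support:
            break  # nothing left that looks like a regression class
        inliers = remaining[mask]
        models.append(fit_line(x[inliers], y[inliers]))
        remaining = remaining[~mask]

    labels = np.full(x.size, -1)
    if models:
        # Reassign every point to its closest model, regardless of the
        # order in which the models were extracted.
        resid = np.stack([np.abs(y - (a * x + b)) for a, b in models])
        labels = resid.argmin(axis=0)
        labels[resid.min(axis=0) >= tol] = -1
    return models, labels
```

Calling extract_classes(x, y) on data drawn from a few lines plus uniform
background noise returns one (slope, intercept) pair per recovered line and
a label per point. The number of lines is not supplied in advance: the loop
stops when no candidate reaches min_support, which is a crude stand-in for
a statistical validity test.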