Knowledge discovery from the dramatically increasing volume of data in
auto-stored medical information systems is still in its infancy. The purpose
of this study is to use widely available, easily operated techniques that
allow general users to extract specific knowledge and so make medical
information systems more functional. Data mining techniques, including data
visualisation, correlation analysis, discriminant analysis, and neural network
supervised classification, were applied to heart disease databases. These
techniques can help to identify high-risk patients, determine the most
important factors (variables) in heart disease, and build a multivariate
relationship model that displays the relationship between any two variables in
an easily viewed form. Simple visualisation techniques were used to construct
this model, which agrees with current medical knowledge. Two nonparametric
(distribution-assumption-free) classification tools were employed to identify
high-risk heart disease patients. Both the neural network supervised
classification method and the discriminant analysis method produced reliable
classification rates for heart disease patients. However, neural networks
yielded a higher percentage of correct classifications (averaging 89%) than
discriminant analysis (79%). Data visualisation and correlation analysis led
to similar conclusions regarding the most important factors in heart disease.
These data mining tools provide simple and effective methods of extracting
knowledge from general medical information. The treatment of missing data is
also discussed.
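As a rough illustration of how correlation analysis can rank candidate risk factors, here is a minimal sketch, assuming a numeric patient-by-variable matrix and a binary outcome (0 = no heart disease, 1 = heart disease). It uses NumPy, computes the Pearson correlation of each variable with the outcome (the point-biserial correlation, since the outcome is binary), and handles missing values by simple listwise deletion. The function name `rank_factors_by_correlation` and the deletion strategy are illustrative assumptions, not the authors' procedure or their treatment of missing data.

```python
import numpy as np


def rank_factors_by_correlation(X, y, names):
    """Rank the columns of X by |Pearson r| with a binary outcome y.

    Hypothetical helper: rows containing any NaN are dropped first
    (listwise deletion), one simple way to handle missing data.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Keep only patients with complete records for every variable and the outcome.
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    X, y = X[keep], y[keep]

    scores = []
    for j, name in enumerate(names):
        col = X[:, j]
        # A constant column (or constant outcome) has no defined correlation;
        # score it as 0.0 rather than propagating NaN into the ranking.
        if col.std() == 0 or y.std() == 0:
            r = 0.0
        else:
            r = float(np.corrcoef(col, y)[0, 1])
        scores.append((name, r))

    # Strongest association first, whether positive or negative.
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

For example, `rank_factors_by_correlation(X, y, ["age", "cholesterol"])` (hypothetical variable names) returns `(name, r)` pairs ordered from most to least strongly associated with the outcome. Listwise deletion is chosen here only for brevity; it discards every incomplete record, so imputation may be preferable when missing values are common.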