Noise detection and elimination in data preprocessing: Experiments in medical domains

Citation
D. Gamberger et al., Noise detection and elimination in data preprocessing: Experiments in medical domains, APPL ARTIF, 14(2), 2000, pp. 205-223
Number of citations
28
Subject categories
AI Robotics and Automatic Control
Journal title
APPLIED ARTIFICIAL INTELLIGENCE
ISSN journal
0883-9514
Volume
14
Issue
2
Year of publication
2000
Pages
205 - 223
Database
ISI
SICI code
0883-9514(200002)14:2<205:NDAEID>2.0.ZU;2-C
Abstract
Compression measures used in inductive learners, such as measures based on the minimum description length principle, can be used as a basis for grading candidate hypotheses. Compression-based induction is also suited to handling noisy data. This paper shows that a simple compression measure can be used to detect noisy training examples, where noise is due to random classification errors. A technique is proposed in which noisy examples are detected and eliminated from the training set, and a hypothesis is then built from the set of remaining examples. This noise elimination method was applied to preprocess data for four machine-learning algorithms and evaluated on selected medical domains.
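The abstract does not define the compression measure, so the following is only a minimal sketch of the general "detect, eliminate, then learn" idea under our own assumptions; it is not the paper's algorithm. We assume binary features, take as the complexity measure a greedy count of literals `x_i == v` needed to separate every (positive, negative) example pair, and flag an example when removing it lowers that count by at least a hypothetical threshold `min_gain`. The names `complexity`, `detect_noise`, and `min_gain` are illustrative only.

```python
import math
from itertools import product


def covers(lit, pos, neg):
    # Assumed literal form "x_i == v": it separates a pair if it is true
    # for the positive example and false for the negative one.
    i, v = lit
    return pos[i] == v and neg[i] != v


def complexity(examples):
    """Greedy estimate of how many literals separate all (pos, neg) pairs.

    Returns math.inf if some pair cannot be separated at all
    (identical feature vectors with different labels).
    """
    pos = [x for x, y in examples if y]
    neg = [x for x, y in examples if not y]
    if not pos or not neg:
        return 0
    lits = [(i, v) for i in range(len(pos[0])) for v in (0, 1)]
    uncovered = set(product(range(len(pos)), range(len(neg))))
    used = 0
    while uncovered:
        # Greedy set cover: pick the literal separating the most open pairs.
        best = max(lits, key=lambda l: sum(covers(l, pos[p], neg[n]) for p, n in uncovered))
        gained = {(p, n) for p, n in uncovered if covers(best, pos[p], neg[n])}
        if not gained:
            return math.inf
        uncovered -= gained
        used += 1
    return used


def detect_noise(examples, min_gain=1):
    """Repeatedly flag the example whose removal most reduces complexity."""
    remaining = list(range(len(examples)))
    flagged = []
    while True:
        current = complexity([examples[i] for i in remaining])
        best_idx, best_gain = None, 0
        for i in remaining:
            reduced = complexity([examples[j] for j in remaining if j != i])
            if reduced >= current:
                continue
            if current - reduced > best_gain:
                best_idx, best_gain = i, current - reduced
        if best_idx is None or best_gain < min_gain:
            return flagged
        flagged.append(best_idx)
        remaining.remove(best_idx)


# Toy data (invented for illustration): the true class is x0 == 1,
# but the last example carries a random classification error.
data = [
    ((1, 0, 0), True), ((1, 0, 1), True), ((1, 1, 0), True),
    ((0, 0, 0), False), ((0, 1, 0), False), ((0, 0, 1), False),
    ((0, 1, 1), False),
    ((1, 1, 1), False),  # mislabelled
]
print(detect_noise(data))  # → [7]
```

On this toy set the noisy example forces two extra literals (complexity 3 instead of 1), so it is the only one whose removal pays off; a learner would then be trained on the remaining seven examples. The greedy cover only approximates the smallest literal set, and the sketch does not guard against removing a whole small class.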