Bayesian classification using a noninformative prior and mislabeled training data

Citation
R.S. Lynch and P.K. Willett, Bayesian classification using a noninformative prior and mislabeled training data, J FRANKL I, 336(5), 1999, pp. 809-819
Number of citations
11
Subject categories
Mechanical Engineering
Journal title
JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS
ISSN journal
0016-0032
Volume
336
Issue
5
Year of publication
1999
Pages
809 - 819
Database
ISI
SICI code
0016-0032(199907)336:5<809:BCUANP>2.0.ZU;2-8
Abstract
The average probability of error is used to demonstrate the performance of a Bayesian classification test, referred to as the Combined Bayes Test (CBT), when the training data of each class are mislabeled. The CBT combines the information in discrete training and test data to infer symbol probabilities, assuming a uniform Dirichlet prior (i.e., a noninformative prior of complete ignorance) for all classes. Using the CBT, classification performance is shown to degrade when the training data contain mislabeling, with a severity that depends on the mislabeling probabilities. It is also shown that as the mislabeling probabilities increase, M*, the best quantization fineness related to the Hughes phenomenon of pattern recognition, also increases. Note that even when the CBT knows the actual mislabeling probabilities, it cannot achieve the classification performance obtainable without mislabeling. However, the negative effect of mislabeling can be diminished, with more success for smaller mislabeling probabilities, if a data reduction method called the Bayesian Data Reduction Algorithm (BDRA) is applied to the training data. Published by Elsevier Science Ltd.
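To make the idea of combining discrete training and test data under a uniform Dirichlet prior concrete, here is a minimal Python sketch. It assumes the CBT decision can be expressed as choosing the class whose training symbol counts x give the highest Dirichlet-multinomial predictive probability to the test counts y. With a uniform Dirichlet(1, ..., 1) prior over M symbols, this probability is Gamma(Nx + M) / Gamma(Nx + Ny + M) * prod_j Gamma(x_j + y_j + 1) / Gamma(x_j + 1). That is a standard result, but it is not necessarily the paper's exact test statistic. The mislabeling model is also an assumption: each training symbol labeled as one class is actually drawn from the other class with probability `beta`. All function names (`log_predictive`, `cbt_classify`, `draw_counts`, `error_rate`) and the example symbol distributions are hypothetical, and neither the BDRA nor the search for M* is implemented.

```python
import math
import random
from typing import List, Optional, Sequence


def log_predictive(train_counts: Sequence[int], test_counts: Sequence[int]) -> float:
    """Log-probability of an ordered test sequence with symbol counts y, given
    training counts x, under a uniform Dirichlet(1, ..., 1) prior (assumed CBT form):
    Gamma(Nx + M) / Gamma(Nx + Ny + M) * prod_j Gamma(x_j + y_j + 1) / Gamma(x_j + 1).
    """
    m = len(train_counts)
    n_x, n_y = sum(train_counts), sum(test_counts)
    value = math.lgamma(n_x + m) - math.lgamma(n_x + n_y + m)
    for x_j, y_j in zip(train_counts, test_counts):
        value += math.lgamma(x_j + y_j + 1) - math.lgamma(x_j + 1)
    return value


def cbt_classify(train: List[Sequence[int]], test_counts: Sequence[int]) -> int:
    """Pick the class whose training counts make the test counts most probable.
    Equal class priors are assumed; ties go to the lowest class index."""
    scores = [log_predictive(x, test_counts) for x in train]
    return max(range(len(scores)), key=scores.__getitem__)


def draw_counts(p: Sequence[float], n: int, rng: random.Random,
                p_other: Optional[Sequence[float]] = None, beta: float = 0.0) -> List[int]:
    """Draw n symbols and return their counts. Hypothetical mislabeling model:
    each symbol comes from p_other instead of p with probability beta."""
    counts = [0] * len(p)
    for _ in range(n):
        source = p_other if p_other is not None and rng.random() < beta else p
        counts[rng.choices(range(len(source)), weights=source)[0]] += 1
    return counts


def error_rate(p0: Sequence[float], p1: Sequence[float], beta: float,
               n_train: int = 20, n_test: int = 5, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the average probability of error for two equally
    likely classes whose training data are mislabeled with probability beta."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        train = [draw_counts(p0, n_train, rng, p_other=p1, beta=beta),
                 draw_counts(p1, n_train, rng, p_other=p0, beta=beta)]
        label = rng.randrange(2)
        test = draw_counts((p0, p1)[label], n_test, rng)  # test data are not mislabeled
        errors += cbt_classify(train, test) != label
    return errors / trials


if __name__ == "__main__":
    p0 = [0.4, 0.3, 0.2, 0.1]  # hypothetical symbol probabilities, class 0
    p1 = [0.1, 0.2, 0.3, 0.4]  # hypothetical symbol probabilities, class 1
    for beta in (0.0, 0.1, 0.2, 0.3):
        print(f"beta={beta:.1f}  estimated P(error)={error_rate(p0, p1, beta):.3f}")
```

Under these assumptions, increasing `beta` should raise the estimated error rate, which is the qualitative effect the abstract describes. The sketch does not attempt to reproduce the paper's numbers. Working in log space with `math.lgamma` avoids overflow in the Gamma-function ratios when the training sets are large.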