J.W. Grzymala-Busse et al., Coping with missing attribute values based on closest fit in preterm birth data: A rough set approach, COMPUT INTE, 17(3), 2001, pp. 425-434
Data mining is frequently applied to data sets with missing attribute values. A new approach to missing attribute values, called closest fit, is introduced in this paper. In this approach, for a given case (example) with a missing attribute value, we search for another case that is as similar as possible to the given case. Cases can be considered as vectors of attribute values. The search is for the case that has as many identical attribute values as possible for symbolic attributes, or the smallest possible value differences for numerical attributes. There are two possible ways to conduct the search: within the same class (concept) as the case with the missing attribute value, or within the entire set of cases. For comparison, we also experimented with another approach to missing attribute values, in which missing values are replaced by the most common value of the attribute for symbolic attributes or by the average value for numerical attributes. All algorithms were implemented in the system OOMIS. Our experiments were performed on the preterm birth data sets provided by the Duke University Medical Center.
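The abstract does not give the exact similarity measure, so the following is only a minimal sketch under stated assumptions, not the OOMIS implementation. It assumes a per-attribute distance where a symbolic mismatch costs 1, a numerical difference is divided by that attribute's range, and a missing value on either side costs 1; the closest case (smallest summed distance) donates its value. The same_concept flag switches between searching within the same concept and searching the entire set of cases, and most_common_or_mean is the comparison baseline. All function, parameter, and attribute names are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Set, Union

Value = Union[str, float, None]  # None marks a missing attribute value
Case = Dict[str, Value]


def distance(a: Case, b: Case, attrs: List[str], numeric: Set[str],
             ranges: Dict[str, float]) -> float:
    """Assumed distance: a symbolic mismatch costs 1, a numerical difference
    is scaled by the attribute's range, a missing value on either side costs 1."""
    total = 0.0
    for attr in attrs:
        x, y = a[attr], b[attr]
        if x is None or y is None:
            total += 1.0
        elif attr in numeric:
            r = ranges[attr]
            total += abs(x - y) / r if r > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total


def closest_fit(cases: List[Case], attrs: List[str], numeric: Set[str],
                decision: str, same_concept: bool = True) -> List[Case]:
    """Fill each missing value from the closest other case that knows it.
    same_concept=True restricts donors to cases with the same decision value."""
    ranges: Dict[str, float] = {}
    for attr in numeric:
        known = [c[attr] for c in cases if c[attr] is not None]
        ranges[attr] = max(known) - min(known) if known else 0.0
    filled = []
    for i, case in enumerate(cases):
        new = dict(case)  # copy, so the input cases are left untouched
        for attr in attrs:
            if case[attr] is not None:
                continue
            # Donors come from the original cases, so filled values are never reused.
            donors = [c for j, c in enumerate(cases)
                      if j != i and c[attr] is not None
                      and (not same_concept or c[decision] == case[decision])]
            if donors:  # min() keeps the first donor on ties
                best = min(donors, key=lambda c: distance(case, c, attrs, numeric, ranges))
                new[attr] = best[attr]
        filled.append(new)
    return filled


def most_common_or_mean(cases: List[Case], attrs: List[str],
                        numeric: Set[str]) -> List[Case]:
    """Baseline: average for numerical attributes, most common value for symbolic ones."""
    fill: Dict[str, Value] = {}
    for attr in attrs:
        known = [c[attr] for c in cases if c[attr] is not None]
        if known:
            fill[attr] = mean(known) if attr in numeric else Counter(known).most_common(1)[0][0]
    return [{k: (fill.get(k) if v is None else v) for k, v in c.items()} for c in cases]


if __name__ == "__main__":
    data = [  # hypothetical toy cases, not taken from the preterm birth data
        {"age": 25.0, "smoker": "no", "preterm": "no"},
        {"age": 31.0, "smoker": "yes", "preterm": "yes"},
        {"age": None, "smoker": "yes", "preterm": "yes"},
        {"age": 24.0, "smoker": None, "preterm": "no"},
    ]
    print(closest_fit(data, ["age", "smoker"], {"age"}, "preterm"))
    print(most_common_or_mean(data, ["age", "smoker"], {"age"}))
```

Normalizing numerical differences by the attribute range keeps one wide-ranged attribute from dominating the symbolic mismatch counts; a different scaling would be an equally valid assumption.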