Background: Preterm births in the United States increased from 11.0% to 11.4% between 1996 and 1997; they continue to be a complex healthcare problem in the United States.
Objective: The objective of this research was to compare traditional statistical methods with emerging methods called data mining, or knowledge discovery in databases, in identifying accurate predictors of preterm births.
Method: An ethnically diverse sample (N = 19,970) of pregnant women provided data (1,622 variables) for new methods of analysis. Preterm birth predictors were evaluated using traditional statistical and newer data mining analyses.
Results: Seven demographic variables (maternal age and binary coding for county of residence, education, marital status, payer source, race, and religion) yielded an area under the curve of .72, using Receiver Operating Characteristic curves to test predictive accuracy. The addition of hundreds of other variables added only .03 to the area under the curve.
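
For readers unfamiliar with the measure, the area under the Receiver Operating Characteristic curve can be read as the probability that a randomly chosen preterm case receives a higher predicted risk than a randomly chosen term case. The following is a minimal sketch of that calculation, not the study's analysis code: the function name `roc_auc`, the labels, and both sets of scores are made-up illustrations, and none of the numbers come from the study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    labels: 1 for a positive case (e.g. preterm), 0 for a negative case.
    scores: predicted risk for each case; higher means more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 3 preterm and 4 term births.
labels = [1, 1, 1, 0, 0, 0, 0]

# Hypothetical risk scores from a small model and from a larger model.
small_model_scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
large_model_scores = [0.9, 0.6, 0.5, 0.7, 0.4, 0.2, 0.1]

auc_small = roc_auc(labels, small_model_scores)
auc_large = roc_auc(labels, large_model_scores)

print(f"AUC, small model: {auc_small:.2f}")
print(f"AUC, large model: {auc_large:.2f}")
print(f"Gain from extra variables: {auc_large - auc_small:.2f}")
```

Comparing the two values in this way mirrors the comparison reported above, where the gain from the additional variables is the difference between the two areas. The pairwise loop is quadratic in sample size, so for a sample as large as the one in this study a rank-based implementation from a statistics library would be the practical choice.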
Conclusion: Similar results across data mining methods suggest that results are data-driven and not method-dependent, and that demographic variables offer a small set of parsimonious variables with reasonable accuracy in predicting preterm birth outcomes in a racially diverse population.