OBJECTIVE: The purpose of this study was to determine whether decision
tree-based methods can be used to predict cesarean delivery.
STUDY DESIGN: This was a historical cohort study of women delivered of
live-born singleton neonates in 1995 through 1997 (n = 22,157). The
frequency of cesarean delivery was 17%; 78 variables were used for analysis.
Decision tree rule-based methods and logistic regression models were each
applied to the same 50% of the sample (the training set) to develop the
predictive models, and these models were then tested on the remaining 50%.
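The split-and-test protocol described above can be illustrated with a minimal sketch. It is not the study's code: the abstract does not say how the two halves were chosen, so a seeded random shuffle is assumed here, and the helper names split_half and roc_auc are hypothetical. The area under the receiver operating characteristic curve is computed with the standard pairwise (Mann-Whitney) identity, which equals the probability that a randomly chosen cesarean case is scored higher than a randomly chosen non-cesarean case.

```python
import random


def split_half(records, seed=0):
    """Shuffle records and split them into two halves (train, test).

    Assumption: a random 50/50 split; the abstract does not state how
    the halves were actually selected.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: 1 for the positive class (e.g. cesarean), 0 otherwise.
    scores: model outputs, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In this sketch, any model (a decision tree or a logistic regression) would be fitted on the first half returned by split_half, and roc_auc would then be applied to its scores on the second half, so that both model types are compared on the same held-out records.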
RESULTS: Decision tree receiver operating characteristic curve areas were
as follows: nulliparous, 0.82; parous, 0.93. Logistic receiver operating
characteristic curve areas were as follows: nulliparous, 0.86; parous, 0.93.
Decision tree methods and logistic regression methods used similar
predictive variables; however, logistic methods required more variables and
yielded less intelligible models. Among the 6 decision tree building methods
tested, the strict minimum message length criterion yielded decision trees
that were small yet accurate. Risk factor variables were identified in 676
nulliparous cesarean deliveries (69%) and 419 parous cesarean deliveries
(47.6%).
CONCLUSION: Decision tree models can be used to predict cesarean delivery.
Models built with strict minimum message length decision trees have the
following attributes: Their performance is comparable to that of logistic
regression; they are small enough to be intelligible to physicians; they
reveal causal dependencies among variables not detected by logistic
regression; they can handle missing values more easily than can logistic
methods; they predict cesarean deliveries that lack a categorized risk
factor variable.