Background: Cost data often are nonnormally distributed due to a few very high cost values that may not necessarily be dismissed as outliers. Researchers have not reached agreement on how to appropriately deal with skewed cost data.
Objectives: This study presents an example of skewed cost data that were collected retrospectively from the Texas Medicaid database. Common methods of dealing with skewed cost distributions are discussed. Data were analyzed using various methods, and the statistical results of each test were compared.
Methods: Prescription and medical claims data extracted from the Texas Medicaid database were analyzed using the Mann-Whitney U test and t tests of untransformed, log-transformed, and bootstrapped data.
Results: All distributions of the untransformed cost data were nonnormally distributed, and comparison groups had unequal variances. The Mann-Whitney U test negated the effect of the high-cost patients and gave a significant result for overall cost differences between groups, but in the opposite direction of the mean. The t tests on raw data and log-transformed data may not have been optimal because distributions of both raw costs and log-costs were nonnormal.
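
A rank-based test can point in the opposite direction of the mean because it compares orderings rather than magnitudes. The following minimal sketch illustrates this with made-up costs (the values, group names, and the helper u_statistic are hypothetical and are not taken from the study). It computes only the Mann-Whitney U statistic by pairwise counting, not a p value.

```python
from statistics import mean

def u_statistic(x, y):
    """Mann-Whitney U for sample x: the number of (x_i, y_j) pairs
    with x_i > y_j, counting ties as one half."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )

# Hypothetical cost data: group A has one very high-cost patient.
group_a = [10, 12, 15, 18, 900]
group_b = [20, 22, 25, 28, 30]

mean_a = mean(group_a)
mean_b = mean(group_b)

# U divided by the number of pairs estimates P(cost in A > cost in B).
u_a = u_statistic(group_a, group_b)
prob_a_higher = u_a / (len(group_a) * len(group_b))

print(f"mean A = {mean_a}, mean B = {mean_b}")
print(f"U for A = {u_a}, estimated P(A > B) = {prob_a_higher}")
```

In this toy example group A has the higher mean, yet a randomly chosen patient in A costs more than a randomly chosen patient in B in only a minority of pairs, so a rank-based comparison would favor B.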
Conclusions: The bootstrap method does not need to meet the assumptions of
normality and equal variances. In analyses of small sample sizes with skewe
d cost data, the bootstrap method may offer an alternative to the more trad
itional nonparametric or log-transformation techniques.
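
As a rough illustration of the bootstrap idea, the sketch below resamples each group with replacement and builds a percentile interval for the difference in mean cost. This is an assumption-laden simplification: the study reports t tests of bootstrapped data, whereas this sketch uses a plain percentile interval, and the function name bootstrap_mean_diff, its parameters, and the data are all hypothetical.

```python
import random
from statistics import mean

def bootstrap_mean_diff(x, y, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(x) - mean(y).

    Each group is resampled with replacement at its original size.
    Returns (observed difference, lower bound, upper bound).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        xs = rng.choices(x, k=len(x))
        ys = rng.choices(y, k=len(y))
        diffs.append(mean(xs) - mean(ys))
    diffs.sort()
    # Approximate percentile positions in the sorted resampled differences.
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return mean(x) - mean(y), diffs[lo_idx], diffs[hi_idx]

# Hypothetical skewed cost data, not taken from the study.
group_a = [10, 12, 15, 18, 900]
group_b = [20, 22, 25, 28, 30]

observed, lower, upper = bootstrap_mean_diff(group_a, group_b)
print(f"observed difference in means: {observed}")
print(f"95% percentile interval: ({lower}, {upper})")
```

The interval is built from the resampled differences themselves, so no normal distribution or equal variance is assumed for the raw costs. With samples this small the interval is wide, which is itself informative about how much a single high-cost patient drives the mean.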