Conventional wisdom suggests that for small data sets with substantial skew, one should attempt to determine the correct distributional form, if possible, and apply statistical methods appropriate for that distribution. Transformations such as the log or square root are often used. If an appropriate distributional form cannot be determined, a distribution-free procedure such as a rank transformation or a randomization test can be used. To better appreciate the effect of such alternatives on both the type I error and the power to detect differences between treatment groups, simulation studies were conducted for responses having specific gamma G(r, θ) and log-normal LN(M, V) distributions. The gamma and log-normal distributions were selected so that they had the same first two moments. A simple two-group design was
assumed. The reference group always had an average disease level of μ = 3.0 (μ = rθ for the gamma, μ = M for the log-normal), and the treatment group had mean reductions ranging from 0 to 50 per cent. The effect of distributional type and degree of skewness was investigated by varying the population parameter values. Six statistical test procedures were compared for the gamma distributions. All
test procedures were robust with respect to the type I error. The UMP test based on a ratio of sample means produced the greatest power for all combinations of n, r and R(T). The power losses associated with the randomization test, the t-test on the original scale and the t-test on the square root scale were very small (3 to 6 per cent in absolute value) for n = 10 and 15, and less than 2 per cent for group sizes
of 25 or more. The power loss associated with the t-test on the log scale was much larger, with power 5 to 10 per cent lower than that of the t-test on the original scale. The Wilcoxon rank test produced
results similar to those of the LOG t-test for small samples. As c increased, the power of the shifted LOG(X + c) test increased monotonically toward the asymptotic value of the ORIG t-test. The same five test procedures based on differences in sample means were then compared for the corresponding
log-normal distributions. The UMP test, that is, the LOG(X) t-test, produced the highest power. There was very little loss in power for the SQRT t-test.
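The ORIG, SQRT, LOG and shifted LOG(X + c) procedures compared above differ only in the scale on which a two-sample t statistic is computed. The following is a minimal Python sketch, assuming the pooled-variance (Student) form of the t statistic; the function names and the `LOG_SHIFT` label for LOG(X + c) are illustrative, not taken from the study, and the demo data are made up.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Pooled-variance (Student) two-sample t statistic for mean(x) - mean(y)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def to_scale(values, scale, c=0.0):
    """Transform positive responses to the analysis scale before the t-test."""
    if scale == "ORIG":
        return [float(v) for v in values]
    if scale == "SQRT":
        return [math.sqrt(v) for v in values]
    if scale == "LOG":
        return [math.log(v) for v in values]
    if scale == "LOG_SHIFT":  # hypothetical label for the shifted LOG(X + c) scale
        return [math.log(v + c) for v in values]
    raise ValueError(f"unknown scale: {scale!r}")


def t_on_scale(reference, treated, scale, c=0.0):
    """t statistic comparing the reference and treated groups on the chosen scale."""
    return pooled_t(to_scale(reference, scale, c), to_scale(treated, scale, c))


if __name__ == "__main__":
    # Made-up positive responses, for illustration only.
    reference = [2.1, 3.4, 1.2, 5.0, 3.3, 0.9, 4.1, 2.6]
    treated = [1.0, 2.2, 0.8, 3.1, 1.9, 0.5, 2.7, 1.4]
    for scale in ("ORIG", "SQRT", "LOG"):
        print(scale, round(t_on_scale(reference, treated, scale), 3))
    for c in (0.0, 1.0, 10.0, 1000.0):
        print("LOG_SHIFT", c, round(t_on_scale(reference, treated, "LOG_SHIFT", c), 3))
```

The shift c also explains the limiting behaviour reported for LOG(X + c): for large c, log(X + c) ≈ log c + X/c, a positive linear transformation of X, and the t statistic is unchanged by such transformations, so the shifted-log test tends to the ORIG t-test as c grows.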
The RANK test lost between 2 and 5 per cent in power and performed considerably better than the t-test on
the original scale. In contrast to the results for the gamma distributions, the power of the shifted LOG(X + c) test was greatest at c = 0 and decreased monotonically toward the asymptotic value of the ORIG t-test as c increased. The results suggest that statistical inferences can be highly dependent on the distributional form and on the scale of measurement of the response used in the statistical analysis.
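To make the simulation design concrete, here is a minimal Monte Carlo sketch in Python of the two-group comparison for one of the procedures, the randomization test on the difference in sample means. It assumes that each group's log-normal is matched to the gamma G(r, θ) through its mean rθ and variance rθ² (so σ² = log(1 + V/M²) and the log-scale mean is log M − σ²/2), that the treatment reduction rescales θ while r stays fixed, and that p-values come from random relabellings. These choices and all function names (`lognormal_params`, `randomization_p`, `draw`, `simulated_power`) are illustrative assumptions rather than the study's actual implementation.

```python
import math
import random


def lognormal_params(r, theta):
    """Underlying-normal (mu, sigma) of the log-normal whose mean and variance
    equal those of the gamma G(r, theta): mean r*theta, variance r*theta**2."""
    m, v = r * theta, r * theta ** 2
    sigma2 = math.log(1.0 + v / m ** 2)  # equals log(1 + 1/r)
    return math.log(m) - sigma2 / 2.0, math.sqrt(sigma2)


def randomization_p(x, y, n_perm, rng):
    """Two-sided Monte Carlo randomization test on the difference in means."""
    nx = len(x)
    observed = abs(sum(x) / nx - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled responses
        a, b = pooled[:nx], pooled[nx:]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # the observed labelling counts once


def draw(dist, r, theta, n, rng):
    """n responses from G(r, theta) or from its moment-matched log-normal."""
    if dist == "gamma":
        return [rng.gammavariate(r, theta) for _ in range(n)]  # shape r, scale theta
    if dist == "lognormal":
        mu_log, sigma = lognormal_params(r, theta)
        return [rng.lognormvariate(mu_log, sigma) for _ in range(n)]
    raise ValueError(f"unknown distribution: {dist!r}")


def simulated_power(dist, r, reduction, n, mu=3.0, reps=200,
                    n_perm=199, alpha=0.05, seed=1):
    """Proportion of simulated two-group data sets in which the randomization
    test rejects at level alpha (an estimate of type I error when reduction == 0)."""
    rng = random.Random(seed)
    theta_ref = mu / r                          # reference mean mu = r * theta
    theta_trt = theta_ref * (1.0 - reduction)   # assumption: r fixed, theta rescaled
    rejections = 0
    for _ in range(reps):
        ref = draw(dist, r, theta_ref, n, rng)
        trt = draw(dist, r, theta_trt, n, rng)
        if randomization_p(ref, trt, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / reps
```

For example, `simulated_power("lognormal", r=2.0, reduction=0.3, n=15)` estimates power for a single cell; looping over n, r and the reduction gives a factorial grid of the kind described, with the number of replicates and relabellings here chosen only for illustration.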