The two-sample Wilcoxon rank sum test is the most popular non-parametric test for comparing two samples when the underlying distributions are not normal. Although the underlying distributions need not be known in detail to calculate the null distribution of the test statistic, parametric assumptions are often made to determine the power of the test or the required sample size. We encountered difficulties with this approach when planning a recent clinical trial in stroke patients. It is shown that, for power and sample size estimation, it can be dangerous to apply the classical formulae routinely, especially when outcome scores have a U-shaped or J-shaped distribution. As an example we have taken the Barthel index, a quality-of-life outcome measure in stroke patients. Further, we have investigated alternative methods by means of Monte Carlo simulation and compared the distributional characteristics of the estimated powers. Our findings suggest that more appropriate computer software is needed to calculate power and sample size when efficacy is measured by a non-parametric method.
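The contrast between a closed-form power formula and a Monte Carlo estimate can be illustrated with a minimal Python sketch for a heavily tied, U-shaped ordinal outcome. The closed-form approximation used here is Noether's formula, N = (z_{1-α/2} + z_{1-β})^2 / (12c(1 - c)(p - 1/2)^2), where c is the proportion allocated to the first group and p = P(X < Y) + 0.5 P(X = Y). This choice is an assumption, since the abstract does not name the classical formulae it examined. The score categories and probabilities (`CATS`, `CONTROL`, `TREATED`) are hypothetical and are not Barthel index data. The simulated test uses the normal approximation with a tie-corrected variance rather than the exact null distribution, and all function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def midranks(values):
    """Return 1-based midranks and the tie term sum(t**3 - t) over tie groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    return ranks, tie_term


def rank_sum_z(x, y):
    """Normal-approximation z for the rank sum of x, using the tie-corrected variance."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_term = midranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    return 0.0 if var <= 0 else (w - mean) / math.sqrt(var)


def prob_superiority(cats, px, py):
    """P(X < Y) + 0.5 * P(X = Y) for discrete X ~ px and Y ~ py on categories cats."""
    total = 0.0
    for a, pa in zip(cats, px):
        for b, pb in zip(cats, py):
            if a < b:
                total += pa * pb
            elif a == b:
                total += 0.5 * pa * pb
    return total


def noether_power(p, n1, n2, alpha=0.05):
    """Power implied by Noether's formula; it ignores how ties shrink the variance."""
    n = n1 + n2
    c = n1 / n
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(math.sqrt(12 * c * (1 - c) * n) * abs(p - 0.5) - z_a)


def simulated_power(cats, px, py, n1, n2, alpha=0.05, n_sim=2000, seed=1):
    """Monte Carlo power: share of simulated trials with two-sided |z| >= z_crit."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sim):
        x = rng.choices(cats, weights=px, k=n1)
        y = rng.choices(cats, weights=py, k=n2)
        if abs(rank_sum_z(x, y)) >= z_crit:
            hits += 1
    return hits / n_sim


# Hypothetical U-shaped score distributions (NOT Barthel index data).
CATS = [0, 25, 50, 75, 100]
CONTROL = [0.35, 0.10, 0.10, 0.10, 0.35]
TREATED = [0.25, 0.10, 0.10, 0.10, 0.45]

if __name__ == "__main__":
    p = prob_superiority(CATS, CONTROL, TREATED)
    print(f"P(X<Y) + 0.5 P(X=Y) = {p:.3f}")
    print(f"Noether formula power: {noether_power(p, 100, 100):.3f}")
    print(f"Simulated power:       {simulated_power(CATS, CONTROL, TREATED, 100, 100):.3f}")
```

Running the script prints both power estimates for 100 patients per arm. With heavily tied scores the closed-form value does not account for the reduced variance caused by ties, so the two estimates can differ; the Monte Carlo estimate itself has a standard error of roughly sqrt(power(1 - power)/n_sim), which shrinks as `n_sim` grows.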