The effect of the structure of the input distribution on the complexity of learning a pattern classification task is investigated. Using statistical mechanics, we study the performance of a winner-take-all machine at learning to classify points generated by a mixture of K Gaussian distributions ("clusters") in R^N with intercluster distance u (relative to the cluster width). In the separation limit u >> 1, the number of examples required for learning scales as NKu^{-p}, where the exponent p is 2 for zero-temperature Gibbs learning and 4 for the Hebb rule.
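To make the model concrete, below is a minimal numerical sketch (not from the paper) of the setup the abstract describes: P labeled points drawn from K unit-width Gaussian clusters in R^N with centers separated by a distance of order u, classified by a winner-take-all machine whose prototypes are formed by the Hebb rule. The center placement, sample sizes, and all function names are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_clusters(P, N, K, u, rng):
    """Draw P labeled points from K unit-width Gaussian clusters in R^N.

    Assumed placement: centers at u * e_k along orthogonal axes (requires
    N >= K), so every pair of centers is ~u apart relative to cluster width.
    """
    centers = u * np.eye(K, N)
    labels = rng.integers(K, size=P)
    points = centers[labels] + rng.standard_normal((P, N))
    return points, labels

def hebb_prototypes(points, labels, K):
    """Hebb rule: each class prototype is the mean of that class's examples."""
    return np.stack([points[labels == k].mean(axis=0) for k in range(K)])

def wta_error(prototypes, points, labels):
    """Winner-take-all: assign each point to the prototype with the
    largest overlap, and report the misclassification rate."""
    guesses = np.argmax(points @ prototypes.T, axis=1)
    return np.mean(guesses != labels)

# Illustrative parameters; N, K, u are the symbols from the abstract.
N, K, u = 50, 4, 6.0
for P in (100, 400, 1600):
    X, y = make_clusters(P, N, K, u, rng)
    W = hebb_prototypes(X, y, K)
    X_test, y_test = make_clusters(2000, N, K, u, rng)
    print(f"P={P:5d}  test error={wta_error(W, X_test, y_test):.3f}")
```

Per the scaling quoted in the abstract, one would expect the test error of such a Hebb-trained machine to become small once P grows past the order of NKu^{-4}; the sketch above only sets up the experiment and does not verify that exponent.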