Rather than presenting a specific trick, this paper aims to provide a
methodology for large-scale, real-world classification tasks involving
thousands of classes and millions of training patterns. Such problems arise
in speech recognition, handwriting recognition, and speaker or writer
identification, to name just a few. Given the typically very large number of
classes to be distinguished, many approaches focus on parametric methods that
independently estimate class-conditional likelihoods. In contrast, we
demonstrate how the principles of modularity and hierarchy can be applied to
directly estimate posterior class probabilities in a connectionist framework.
We argue that a hierarchical classification scheme, apart from offering
better discrimination capability, is crucial for tackling the above-mentioned
problems. Furthermore, we discuss training issues that have to be addressed
when a virtually unlimited amount of training data is available.