In an earlier paper, me proved a strong version of the redundancy-capa
city converse theorem of universal coding, stating that for ''most'' s
ources in a given class, the universal coding redundancy is essentiall
y lower-bounded by the capacity of the channel induced by this class.
Since this result holds for general classes of sources, it extends Ris
sanen's strong converse theorem for parametric families. While our ear
lier result has established strong optimality only for mixture codes w
eighted by the capacity-achieving prior, our first result herein exten
ds this finding to a general prior. For some cases our technique also
leads to a simplified proof of the above mentioned strong converse the
orem. The major interest in this paper, however, is in extending the t
heory of universal coding to hierarchical structures of classes, where
each class may have a different capacity. In this setting, one wishes
to incur redundancy essentially as small as that corresponding to the
active class, and not the union of classes. Our main result is that t
he redundancy of a code based on a two-stage mixture (first, within ea
ch class, and then over the classes), is no worse than that of any oth
er code for ''most'' sources of ''most'' classes. If,in addition, the
classes can be efficiently distinguished by a certain decision rule, t
hen the best: attainable redundancy is given explicitly by the capacit
y of the active class plus the normalized negative logarithm of the pr
ior probability assigned to this class. These results suggest some int
eresting guidelines as for the choice of the prior. We also discuss so
me examples with a natural hierarchical partition into classes.