Multithreaded processors support a number of execution contexts and switch contexts rapidly in order to tolerate high-latency events such as external memory references. Existing multithreaded architectures are implicitly based on the assumption that latency tolerance requires massive parallelism, which must be found in diverse contexts. The authors have carried out a quantitative analysis of the efficiency of multithreaded execution as a function of the number of threads for two important classes of memory system: conventional off-chip memory and symmetric networks. The results of these analyses show that there are fundamental reasons for the efficiency to grow very rapidly with the number of threads. This, in turn, implies that the original goal of latency tolerance can be achieved with only a limited number of threads; these can typically be drawn from the same referential context and therefore do not require the heavyweight hardware solutions of conventional multithreading. A novel dynamically scheduled RISC architecture, based on this new understanding of the problem, is presented.
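
To illustrate why efficiency can saturate with few threads, consider a simple latency-tolerance model (a sketch of a standard form, not the authors' own analysis; the run length R, latency L, and thread count n are assumed parameters): if each thread executes for R cycles before stalling for L cycles, and stalls are overlapped with work from other threads, then

E(n) = nR / (R + L) for n < 1 + L/R, and E(n) = 1 for n >= 1 + L/R.

Under these assumptions, efficiency rises linearly with n and reaches its maximum once n exceeds 1 + L/R, which for typical run lengths and memory latencies is a modest number of threads.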