A new computer architecture, the Multithreaded Decoupled (MTD)
Architecture, is proposed for exploiting fine-grain parallelism. It
further develops some of the parallel-processing ideas implemented in
the Russian MARS-M computer in the 1980s. The MTD architecture aims at
enhancing both total machine throughput and single-thread performance.
To achieve this goal, we propose a two-level parallel computation
model. Its low level defines the decoupled parallel execution of
instructions within program fragments that contain no branches; we
refer to these fragments as basic blocks. The model's high level
defines the parallel execution of the multiple basic blocks that make
up a function or procedure. This scheduling hierarchy reflects the MTD
storage
hierarchy. Together, the scheduling and storage models allow a
processor with multiple execution units to exploit several forms of
parallelism within a procedure. The compiler provides the hardware
with thread
register usage masks to allow run-time enforcement of control and data
dependencies between the high-level threads. We present a possible
implementation of the MTD processor with multiple execution units and
a two-level distributed register memory.
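
As an illustration only, the following minimal Python sketch shows one
way register usage masks could expose data dependencies between basic
blocks at run time. The BlockMasks type, the split into read and write
masks, and the ready_blocks and mask helpers are assumptions made for
this sketch and are not taken from the MTD design; control
dependencies are not modeled.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class BlockMasks:
    """Hypothetical per-basic-block masks; bit i stands for register i."""
    reads: int
    writes: int


def mask(*regs: int) -> int:
    """Build a bit mask with one bit set per listed register number."""
    m = 0
    for r in regs:
        m |= 1 << r
    return m


def depends_on(later: BlockMasks, earlier: BlockMasks) -> bool:
    """True if `later` conflicts with `earlier` on any register."""
    raw = later.reads & earlier.writes    # read after write
    waw = later.writes & earlier.writes   # write after write
    war = later.writes & earlier.reads    # write after read
    return bool(raw | waw | war)


def ready_blocks(blocks: List[BlockMasks], done: Set[int]) -> List[int]:
    """Indices of unfinished blocks that conflict with no unfinished
    block earlier in program order."""
    ready = []
    for i, blk in enumerate(blocks):
        if i in done:
            continue
        if all(j in done or not depends_on(blk, blocks[j]) for j in range(i)):
            ready.append(i)
    return ready


# Three blocks in program order (register numbers are arbitrary):
blocks = [
    BlockMasks(reads=mask(1), writes=mask(2)),     # B0: r2 = f(r1)
    BlockMasks(reads=mask(3), writes=mask(4)),     # B1: r4 = g(r3)
    BlockMasks(reads=mask(2, 4), writes=mask(5)),  # B2: r5 = h(r2, r4)
]

print(ready_blocks(blocks, done=set()))    # [0, 1]
print(ready_blocks(blocks, done={0, 1}))   # [2]
```

In this sketch B0 and B1 share no registers and may start together,
while B2 reads r2 and r4 and therefore waits for both to finish.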