Traditionally, the performance of a stack machine has been limited by true data dependencies. A performance enhancement mechanism, stack operations folding, was used in Sun Microelectronics' picoJava-I design, and it can fold up to 60% of all stack operations. The authors use the Java bytecode language as the target machine language and study Java instruction folding on a proposed folding model, the POC model, which is used to illustrate the theoretical folding operations. Various practical folding strategies based on the POC model are introduced and evaluated. Statistical data show that the 4-foldable strategy eliminates 84% of all stack operations, and that the 2-, 3-, and 4-foldable strategies yield overall program speedups of 1.22, 1.32, and 1.34, respectively, compared with a stack machine without folding. Furthermore, the 4-foldable strategy is the most practical and cost-effective choice for a Java stack machine with a decoder width of 8 bytes. Circuit simulation results show that a 100 MHz 4-foldable folding mechanism can be realized with 0.6 µm CMOS standard cells, or a 240 MHz one with 0.251 µm CMOS technology.
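To make the idea of an N-foldable strategy concrete, the following is a minimal Python sketch, not the POC model itself. It assumes a hypothetical classification of bytecodes into producers (push a value), operators (pop two, push one), and consumers (pop one), plus an assumed, simplified list of foldable patterns that is matched greedily, longest first. The names `Kind`, `KIND`, `PATTERNS`, and `fold` are illustrative only. The sketch shows how widening the fold limit lets the statement `a = b + c` (four bytecodes) issue as fewer groups; it does not reproduce the statistics reported above.

```python
from enum import Enum


class Kind(Enum):
    # Hypothetical instruction categories, assumed for illustration only.
    P = "producer"      # pushes one value (e.g. iload, iconst)
    O = "operator"      # pops two values, pushes one (e.g. iadd)
    C = "consumer"      # pops one value (e.g. istore)
    N = "non-foldable"  # anything else is issued on its own


# Hypothetical mapping for a handful of opcodes; a real decoder covers all of them.
KIND = {
    "iload": Kind.P, "iconst": Kind.P, "bipush": Kind.P,
    "iadd": Kind.O, "isub": Kind.O, "imul": Kind.O,
    "istore": Kind.C,
}

# Assumed foldable patterns, ordered longest first so matching is greedy.
PATTERNS = [
    (Kind.P, Kind.P, Kind.O, Kind.C),  # a = b op c
    (Kind.P, Kind.P, Kind.O),          # push (b op c)
    (Kind.P, Kind.O, Kind.C),          # a = tos op b
    (Kind.P, Kind.O),                  # tos = tos op b
    (Kind.O, Kind.C),                  # a = next op tos
    (Kind.P, Kind.C),                  # a = b (register move)
]


def fold(code: list[str], width: int) -> list[list[str]]:
    """Greedily group bytecodes into folding groups of at most `width` instructions."""
    groups: list[list[str]] = []
    i = 0
    while i < len(code):
        kinds = tuple(KIND.get(op, Kind.N) for op in code[i:i + width])
        for pat in PATTERNS:
            # Skip patterns longer than the fold limit; a short slice never matches.
            if len(pat) <= width and kinds[:len(pat)] == pat:
                groups.append(code[i:i + len(pat)])
                i += len(pat)
                break
        else:
            # No pattern matched: issue this instruction unfolded.
            groups.append(code[i:i + 1])
            i += 1
    return groups


if __name__ == "__main__":
    # a = b + c compiles to four bytecodes; widths 1..4 give 4, 3, 2 and 1 groups.
    body = ["iload", "iload", "iadd", "istore"]
    for w in (1, 2, 3, 4):
        print(w, fold(body, w))
```

Greedy longest-first matching is only one plausible choice; a hardware folding unit would perform the equivalent classification in the decoder, within its fixed byte window.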