In this paper, we present: 1) the design of a single-rail, energy-efficient
64-b Han-Carlson ALU operating at 482 ps in 1.5-V, 0.18-μm bulk CMOS; 2) a
direct port of this ALU to a 0.18-μm partially depleted SOI process; 3) an
SOI-optimal redesign of the ALU using a novel deep-stack quaternary-tree
architecture; 4) margining for max-delay pushout due to reverse body bias in
SOI designs; and 5) performance scaling trends of the ALU designs in the
0.13-μm generation. We show that a direct port of the Han-Carlson ALU to
0.18-μm SOI offers a 14% performance improvement after margining. A redesign
of the ALU using an SOI-favored deep-stack architecture improves the margined
speedup to 19%. A 10% margin was required for the SOI designs to account for
reverse-body-bias-induced max-delay pushout. Preconditioning the intermediate
stack nodes in the dynamic ALU designs reduced this margin to 2%. Scaling the
ALUs to the 0.13-μm generation reduces the overall SOI speedup for both
architectures to 9% and 16%, respectively, confirming the trend that the
speedup offered by SOI technology decreases with scaling.