E.M. Schwarz et al., CMOS Floating-Point Unit for the S/390 Parallel Enterprise Server G4, IBM Journal of Research and Development, 41(4-5), 1997, pp. 475-488
The S/390(R) floating-point unit (FPU) on the fourth-generation (G4)
CMOS microprocessor chip has been implemented in a CMOS technology with
a 0.20-µm effective channel length and has been demonstrated at more
than 400 MHz. The microprocessor chip is 17.35 by 17.30 mm in size,
and one copy of the FPU, including the dataflow and control flow but
not including the FPR register file, is 5.3 by 4.7 mm in size. There are
two copies on the chip for error-detection purposes only; both copies
execute the same instruction stream and are checked against each other.
The high-performance implementation has a throughput of one instruction
per cycle and an average latency of three execution cycles, yielding
approximately 70 MFLOPS at 300 MHz on the Linpack benchmark. Currently,
the G4 FPU is the highest-performance S/390 CMOS FPU with fault
tolerance. It uses several innovative and high-performance algorithms
not commonly found in S/390 FPUs or other FPUs, such as a radix-8 Booth
multiplier, a Goldschmidt division and square-root algorithm,
techniques for updating the exponent in parallel with normalization,
and avoidance of the remainder comparison in quadratically converging division
and square-root algorithms. Also demonstrated are a practical technique
for designing control flow into the dataflow and early floorplanning
techniques.
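
As an illustration of the quadratically converging Goldschmidt division named in the abstract, the following is a minimal software sketch, not the G4 hardware algorithm. The function name `goldschmidt_divide`, the `iterations` parameter, and the use of double-precision floats are assumptions made for this example. A hardware unit would typically start from a table-lookup approximation of the reciprocal; this sketch starts directly from the scaled divisor, and it does not model the remainder-comparison avoidance or the rounding behavior described in the paper.

```python
import math


def goldschmidt_divide(n, d, iterations=6):
    """Approximate n / d with Goldschmidt iteration (illustrative sketch only)."""
    if d == 0:
        raise ZeroDivisionError("division by zero")

    # Handle the sign separately so the iteration works on magnitudes.
    sign = -1.0 if (n < 0) != (d < 0) else 1.0
    n, d = abs(n), abs(d)

    # math.frexp returns (m, e) with d == m * 2**e and 0.5 <= m < 1.
    m, e = math.frexp(d)
    num = math.ldexp(n, -e)  # scale the numerator by the same power of two
    den = m                  # scaled divisor, now in [0.5, 1)

    # Multiply numerator and denominator by f = 2 - den each step.
    # If den = 1 - eps, then den * f = 1 - eps**2, so the error squares
    # on every iteration (quadratic convergence) while num / den is unchanged.
    for _ in range(iterations):
        f = 2.0 - den
        num *= f
        den *= f

    # den has converged to 1, so num now approximates the quotient.
    return sign * num
```

Because the starting error is at most 0.5 and squares on each step, six iterations bring the denominator's error below double-precision resolution in this sketch; a good initial table lookup is what lets a hardware implementation use far fewer steps.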