Floating-point division is generally regarded as a low-frequency,
high-latency operation in typical floating-point applications. However, in
the worst case, a high-latency hardware floating-point divider can
contribute an additional 0.50 CPI to a system executing SPECfp92
applications. This paper presents the system performance impact of
floating-point division latency for varying instruction issue rates.
It also examines the performance implications of shared multiplication
hardware,
shared square root, on-the-fly rounding and conversion, and fused
functional units. Using a system-level study as a basis, it is shown how
typical floating-point applications can guide the designer in making
implementation decisions and trade-offs.
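
As a rough illustration of how a low-frequency operation can still add
measurably to CPI, the sketch below uses a first-order stall model: the
added CPI is the fraction of instructions that are divides multiplied by
the divide cycles that cannot be overlapped with other work. This is not
the methodology of the paper. The function name `excess_cpi`, its
parameters, and all numeric inputs are hypothetical and chosen only to
show the arithmetic.

```python
def excess_cpi(div_fraction, div_latency, hidden_cycles=0):
    """First-order estimate of the CPI added by floating-point division.

    div_fraction  -- fraction of dynamic instructions that are divides
                     (hypothetical input, between 0 and 1)
    div_latency   -- divider latency in cycles (hypothetical input)
    hidden_cycles -- cycles of each divide assumed to overlap with
                     independent instructions (hypothetical input)
    """
    if not 0.0 <= div_fraction <= 1.0:
        raise ValueError("div_fraction must be between 0 and 1")
    # Only the latency that is not overlapped stalls the pipeline.
    stall_cycles = max(0, div_latency - hidden_cycles)
    return div_fraction * stall_cycles


if __name__ == "__main__":
    # Illustrative sweep over divider latencies; the inputs are made up.
    for latency in (10, 20, 40, 60):
        penalty = excess_cpi(0.005, latency, hidden_cycles=4)
        print(f"latency={latency:2d} cycles -> added CPI ~ {penalty:.3f}")
```

Under this simplified model the penalty grows linearly with the exposed
latency, which is why the amount of overlap available at a given issue
rate matters as much as the raw divider latency.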