Several researchers have proposed algorithms for basic block reorderin
g. We call these branch alignment algorithms. The primary emphasis of
these algorithms has been on improving instruction cache locality, and
the few studies concerned with branch prediction reported small or mi
nimal improvements. As wide-issue architectures become increasingly po
pular the importance of reducing branch costs will increase, and branc
h alignment is one mechanism which can effectively reduce these costs.
In this paper, we propose an improved branch alignment algorithm that
takes into consideration the architectural cost model and the branch
prediction architecture when performing the basic block reordering. We
show that branch alignment algorithms can improve a broad range of st
atic and dynamic branch prediction architectures. We also show that a
programs performance can be improved by approximately 5% even when usi
ng recently proposed, highly accurate branch prediction architectures.
The programs are compiled by any existing compiler ad then transforme
d via binary transformations. When implementing these algorithms on a
Alpha AXP 21604 up to a 16% reduction in total execution time is achie
ved.