Branch alignment reorders the basic blocks of a program to minimize pi
peline penalties due to control-transfer instructions. Prior work in b
ranch alignment has produced useful heuristic methods. We present a br
anch alignment algorithm that usually achieves the minimum possible pi
peline penalty and on our benchmarks averages within 0.3% of a provabl
e optimum. We compare the control penalties and running times of our a
lgorithm to an older, greedy approach and observe that both the greedy
method and our method are close to the lower bound on control penalti
es, suggesting that greedy is good enough. Surprisingly, in actual exe
cution our method produces programs that run noticeably faster than th
e greedy method. We also report results from training and testing on d
ifferent data sets, validating that our results can be achieved in rea
l-world usage. Training and testing on different data sets slightly re
duced the benefits from both branch alignment algorithms, but the rank
ing of the algorithms does not change, and the bulk of the benefits re
main.