Large applications are typically partitioned into separately compiled
modules. Large performance gains in these applications are available
by optimizing across module boundaries. One barrier to applying
cross-module optimization (CMO) to large applications is the
potentially enormous amount of time and space consumed by the
optimization process. We
describe a framework for scalable CMO that provides large gains in
performance on applications that contain millions of lines of code. Two
major techniques are described. First, careful management of in-memory
data structures results in sub-linear memory occupancy when compared
to the number of lines of code being optimized. Second, profile data
is used to focus optimization effort on the performance-critical
portions of applications. We also present practical issues that arise
in deploying this framework in a production environment. These issues
include debuggability and compatibility with existing development
tools, such as make. Our framework is deployed in Hewlett-Packard's
(HP) UNIX compiler products and speeds up shipped independent software
vendors' applications by as much as 71%.