Interactions among two or more components may consist of passing of (large)
data, a continuation, or a remote method call. Optimizing or minimizing th
e cost of such interactions is a fundamental problem in generating efficien
t code for component-oriented programs. This paper lays out a foundation fo
r modeling and optimizing the interaction between components, with the goal
of optimizing the entire assembled system. Our approach consists of automa
tically choosing, among a number of potential candidate, implementation str
ategies for choice of data structures, communication mechanisms, data place
ment, etc., components may use during their life time.
By building a high-level optimizer that has dynamic knowledge of how compon
ents will be used in a system, we may achieve performance improvements of a
n order of magnitude or more. In our approach, a component writer provides
the alternative implementation strategies, and our analysis will recommend
or use the best strategies among the alternative during execution. One way
to specify different alternatives is to recognize the characterization of p
roblem domain. A more efficient code may be generated if we recognize the d
ynamic nature of the component usage. For instance, at one point during exe
cution a component may choose one alternative strategy that is different fr
om the choice it makes at another program point. In general, information ne
eded to decide when to use one implementation over another may not be avail
able to a component writer, but it is available when components are put to
use in a system (for instance, the knowledge about objects are allocated, s
ay for locality analysis, is only available at runtime).
Sometimes choosing among the alternative implementations can especially tri
cky because the different choices are not independent of each other. This m
eans an implementation choice that works well for one component may hurt th
e performance of another. This problem is particularly troublesome for comp
onent writers, since they have an extremely clear view of one component but
cannot see the rest. For example, programs which work on the net will gene
rally do better with ASCII strings, whereas programs that work heavily with
databases might do better with EBCDIC -- and programs which do both (or sy
stems in which the two kinds of program coexist) have to choose between the
m, or pay the cost of translation. Similarly, distributed objects that inte
ract extensively should be on the same machine, although it does not matter
on which machine they reside. Many of these optimizations can be reduced t
o a graph-cutting problem in which nodes in the graph correspond to objects
and edges and its weights correspond to their interact. To optimize the in
teraction, we need to find a min-cut on the graph, and in general the min-c
ut problem is NP hard.
We give heuristics that often give an almost linear time performance on act
ual instances of the problem. We present a vision for how future high-level
optimizations that can optimize programs as they are written today, elimin
ating some of the inefficiencies caused by the high level component based p
rogramming. In the full paper we present theoretical analysis and case stud
ies that show the feasibility of this approach, however much research remai
ns to be done to achieve the promise.