Single-assignment and functional languages have value semantics that d
o not permit side-effects. This lack of side-effects makes automatic d
etection of parallelism and optimization for data locality in programs
much easier. However, the same property poses a challenge in implemen
ting these languages efficiently. This paper describes an optimizing c
ompiler system that solves the key problem of aggregate copy eliminati
on. The methods developed rely exclusively on compile-time algorithms,
including interprocedural analysis, that are applied to an intermedia
te data flow representation. By dividing the problem into update-in-pl
ace and build-in-place analysis, a small set of relatively simple tech
niques edge substitution, graph pattern matching, substructure sharing
and substructure targeting-was found to be very powerful. If combined
properly and implemented carefully, the algorithms eliminate unnecess
ary copy operations to a very high degree. No run-time overhead is imp
osed on the compiled programs.