Processors execute the full dynamic instruction stream to arrive at the final output of a program, yet there exist shorter instruction streams that produce the same overall effect. We propose creating a shorter but otherwise equivalent version of the original program by removing ineffectual computation and computation related to highly predictable control flow. The shortened program is run concurrently with the full program on a chip multiprocessor or simultaneous multithreaded processor, with two key advantages:
1) Improved single-program performance. The shorter program speculatively runs ahead of the full program and supplies the full program with control and data flow outcomes. The full program executes efficiently due to the communicated outcomes, at the same time validating the speculative, shorter program. The two programs combined run faster than the original program alone. Detailed simulations of an example implementation show an average improvement of 7% for the SPEC95 integer benchmarks.
2) Fault tolerance. The shorter program is a subset of the full program, and this partial redundancy is transparently leveraged for detecting and recovering from transient hardware faults.
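
The idea of a shorter but equivalent instruction stream can be illustrated with a small software sketch. It is only a toy under stated assumptions, not the proposed hardware mechanism: the instruction format, the register names, and the helpers `run` and `shorten` are hypothetical, and the only kind of ineffectual computation it removes is a write that is never read before being overwritten or discarded. Predictable control flow is not modeled.

```python
from operator import add, mul

# Hypothetical toy instruction format: (dest, fn, srcs) means dest = fn(*srcs).
FULL = [
    ("r1", lambda: 5, ()),
    ("r2", lambda: 7, ()),
    ("r3", mul, ("r1", "r2")),  # overwritten below before any read: ineffectual
    ("r3", add, ("r1", "r2")),
    ("r4", mul, ("r3", "r3")),
    ("r5", add, ("r1", "r1")),  # never read and not an output: ineffectual
]

def run(stream):
    """Execute a straight-line stream and return the final register state."""
    regs = {}
    for dest, fn, srcs in stream:
        regs[dest] = fn(*(regs[s] for s in srcs))
    return regs

def shorten(stream, outputs):
    """Drop writes whose values never reach any register in `outputs`.

    Walks the stream backward while tracking which registers are still
    needed (a standard liveness pass over a straight-line trace).
    """
    live = set(outputs)
    kept = []
    for dest, fn, srcs in reversed(stream):
        if dest in live:
            live.discard(dest)   # this write satisfies the pending need
            live.update(srcs)    # its operands are now needed
            kept.append((dest, fn, srcs))
    kept.reverse()
    return kept

SHORT = shorten(FULL, outputs={"r4"})

# The full stream doubles as a check on the shorter one: both must agree
# on the program's output register.
full_regs = run(FULL)
short_regs = run(SHORT)
print(len(FULL), len(SHORT), full_regs["r4"] == short_regs["r4"])
```

In this sketch the comparison at the end plays the role of the validation described in item 1, and the fact that every instruction in `SHORT` also appears in `FULL` mirrors the partial redundancy described in item 2.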