We present a system that allows OpenMP programs to execute on a network of
workstations with a variable number of nodes. The ability to adapt to a
variable number of nodes allows a program to take advantage of additional
nodes that become available after it starts execution, or to gracefully
scale down when the number of available nodes is reduced. We demonstrate
that the cost of adaptation is modest; the system allows a program to adapt
at a moderate rate without much performance loss.
Two ideas underlie the efficiency of our design. First, we recognize that
OpenMP programs exhibit convenient adaptation points during their execution,
points at which the cost of adaptation is greatly reduced. Second, by
allowing a process a certain grace period before it must leave a node, we
ensure that most adaptations can occur at these adaptation points, and thus
at low cost. Migration of a process, a much more expensive method for
providing adaptivity, is used only as a backup solution, when the process
cannot reach an adaptation point within the grace period.
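
The grace-period policy described above can be illustrated with a minimal sketch. This is not the system's implementation: the names `plan_leave` and `LeaveDecision`, the use of seconds as the time unit, and the choice to treat an adaptation point reached exactly at the end of the grace period as "in time" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LeaveDecision:
    """Hypothetical record of how and when a process vacates a node."""
    method: str   # "adaptation_point" (cheap) or "migration" (expensive fallback)
    time: float   # seconds after the leave request at which the node is vacated


def plan_leave(time_to_next_adaptation_point: float,
               grace_period: float) -> LeaveDecision:
    """Choose how a process leaves a node after a leave request.

    Assumption: the time until the next adaptation point is known here;
    a real system would only discover whether the point is reached in time.
    """
    if time_to_next_adaptation_point < 0 or grace_period < 0:
        raise ValueError("times must be non-negative")

    # Preferred path: the process reaches an adaptation point within the
    # grace period, so the adaptation happens there at low cost.
    if time_to_next_adaptation_point <= grace_period:
        return LeaveDecision("adaptation_point", time_to_next_adaptation_point)

    # Fallback path: the grace period expires first, so the process is
    # migrated at the end of the grace period.
    return LeaveDecision("migration", grace_period)
```

For example, under these assumptions `plan_leave(2.0, 5.0)` selects the adaptation point, while `plan_leave(8.0, 5.0)` falls back to migration once the grace period expires.
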
Our implementation consists of an OpenMP pre-processor that generates
TreadMarks distributed shared memory (DSM) programs, and a version of
TreadMarks modified to adapt to a variable number of nodes. Using a DSM as
the underlying substrate facilitates the data (re-)distribution necessary
after an adaptation.