We consider optimal load balancing in a distributed computing environment c
onsisting of homogeneous unreliable processors. Each processor receives its
own sequence of tasks from outside users, some of which can be redirected
to the other processors. Processing times are independent and identically d
istributed with an arbitrary distribution. The arrival sequence of outside
tasks to each processor may be arbitrary as long as it is independent of th
e state of the system. Processors may fail, with arbitrary failure and repa
ir processes that are also independent of the state of the system. The only
information available to a processor is the history of its decisions for r
outing work to other processors, and the arrival times of its own arrival s
equence.
We prove the optimality of the round-robin policy, in which each processor
sends all the tasks that can be redirected to each of the other processors
in turn. We show that, among all policies that balance workload, round robi
n stochastically minimizes the nth task completion time for all n, and mini
mizes response times and queue lengths in a separable increasing convex sen
se for the entire system. We also show that if there is a single centralize
d controller, round-robin is the optimal policy, and a single controller us
ing round-robin routing is better than the optimal distributed system in wh
ich each processor routes its own arrivals. Again "optimal" and "better" ar
e in the sense of stochastically minimizing task completion times, and mini
mizing response time and queue lengths in the separable increasing convex s
ense.