Cih. Chen and V. Cherkassky, TASK ALLOCATION AND REALLOCATION FOR FAULT-TOLERANCE IN MULTICOMPUTER SYSTEMS, IEEE Transactions on Aerospace and Electronic Systems, 30(4), 1994, pp. 1094-1104
The goal of task allocation in a set of interconnected processors
(computers) is to maximize the efficient use of resources and thus
reduce the job turnaround time. We propose a simple yet effective
method for allocating tasks in multicomputer systems that minimizes
the interprocessor communication cost subject to resource limitations
defined by the system and the designer. These limitations can be
viewed as the result of load balancing, since the execution time of
each task, the number of available processors, the processor speed,
and the memory capacity are known to the system or the designer. As
the number of processors increases,
the probability of a failure existing somewhere in the system at any
time also increases. Very few established task allocation models have
considered the reliability property. In multicomputer systems, we
define system reliability as the probability that the system can run
the tasks successfully. After the (nonredundant) task scheduling
strategy is defined, tasks are then reallocated to processors
statically and redundantly. This is a form of time redundancy: if some
processors fail during execution, all tasks can still be completed on
the remaining processors (but in a longer time). Due to static
preallocation of
tasks, this method is simpler and thus more practical than the
well-known dynamic reconfiguration and rollback recovery techniques in
multicomputer systems. We demonstrate the effectiveness of task
allocation and reallocation for hardware fault tolerance by applying
the methods to different examples and to practical communications
network multiprocessor systems.
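
The optimization problem described in the abstract (minimize interprocessor communication cost subject to per-processor resource limits) can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it is an exhaustive search used as a stand-in, practical only for tiny instances. The function name `allocate`, its parameters, and the example data are all hypothetical.

```python
from itertools import product


def allocate(exec_time, mem, comm, n_procs, max_load, max_mem):
    """Return (cost, assignment) with minimal communication cost, or None.

    Hypothetical model (an assumption, not taken from the paper):
      exec_time[t], mem[t]  execution time and memory need of task t
      comm[(a, b)]          cost paid only if tasks a and b sit on
                            different processors
      max_load, max_mem     per-processor limits (identical processors)
    """
    n_tasks = len(exec_time)
    best = None
    # Enumerate every mapping of tasks to processors (n_procs ** n_tasks).
    for assign in product(range(n_procs), repeat=n_tasks):
        load = [0] * n_procs
        used = [0] * n_procs
        for task, proc in enumerate(assign):
            load[proc] += exec_time[task]
            used[proc] += mem[task]
        # Skip mappings that violate a resource limit.
        if any(l > max_load for l in load) or any(u > max_mem for u in used):
            continue
        # Communication is charged only across processor boundaries.
        cost = sum(c for (a, b), c in comm.items() if assign[a] != assign[b])
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best


# Four equal tasks in a chain; the middle link is the cheapest one to cut.
exec_time = [2, 2, 2, 2]
mem = [1, 1, 1, 1]
comm = {(0, 1): 5, (1, 2): 1, (2, 3): 5}

primary = allocate(exec_time, mem, comm, n_procs=2, max_load=4, max_mem=2)

# Static fallback for the loss of one processor: the limits are relaxed
# (assumed values), so all tasks still complete, only more slowly.
fallback = allocate(exec_time, mem, comm, n_procs=1, max_load=8, max_mem=4)
```

Computing `fallback` ahead of time mirrors the static, redundant reallocation idea in the abstract: the mapping for the degraded system is fixed before execution rather than derived by dynamic reconfiguration. The relaxed limits in that call are an assumption of this sketch.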