TASK ALLOCATION AND REALLOCATION FOR FAULT-TOLERANCE IN MULTICOMPUTER SYSTEMS

Citation
Cih. Chen and V. Cherkassky, TASK ALLOCATION AND REALLOCATION FOR FAULT-TOLERANCE IN MULTICOMPUTER SYSTEMS, IEEE Transactions on Aerospace and Electronic Systems, 30(4), 1994, pp. 1094-1104
Number of citations
27
Subject Categories
Telecommunications,"Engineering, Electrical & Electronic","Aerospace Engineering & Technology"
Journal ISSN
00189251
Volume
30
Issue
4
Year of publication
1994
Pages
1094 - 1104
Database
ISI
SICI code
0018-9251(1994)30:4<1094:TAARFF>2.0.ZU;2-N
Abstract
The goal of task allocation in a set of interconnected processors (computers) is to maximize the efficient use of resources and thus reduce the job turnaround time. Proposed here is a simple yet effective method to allocate the tasks in multicomputer systems for minimizing the interprocessor communication cost subject to resource limitations defined by the system and designer. The limitations can be viewed as results from the load balancing, since the execution time of each task, the number of available processors, processor speed, and memory capacity are known to the system or designer. As the number of processors increases, the probability of a failure existing somewhere in the system at any time also increases. Very few established task allocation models have considered the reliability property. In multicomputer systems, we define system reliability as the probability that the system can run the tasks successfully. After the (nonredundant) task scheduling strategy is defined, tasks are then reallocated to processors statically and redundantly. This is a form of time redundancy in which, if some processors fail during execution, all tasks can be completed on the remaining processors (but in a longer time). Due to static preallocation of tasks, this method is simpler and thus more practical than well-known dynamic reconfiguration and rollback recovery techniques in multicomputer systems. We demonstrate the effectiveness of the task allocation and reallocation for hardware fault tolerance by applying the methods to different examples and practical communications network multiprocessor systems.
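Illustrative sketch
The abstract does not give the authors' algorithm, so the following Python sketch is only a toy illustration of the two stages it describes: a nonredundant allocation that minimizes interprocessor communication cost under load and memory limits, followed by a static redundant reallocation that gives every task a backup processor. The exhaustive search, the least-loaded backup rule, and all names (`allocate`, `add_backups`, `max_load`, `max_mem`) are assumptions made for this example, not the paper's method.

```python
from itertools import product


def comm_cost(assign, comm):
    """Sum the communication cost of task pairs placed on different processors."""
    return sum(c for (a, b), c in comm.items() if assign[a] != assign[b])


def feasible(assign, exec_time, mem, n_proc, max_load, max_mem):
    """Check the per-processor execution-time and memory limits."""
    load = [0] * n_proc
    used = [0] * n_proc
    for task, proc in enumerate(assign):
        load[proc] += exec_time[task]
        used[proc] += mem[task]
    return all(l <= max_load for l in load) and all(u <= max_mem for u in used)


def allocate(exec_time, mem, comm, n_proc, max_load, max_mem):
    """Exhaustively search for the feasible assignment with the lowest cost.

    Assumption: brute force is used only because the example is tiny.
    Returns (cost, assignment) or None if no assignment is feasible.
    """
    best = None
    for assign in product(range(n_proc), repeat=len(exec_time)):
        if not feasible(assign, exec_time, mem, n_proc, max_load, max_mem):
            continue
        cost = comm_cost(assign, comm)
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best


def add_backups(assign, exec_time, n_proc):
    """Statically give each task a backup processor other than its primary.

    Assumption: the backup is the other processor with the least backup
    load so far. Requires at least two processors.
    """
    backup_load = [0] * n_proc
    backups = []
    for task, primary in enumerate(assign):
        candidates = [p for p in range(n_proc) if p != primary]
        chosen = min(candidates, key=lambda p: backup_load[p])
        backup_load[chosen] += exec_time[task]
        backups.append(chosen)
    return backups


# Hypothetical data: 4 tasks, 2 processors, pairwise communication costs.
exec_time = [2, 2, 2, 2]
mem = [1, 1, 1, 1]
comm = {(0, 1): 5, (2, 3): 5, (1, 2): 1}

cost, primary = allocate(exec_time, mem, comm, n_proc=2, max_load=4, max_mem=2)
backup = add_backups(primary, exec_time, n_proc=2)
print(cost, primary, backup)
```

In this toy instance the heavily communicating pairs (0, 1) and (2, 3) end up on the same processor, and each task's backup lies on the other processor, so a single processor failure still leaves every task with a place to run, at the price of a longer completion time.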