LOAD SHARING IN HYPERCUBE-CONNECTED MULTICOMPUTERS IN THE PRESENCE OF NODE FAILURES

Authors

CHANG YC; SHIN KG

Citation

Y.C. Chang and K.G. Shin, LOAD SHARING IN HYPERCUBE-CONNECTED MULTICOMPUTERS IN THE PRESENCE OF NODE FAILURES, I.E.E.E. transactions on computers, 45(10), 1996, pp. 1203-1211

Citations number

Subject Categories

Computer Sciences; Engineering, Electrical & Electronic; Computer Science, Hardware & Architecture

Journal title

I.E.E.E. transactions on computers

ISSN journal

00189340

Volume

45

Issue

10

Year of publication

1996

Pages

1203 - 1211

Database

ISI

SICI code

0018-9340(1996)45:10<1203:LSIHMI>2.0.ZU;2-Z

Abstract

This paper addresses two important issues associated with load sharing (LS) in hypercube-connected multicomputers: 1) ordering fault-free nodes as preferred receivers of "overflow" tasks for each overloaded node and 2) developing an LS mechanism to handle node failures. Nodes are arranged into preferred lists of receivers of overflow tasks in such a way that each node will be selected as the kth preferred node of one and only one other node [1]. Such lists are proven to allow the overflow tasks to be evenly distributed throughout the entire system. However, the occurrence of node failures will destroy the original structure of a preferred list if the failed nodes are simply dropped from the list, thus forcing some nodes to be selected as the kth preferred node of more than one other node. We propose three algorithms to modify the preferred list such that its original features can be retained regardless of the number of faulty nodes in the system. It is shown that the number of adjustments or the communication overhead of these algorithms is minimal. Using the modified preferred lists, we also propose a simple mechanism to tolerate node failures. Each node is equipped with a backup queue which stores and updates the information on the tasks arriving/completing at its most preferred node.
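
The preferred-list property described in the abstract, and the way naive removal of failed nodes breaks it, can be illustrated with a small sketch. This is not the construction from [1] or any of the paper's three algorithms: it assumes, purely for illustration, that the kth preferred node of a node is obtained by XOR-ing its address with the kth entry of a fixed sequence of nonzero bit masks ordered by Hamming weight. The names preferred_list and kth_choice_counts are hypothetical.

```python
from collections import Counter


def preferred_list(node: int, dim: int) -> list[int]:
    """Preferred receivers of `node` in a dim-dimensional hypercube.

    Assumption (not taken from the paper): the k-th preferred node is
    node XOR offsets[k], where offsets are all nonzero dim-bit masks
    ordered by Hamming weight, so nearest neighbours come first.
    """
    offsets = sorted(range(1, 1 << dim), key=lambda m: (bin(m).count("1"), m))
    return [node ^ m for m in offsets]


def kth_choice_counts(lists: dict[int, list[int]], k: int) -> Counter:
    """Count how many nodes pick each node as their k-th preferred receiver."""
    return Counter(lst[k] for lst in lists.values() if k < len(lst))


dim = 3
lists = {n: preferred_list(n, dim) for n in range(1 << dim)}

# Fault-free case: XOR with a fixed mask is a bijection on node addresses,
# so every node is the k-th choice of exactly one other node.
balanced = all(
    set(kth_choice_counts(lists, k).values()) == {1}
    for k in range((1 << dim) - 1)
)

# Naive failure handling: simply drop the failed node from every list.
failed = {1}
dropped = {
    n: [p for p in lst if p not in failed]
    for n, lst in lists.items()
    if n not in failed
}

# Largest number of nodes sharing the same k-th preferred receiver.
max_load_after_drop = max(
    max(kth_choice_counts(dropped, k).values())
    for k in range(len(dropped[0]))
)

print("balanced without failures:", balanced)
print("max nodes sharing one k-th choice after naive drop:", max_load_after_drop)
```

With one failed node in a 3-cube, the naive drop already makes some node the kth preferred receiver of more than one other node, which is the imbalance the proposed list-modification algorithms are meant to remove.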