COLLECTIVE COMMUNICATION IN WORMHOLE-ROUTED MASSIVELY-PARALLEL COMPUTERS

Citation
Pk. Mckinley et al., COLLECTIVE COMMUNICATION IN WORMHOLE-ROUTED MASSIVELY-PARALLEL COMPUTERS, Computer, 28(12), 1995, pp. 39
Citations number
24
Categorie Soggetti
Computer Sciences","Computer Science Hardware & Architecture","Computer Science Software Graphycs Programming
Journal title
ISSN journal
00189162
Volume
28
Issue
12
Year of publication
1995
Database
ISI
SICI code
0018-9162(1995)28:12<39:CCIWMC>2.0.ZU;2-C
Abstract
The supercomputer market is now dominated by parallel architectures, a mong which massively parallel computers (MPCs) are an important class of systems. The memory of an MPC is physically distributed among an en semble of computing nodes that communicate by sending data through a n etwork. Communication operations can be either point-to-point, with on e source and one destination, or collective, with more than two partic ipating processes. The design of collective communication operations d epends on the MPC's underlying network architecture. While there has b een little consensus on some aspects of communication architectures, s uch as network topology, a good deal of agreement exists regarding the most efficient way to switch messages through the network. Most MPCs use wormhole routing, in which each message is divided into small piec es that are pipelined through the network. Compared with the store-and -forward switching method used in early multicomputers, wormhole routi ng reduces the effect of path length on communication time. However, i n situations where multiple messages exist in the network concurrently , wormhole routing can exacerbate channel contention, which occurs whe n blocked messages hold some communication channels while waiting for others. Invoking a collective operation, which can involve many messag es, poses this situation. In recent years, many projects have addresse d the design of efficient collective communication algorithms for worm hole-routed systems. By exploiting the relative distance-insensitivity of wormhole routing, these new algorithms often differ fundamentally from their store-and-forward counterparts. This article examines softw are and hardware approaches to implementing collective communication o perations, illustrating several issues arising in this research area a nd describing the major classes of algorithms proposed to solve these problems.