Accelerating shared virtual memory via general-purpose network interface support

Citation
A. Bilas et al., Accelerating shared virtual memory via general-purpose network interface support, ACM T COMP, 19(1), 2001, pp. 1-35
Number of citations
53
Subject categories
Computer Science & Engineering
Journal title
ACM TRANSACTIONS ON COMPUTER SYSTEMS
Journal ISSN
0734-2071
Volume
19
Issue
1
Year of publication
2001
Pages
1 - 35
Database
ISI
SICI code
0734-2071(200102)19:1<1:ASVMVG>2.0.ZU;2-2
Abstract
Clusters of symmetric multiprocessors (SMPs) are important platforms for high-performance computing. With the success of hardware cache-coherent distributed shared memory (DSM), a lot of effort has also been made to support the coherent shared-address-space programming model in software on clusters. Much research has been done in fast communication on clusters and in protocols for supporting software shared memory across them. However, the performance of software virtual memory (SVM) is still far from that achieved on hardware DSM systems. The goal of this paper is to improve the performance of SVM on system area network clusters by considering communication and protocol layer interactions. We first examine what are the important communication system bottlenecks that stand in the way of improving parallel performance of SVM clusters; in particular, which parameters of the communication architecture are most important to improve further relative to processor speed, which ones are already adequate on modern systems for most applications, and how will this change with technology in the future. We find that the most important communication subsystem cost to improve is the overhead of generating and delivering interrupts for asynchronous protocol processing. Then we proceed to show, that by providing simple and general support for asynchronous message handling in a commodity network interface (NI) and by altering SVM protocols appropriately, protocol activity can be decoupled from asynchronous message handling, and the need for interrupts or polling can be eliminated. The NI mechanisms needed are generic, not SVM-dependent. We prototype the mechanisms and such a synchronous home-based LRC protocol, called GeNIMA (GEneral-purpose Network Interface support for shared Memory Abstractions), on a cluster of SMPs with a programmable NI. We find that the performance improvements are substantial, bringing performance on a small-scale SMP cluster much closer to that of hardware-coherent shared memory for many applications, and we show the value of each of the mechanisms in different applications.