Design and evaluation of a switch cache architecture for CC-NUMA multiprocessors

Citation
Rr. Iyer and Ln. Bhuyan, Design and evaluation of a switch cache architecture for CC-NUMA multiprocessors, IEEE COMPUT, 49(8), 2000, pp. 779-797
Number of citations
29
Subject Categories
Computer Science & Engineering
Journal title
IEEE TRANSACTIONS ON COMPUTERS
Journal ISSN
0018-9340
Volume
49
Issue
8
Year of publication
2000
Pages
779 - 797
Database
ISI
SICI code
0018-9340(200008)49:8<779:DAEOAS>2.0.ZU;2-G
Abstract
Cache coherent nonuniform memory access (CC-NUMA) multiprocessors provide a scalable design for shared memory. However, they continue to suffer from large remote memory access latencies due to comparatively slow memory technology and large data transfer latencies in the interconnection network. In this paper, we propose a novel hardware caching technique, called switch cache, to improve the remote memory access performance of CC-NUMA multiprocessors. The main idea is to implement small fast caches in the crossbar switches of the interconnect medium to capture and store shared data as they flow from the memory module to the requesting processor. This stored data acts as a cache for subsequent requests, thus greatly reducing the need for remote memory accesses. The implementation of a cache in a crossbar switch needs to be efficient and robust, yet flexible for changes in the caching protocol. The design and implementation details of a CAche Embedded Switch ARchitecture, CAESAR, using wormhole routing with virtual channels are presented. We explore the design space of switch caches by modeling CAESAR in a detailed execution-driven simulator and analyze the performance benefits. Our results show that the CAESAR switch cache can reduce remote memory accesses in CC-NUMA multiprocessors by up to 45 percent for some applications. By serving remote read requests at various stages in the interconnect, we observe improvements in execution time as high as 20 percent for these applications. We conclude that switch caches provide a cost-effective solution for designing high performance CC-NUMA multiprocessors.
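The core idea in the abstract, caching shared data in the switches so that later read requests can be served before they reach remote memory, can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not the CAESAR design: the names `SwitchCache` and `remote_read`, the LRU replacement policy, the block-granularity dictionary standing in for memory, and the two-stage topology are all hypothetical, and coherence actions such as invalidations are left out entirely.

```python
from collections import OrderedDict


class SwitchCache:
    """Toy cache inside one switch. LRU replacement is an assumption of this sketch."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data, oldest entry first

    def lookup(self, addr):
        """Return the cached data for addr, or None on a miss."""
        if addr not in self.lines:
            return None
        self.lines.move_to_end(addr)  # mark as most recently used
        return self.lines[addr]

    def capture(self, addr, data):
        """Store data passing through the switch, evicting the LRU entry if full."""
        self.lines[addr] = data
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)


def remote_read(addr, path, memory):
    """Issue a read along `path` (switches ordered from requester to memory).

    Returns (data, stage): stage is the index of the switch that served the
    request, or None if the request went all the way to memory.
    """
    for stage, switch in enumerate(path):
        data = switch.lookup(addr)
        if data is not None:
            return data, stage  # served inside the interconnect
    data = memory[addr]
    for switch in path:
        # The reply flows back through every switch on the path and is captured.
        switch.capture(addr, data)
    return data, None


# Hypothetical two-stage network: two processors with private first-stage
# switches (s0, s1) that share one second-stage switch (top).
memory = {0x40: "shared block"}
s0, s1, top = SwitchCache(2), SwitchCache(2), SwitchCache(2)
first = remote_read(0x40, [s0, top], memory)   # misses everywhere, goes to memory
second = remote_read(0x40, [s1, top], memory)  # misses in s1, hits in top
```

In this toy run the second processor's request never reaches the memory module, which is the effect the abstract credits for the reduction in remote memory accesses. A real switch cache would also have to keep the cached copies coherent, which this sketch does not attempt.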