PERFORMANCE AND OPTIMIZATION OF DATA PREFETCHING STRATEGIES IN SCALABLE MULTIPROCESSORS

Citation
Rh. Saavedra et al., PERFORMANCE AND OPTIMIZATION OF DATA PREFETCHING STRATEGIES IN SCALABLE MULTIPROCESSORS, Journal of Parallel and Distributed Computing, 22(3), 1994, pp. 427-448
Number of citations
30
Subject Categories
Computer Sciences; Computer Science Theory & Methods
ISSN journal
07437315
Volume
22
Issue
3
Year of publication
1994
Pages
427 - 448
Database
ISI
SICI code
0743-7315(1994)22:3<427:PAOODP>2.0.ZU;2-G
Abstract
Prefetching is one of several techniques for hiding and tolerating the large memory latencies of scalable multiprocessors. In this paper, we present a performance model for analyzing the limits and effectiveness of data prefetching. The model incorporates the effects of program behavior, network characteristics, cache coherence protocols, and the memory consistency model. Our results indicate that, as long as there is enough extra network bandwidth, prefetching is very effective in hiding large latencies. In machines with caches large enough to hold the program working set, intra- and internode cache interference is low enough that it has no significant impact on prefetching performance. Furthermore, we show that the effective prefetch distance plays a vital role: it adapts extremely well to changes in cache miss rates and remote latencies, allowing prefetches to hide latency more effectively. We provide an adaptive algorithm that optimizes the prefetch distance based on the dynamic behavior of the application, the interconnection network, and the distributed caches and memories. This optimization of the prefetch distance is a significant advantage of prefetching over other latency-tolerating techniques, such as multithreading. We show that the prefetch distance can be chosen as a constant, made program-dependent, or decided by performance information. The optimal distance can be determined adaptively using both compile-time and runtime conditions. Our results are therefore useful not only to compiler writers but also for the development of runtime support systems in multiprocessors. In large-scale systems, where network traffic control dominates performance, the ultimate goal is to match program behavior with machine behavior. (C) 1994 Academic Press, Inc.
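The abstract does not spell out the adaptive algorithm itself. As a rough illustration only, the sketch below uses the common rule of thumb that the prefetch distance, in loop iterations, should cover the remote latency divided by the work done per iteration. It adds a hypothetical runtime tuner that re-estimates latency with an exponential moving average and recomputes the distance. The names `prefetch_distance`, `AdaptivePrefetchDistance`, `alpha`, and `max_distance` are assumptions for illustration, not the paper's method or API.

```python
import math


def prefetch_distance(remote_latency: float, cycles_per_iter: float) -> int:
    """Iterations to prefetch ahead so data arrives before it is used.

    Rule-of-thumb assumption: distance = ceil(latency / work per iteration),
    with a minimum of one iteration.
    """
    if cycles_per_iter <= 0:
        raise ValueError("cycles_per_iter must be positive")
    return max(1, math.ceil(remote_latency / cycles_per_iter))


class AdaptivePrefetchDistance:
    """Hypothetical runtime tuner (not the paper's algorithm).

    Tracks an exponential moving average (EMA) of observed remote latency
    and recomputes the prefetch distance after each observation.
    """

    def __init__(self, cycles_per_iter: float, initial_latency: float,
                 alpha: float = 0.25, max_distance: int = 64) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.cycles_per_iter = cycles_per_iter
        self.avg_latency = float(initial_latency)
        self.alpha = alpha                # weight given to the newest sample
        self.max_distance = max_distance  # cap on how far ahead to prefetch

    def distance(self) -> int:
        """Current distance from the smoothed latency, capped at max_distance."""
        d = prefetch_distance(self.avg_latency, self.cycles_per_iter)
        return min(self.max_distance, d)

    def observe(self, measured_latency: float) -> int:
        """Fold in one latency measurement and return the updated distance."""
        self.avg_latency = ((1.0 - self.alpha) * self.avg_latency
                            + self.alpha * measured_latency)
        return self.distance()


if __name__ == "__main__":
    tuner = AdaptivePrefetchDistance(cycles_per_iter=30, initial_latency=200,
                                     alpha=0.5)
    print(tuner.distance())    # smoothed latency 200 -> 7 iterations ahead
    print(tuner.observe(400))  # smoothed latency 300 -> 10 iterations ahead
```

For example, with a 200-cycle remote latency and 30 cycles of work per iteration, `prefetch_distance(200, 30)` returns 7. The `max_distance` cap is a simple guard against prefetched lines being evicted before use when caches cannot hold the working set.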