IMPROVING THE MEMORY-SYSTEM PERFORMANCE OF SPARSE-MATRIX VECTOR MULTIPLICATION

Authors
S. Toledo
Citation
S. Toledo, IMPROVING THE MEMORY-SYSTEM PERFORMANCE OF SPARSE-MATRIX VECTOR MULTIPLICATION, IBM Journal of Research and Development, 41(6), 1997, pp. 711-725
Citations number
22
Subject Categories
Computer Science, Hardware & Architecture; Multidisciplinary Sciences
ISSN journal
00188646
Volume
41
Issue
6
Year of publication
1997
Pages
711 - 725
Database
ISI
SICI code
0018-8646(1997)41:6<711:ITMPOS>2.0.ZU;2-1
Abstract
Sparse-matrix vector multiplication is an important kernel that often runs inefficiently on superscalar RISC processors. This paper describes techniques that increase instruction-level parallelism and improve performance. The techniques include reordering to reduce cache misses (originally due to Das et al.), blocking to reduce load instructions, and prefetching to prevent multiple load-store units from stalling simultaneously. The techniques improve performance from about 40 MFLOPS (on a well-ordered matrix) to more than 100 MFLOPS on a 266-MFLOPS machine. The techniques are applicable to other superscalar RISC processors as well; for example, they have improved performance on a Sun UltraSPARC(TM) I workstation.
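To illustrate how blocking can reduce load instructions, here is a minimal sketch. It assumes compressed sparse row (CSR) storage and a hypothetical 2x1 block shape (two adjacent rows sharing one column index); the record specifies neither. It is not the paper's implementation, and the author's blocking scheme may differ. The function names `csr_spmv`, `csr_to_bcsr_2x1` and `bcsr_2x1_spmv` are illustrative only. In the baseline loop, every nonzero costs one column-index load and one load of `x`. In the blocked loop, a single index load and a single `x` load feed two multiply-adds, one for each row of the pair.

```python
def csr_spmv(n_rows, row_ptr, col_idx, vals, x):
    """Baseline y = A @ x with A stored in CSR format."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # One column-index load and one x load per nonzero.
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def csr_to_bcsr_2x1(n_rows, row_ptr, col_idx, vals):
    """Convert CSR to a hypothetical 2x1 block layout.

    Each block covers rows 2b and 2b+1 in a single column. Positions that
    are nonzero in only one of the two rows are padded with explicit zeros.
    """
    b_ptr, b_col, b_val = [0], [], []  # b_val holds two values per block
    for b in range((n_rows + 1) // 2):
        merged = {}  # column index -> [value in row 2b, value in row 2b+1]
        for r in (2 * b, 2 * b + 1):
            if r >= n_rows:  # odd row count: last block row has one real row
                continue
            for k in range(row_ptr[r], row_ptr[r + 1]):
                pair = merged.setdefault(col_idx[k], [0.0, 0.0])
                pair[r - 2 * b] += vals[k]
        for j in sorted(merged):
            b_col.append(j)
            b_val.extend(merged[j])
        b_ptr.append(len(b_col))
    return b_ptr, b_col, b_val


def bcsr_2x1_spmv(n_rows, b_ptr, b_col, b_val, x):
    """y = A @ x with A in the 2x1 block layout built above."""
    y = [0.0] * (n_rows + n_rows % 2)  # pad so the last block row fits
    for b in range(len(b_ptr) - 1):
        acc0 = acc1 = 0.0
        for k in range(b_ptr[b], b_ptr[b + 1]):
            # One index load and one x load now feed two multiply-adds.
            xj = x[b_col[k]]
            acc0 += b_val[2 * k] * xj
            acc1 += b_val[2 * k + 1] * xj
        y[2 * b] = acc0
        y[2 * b + 1] = acc1
    return y[:n_rows]


if __name__ == "__main__":
    # A = [[4, 0, 1],
    #      [0, 2, 0],
    #      [3, 0, 5]]
    row_ptr, col_idx = [0, 2, 3, 5], [0, 2, 1, 0, 2]
    vals, x = [4.0, 1.0, 2.0, 3.0, 5.0], [1.0, 2.0, 3.0]
    print(csr_spmv(3, row_ptr, col_idx, vals, x))  # [7.0, 4.0, 18.0]
    blocks = csr_to_bcsr_2x1(3, row_ptr, col_idx, vals)
    print(bcsr_2x1_spmv(3, *blocks, x))  # [7.0, 4.0, 18.0]
```

The blocked loop also performs multiply-adds on the padded zeros. It therefore pays off only when nonzeros in adjacent rows tend to share columns. Being pure Python, the sketch shows the change in load structure, not the speedup a compiled kernel would obtain.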