Emmerald: a fast matrix-matrix multiply using Intel's SSE instructions

Authors

Aberdeen, D Baxter, J

Citation

D. Aberdeen and J. Baxter, Emmerald: a fast matrix-matrix multiply using Intel's SSE instructions, CONCURR COM, 13(2), 2001, pp. 103-119

Subject categories

Computer Science & Engineering

Journal title

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE

ISSN journal

1532-0626

Volume

13

Issue

2

Year of publication

2001

Pages

103 - 119

Database

ISI

SICI code

1532-0626(200102)13:2<103:EAFMMU>2.0.ZU;2-L

Abstract

Generalized matrix-matrix multiplication forms the kernel of many mathematical algorithms, hence a faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement efficient matrix multiplication for large matrices using the Intel Pentium single instruction multiple data (SIMD) floating point architecture. The main difficulty with the Pentium and other commodity processors is the need to efficiently utilize the cache hierarchy, particularly given the growing gap between main-memory and CPU clock speeds. We give a detailed description of the register allocation, Level 1 and Level 2 cache blocking strategies that yield the best performance for the Pentium III family. Our results demonstrate an average performance of 2.09 times faster than the leading public domain matrix-matrix multiply routines and comparable performance with Intel's SIMD small matrix-matrix multiply routines. Copyright (C) 2001 John Wiley & Sons, Ltd.
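
Illustrative sketch (not from the paper): the abstract refers to cache blocking, which in general means splitting the multiply into tiles small enough to stay resident in cache while they are reused. The C code below is a minimal, portable sketch of that general idea only. It is not Emmerald's register allocation or its Level 1 and Level 2 blocking scheme, and it uses no SSE instructions. The function name `blocked_sgemm`, the helper `min_sz`, and the tile size `BLOCK` are hypothetical choices made for this example.

```c
#include <stddef.h>

/* Hypothetical tile edge (an assumption for this sketch); a real kernel
 * would tune it to the cache sizes of the target processor. */
#define BLOCK 32

static size_t min_sz(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Computes C += A * B for row-major n x n single-precision matrices.
 * The three outer loops walk over tiles; the three inner loops multiply
 * one tile of A by one tile of B and accumulate into one tile of C. */
void blocked_sgemm(size_t n, const float *a, const float *b, float *c)
{
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = min_sz(ii + BLOCK, n);
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = min_sz(kk + BLOCK, n);
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = min_sz(jj + BLOCK, n);
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        /* One element of A is reused across a row of the B tile. */
                        float aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The caller is expected to zero C first if a plain product is wanted, since the routine accumulates into C. The innermost loop reads contiguous elements of B and C, which is the kind of access pattern a SIMD kernel would vectorize; that step is left out here.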