J.G. Blom and J.G. Verwer, Vectorizing Matrix Operations Arising from PDE Discretization on 9-Point Stencils, Journal of Supercomputing, 8(1), 1994, pp. 29-51
When solving a system of PDEs, discretized on 9-point stencils over a
nonrectangular domain, the linear systems that arise will have matrices
with an irregular block structure. In this paper we discuss the
vectorization of the matrix-vector multiply and of the Incomplete LU
factorization and backsolve for these types of matrices. The performance
of the matrix-vector multiply is already optimal for a small number of
grid points (one result per clock cycle). For the ILU factorization and
backsolve the vector performance is not as satisfying, partly because
the resulting vector length is generally small and partly because of
the heavy use of indirect addressing. A comparison with the
general-purpose routines from the SLAP library shows a significant gain
in computational time.
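
To illustrate why a 9-point-stencil matrix-vector multiply can vectorize well, the following is a minimal sketch, not the authors' implementation. It assumes a scalar PDE on a rectangular nx-by-ny grid with row-major numbering, so the matrix has nine diagonals at fixed offsets; the irregular block structure of nonrectangular domains treated in the paper is not modeled. The names stencil_offsets, build_diags and matvec_by_diagonals are hypothetical. Each diagonal is applied in one long stride-1 loop with no indirect addressing, which is the kind of loop a vector unit handles efficiently.

```python
# Sketch only. Assumptions: scalar PDE, rectangular nx-by-ny grid,
# row-major numbering of grid points, constant stencil weights.

def stencil_offsets(nx):
    """Index offsets of the 9 stencil points relative to the centre point."""
    return [dj * nx + di for dj in (-1, 0, 1) for di in (-1, 0, 1)]


def build_diags(nx, ny, weights):
    """Store the matrix as 9 coefficient arrays, one per stencil diagonal.

    diags[k][p] is the coefficient coupling point p to point p + offsets[k].
    Couplings that would leave the grid are stored as 0, so the multiply
    below needs no boundary tests.
    """
    diags = [[0.0] * (nx * ny) for _ in range(9)]
    for j in range(ny):
        for i in range(nx):
            k = 0
            for dj in (-1, 0, 1):
                for di in (-1, 0, 1):
                    if 0 <= i + di < nx and 0 <= j + dj < ny:
                        diags[k][j * nx + i] = weights[k]
                    k += 1
    return diags


def matvec_by_diagonals(diags, offsets, x):
    """Compute y = A x one diagonal at a time."""
    n = len(x)
    y = [0.0] * n
    for coeffs, off in zip(diags, offsets):
        lo = max(0, -off)        # first row whose column index is >= 0
        hi = min(n, n - off)     # one past the last row with column < n
        # Long contiguous loop over grid points: the vectorizable part.
        for p in range(lo, hi):
            y[p] += coeffs[p] * x[p + off]
    return y
```

In this layout the vector length equals the number of grid points, which is consistent with the abstract's remark that the multiply performs well even on small grids. The ILU factorization and backsolve have data dependencies between neighbouring points, so they cannot be written as one such loop per diagonal.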