The domain decomposition method (DDM) is an efficient algorithmic tool for
the parallelization of finite element computer codes. A variant of the DDM
with a direct solution algorithm is based on the computation of Schur complement
matrices for the finite element partitions. This paper describes a simple technique
that considerably improves the execution rate of the computationally intensive
routines of the Schur complement computation. The technique uses 'block of
columns' matrix operations and loop unrolling to reduce the number of load
instructions from cache memory and to increase instruction-level parallelism.
For superscalar RISC processors, experimental results show that the performance
of the DDM solution procedure can be improved several-fold.
(C) 2000 Civil-Comp Ltd. and Elsevier Science Ltd. All rights reserved.
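The 'block of columns' idea can be illustrated with a minimal C sketch. This is not the paper's code. It assumes a symmetric partition matrix whose interior block is factored as K_ii = L Lᵀ, so that with Y = L⁻¹ K_ib the Schur complement K_bb − K_bi K_ii⁻¹ K_ib becomes K_bb − Yᵀ Y. The dominant kernel is therefore an update of the form S := S − Yᵀ Z (with Z = Y). The function names `schur_update_naive` and `schur_update_blocked`, the column-major layout, and the block width of four columns are all hypothetical choices made for illustration. The blocked version processes four columns at once. Each `y[p]` is then loaded once and used in four multiply-adds, and four independent accumulators give the processor parallel dependency chains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference kernel: S := S - Y^T * Z, one dot product at a time.
   Column-major storage: Y is k x m, Z is k x n, S is m x n. */
static void schur_update_naive(int m, int n, int k,
                               const double *Y, const double *Z, double *S)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            const double *y = Y + (size_t)i * k;   /* column i of Y */
            const double *z = Z + (size_t)j * k;   /* column j of Z */
            double t = 0.0;
            for (int p = 0; p < k; ++p)
                t += y[p] * z[p];                  /* 2 loads per 2 flops */
            S[i + (size_t)j * m] -= t;
        }
}

/* Hypothetical column-blocked, unrolled kernel computing the same result.
   Four columns of Z (and S) are handled together: y[p] is loaded once and
   reused four times, and the partial sums stay in registers, so the inner
   loop does 5 loads per 8 flops instead of 2 loads per 2 flops. */
static void schur_update_blocked(int m, int n, int k,
                                 const double *Y, const double *Z, double *S)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    int j = 0;
    for (; j + 3 < n; j += 4) {
        const double *z0 = Z + (size_t)j * k;
        const double *z1 = z0 + k, *z2 = z1 + k, *z3 = z2 + k;
        for (int i = 0; i < m; ++i) {
            const double *y = Y + (size_t)i * k;
            /* four independent accumulation chains expose ILP */
            double t0 = 0.0, t1 = 0.0, t2 = 0.0, t3 = 0.0;
            for (int p = 0; p < k; ++p) {
                const double yp = y[p];            /* one load, four uses */
                t0 += yp * z0[p];
                t1 += yp * z1[p];
                t2 += yp * z2[p];
                t3 += yp * z3[p];
            }
            S[i + (size_t)(j + 0) * m] -= t0;
            S[i + (size_t)(j + 1) * m] -= t1;
            S[i + (size_t)(j + 2) * m] -= t2;
            S[i + (size_t)(j + 3) * m] -= t3;
        }
    }
    /* Remaining n % 4 columns: fall back to plain dot products. */
    for (; j < n; ++j) {
        const double *z = Z + (size_t)j * k;
        for (int i = 0; i < m; ++i) {
            const double *y = Y + (size_t)i * k;
            double t = 0.0;
            for (int p = 0; p < k; ++p)
                t += y[p] * z[p];
            S[i + (size_t)j * m] -= t;
        }
    }
}
```

Both kernels sum each entry in the same order, so they produce bitwise-identical results. The 1×4 block shown here is only one choice. Blocking rows as well as columns (for example 2×4) would raise the flop-to-load ratio further, at the cost of more registers.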