E. Dizitti et al., EFFICIENT TECHNIQUE FOR PARTITIONING AND PROGRAMMING LINEAR ALGEBRA ALGORITHMS ON CONCURRENT VLSI ARCHITECTURES, IEE proceedings. Circuits, devices and systems, 142(2), 1995, pp. 97-104
An efficient technique for partitioning and programming linear algebra
algorithms on concurrent architectures is described and applied to 2-
D wavefront arrays. The mapping of the computational elements (process
es) to processors is based on the concept of folding. The mapping patt
ern on the 2-D full-size mesh of processes is a composition of symmetr
ic tiles of size 2 root(N) x 2 root(N), N being the number of processo
rs. The algorithm can be partitioned according to a globally sequentia
l, locally parallel scheme, The code optimisation is performed by prog
ramming a few different types of tile, according to the algorithm. Whe
n the size of the problem is much larger than the size of the mesh of
processors, a linear speed-up is achieved independently of the number
of processors. Experimental results are presented for matrix multiplic
ation, LU decomposition and the solution of triangular system equation
s on 2-D meshes of transputers programmed in Occam.