In this paper, the authors deal with algorithmic issues on heterogeneous
platforms. They concentrate on dense linear algebra kernels, such as matrix
multiplication or LU decomposition. Block-cyclic distribution techniques
used in ScaLAPACK are no longer sufficient to balance the load among
processors running at different speeds. The main result of this paper is to
provide a static data distribution scheme that leads to an asymptotically
perfect load balancing for LU decomposition, thereby providing solid
foundations toward the design of a cluster-oriented version of ScaLAPACK.