In this paper, we address the issue of implementing matrix multiplication o
n heterogeneous platforms. We target two different classes of heterogeneous
computing resources: heterogeneous networks of workstations and collection
s of heterogeneous clusters. Intuitively, the problem is to load balance th
e work with different speed resources while minimizing the communication vo
lume. We formally state this problem in a geometric framework and prove its
NP-completeness. Next, we introduce a (polynomial) column-based heuristic,
which turns out to be very satisfactory: We derive a theoretical performan
ce guarantee for the heuristic and we assess its practical usefulness throu
gh MPI experiments.