B. Hendrickson et Tg. Kolda, Partitioning rectangular and structurally unsymmetric sparse matrices for parallel processing, SIAM J SC C, 21(6), 2000, pp. 2048-2072
A common operation in scientific computing is the multiplication of a spars
e, rectangular, or structurally unsymmetric matrix and a vector. In many ap
plications the matrix-transpose-vector product is also required. This paper
addresses the efficient parallelization of these operations. We show that
the problem can be expressed in terms of partitioning bipartite graphs. We
then introduce several algorithms for this partitioning problem and compare
their performance on a set of test matrices.