We consider the problem of matrix transpose on mesh-connected processor
networks. On the theoretical side, we present the first optimal
algorithm for matrix transpose on two-dimensional meshes. Then we consider
implementation issues, show that the theoretical best bound cannot be
achieved, and present an alternative approach that actually improves
the practical performance. Finally, we introduce the concept of
orthogonalizations, which are generalizations of matrix transposes. We
show how to realize them efficiently and present interesting applications
of this new technique.