This article is devoted to the run-time redistribution of one-dimensio
nal arrays that are distributed in a block-cyclic fashion over a proce
ssor grid. While previous studies have concentrated on efficiently gen
erating the communication messages to be exchanged by the processors i
nvolved in the redistribution, we focus on the scheduling of those mes
sages: how to organize the message exchanges into ''structured'' commu
nication steps that minimize contention. We build upon results of Walk
er and Otto, who solved a particular instance of the problem, and we d
erive an optimal scheduling for the most general case, namely, moving
from a CYCLIC (r) distribution on a P-processor grid to a CYCLIC (s) d
istribution on a Q-processor grid, for arbitrary values of the redistr
ibution parameters P, Q, r, and s.