We present deterministic sorting and routing algorithms for grids and tori
with additional diagonal connections. For large loads (h greater than or eq
ual to 12), where each processor has at most h data packets in the beginnin
g and in the end, the sorting problem can be solved in optimal hn/6 + o(n)
and hn/12 + o(n) steps for grids and tori with diagonals, respectively. For
smaller loads, we present a new concentration technique that yields very f
ast algorithms for h < 12. For a load of 1, the previously most studied cas
e, sorting only takes 1.2n + o(n) steps and routing only 1.1n + o(n) steps.
For tori, we can present optimal algorithms for all loads h greater than o
r equal to 1. The above algorithms all use a constant-size memory for all p
rocessors and never copy or split packets, a property that the correspondin
g lower bounds make use of.
If packets may be copied, 1-1 sorting can be done in only 2n/3 + o(n) on a
torus with diagonals.
Generally gaining a speedup of 3 by only doubling the number of communicati
on links compared with a grid without diagonals, our work suggests building
grids and tori with diagonals.