In this paper, we present a parallel SIMD algorithm for radix sorting
of N numbers of w bits each, taking O(w + N-1/4) time with the VLSI ar
ea of O(N(3/2)w(2)), 0 < w < N-1/4. For w = log N, our algorithm impro
ves a previous known solution on a similar architecture in time comple
xity by a factor of log N. Since our algorithm uses only radix sort fo
r sorting of subsets and merging of them, no comparator is needed. Our
algorithm satisfies the lower bound of AT(2) complexity which mainly
restricts the VLSI implementation of most sorting algorithms. The same
result is obtained in another previously known solution, but it requi
res a comparator of size w.