Multicast communication is a key issue in almost all applications that run
on any parallel architecture and, hence, efficient implementation of
multicast is critical to the performance of multiprocessor machines. Multicast
is implemented in parallel architectures either via software or via
hardware. Software-based approaches for implementing multicast can result in high
message latencies, while hardware-based schemes can greatly improve
performance. Deadlock freedom in multicast communication is much more difficult
to achieve, resulting in more involved routing algorithms and higher startup
delays. Hardware tree-based algorithms do not require these high startup
delays, but do suffer from high probabilities of message blocking, leading to
poor performance. In this paper, we propose a new hardware tree-based
routing algorithm (HTA) for multicast communication under virtual cut-through
switching in k-ary n-cubes that outperforms existing software and hardware
path-based multicast routing schemes. Simulation results are compared against
several commonly used multicast routing algorithms and show that HTA
performs extremely well under many different conditions.