High-bandwidth, low-latency switches are now commercially available.
With these switches, it becomes possible to build a system area network
that interconnects workstations and processor clusters into a
cost-effective parallel computing platform. A processor cluster may be,
for example, a shared-memory multiprocessor or a mesh-connected
multicomputer. The interconnection topology of such a platform, called
a switch-based NOWP, is usually irregular. On such systems, multicast is
an important collective communication operation. A multicast involves
two steps: (1) the source node sends the multicast message to those
destinations that are directly connected to a switch or are the leader
of a processor cluster, and (2) the leader node of each cluster sends
the message to the other destinations in the same cluster. In this
paper, we propose two unicast-based multicast algorithms. Algorithm
Multicast_1 performs the two steps sequentially, whereas Algorithm
Multicast_2 overlaps them. The performance of the two algorithms will
be evaluated and compared.
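
As an illustration only, the two-step decomposition described above can be sketched in Python. This is not Algorithm Multicast_1 or Multicast_2, whose details are not given here; it only shows how a destination set might be split into the targets of step (1) and the targets of step (2). The function name split_multicast, the dictionaries cluster_of and leader_of, and the node names are hypothetical. The sketch also assumes that the leader of any cluster containing a destination receives the message in step (1), even if the leader is not itself a destination.

```python
from collections import defaultdict


def split_multicast(destinations, cluster_of, leader_of):
    """Split a destination set into the two multicast steps.

    destinations: iterable of node ids that must receive the message.
    cluster_of:   dict mapping a node to its cluster id; nodes attached
                  directly to a switch are absent (hypothetical model).
    leader_of:    dict mapping a cluster id to its leader node
                  (hypothetical model).
    """
    step1 = []                 # nodes the source sends to in step (1)
    step2 = defaultdict(list)  # leader -> cluster members served in step (2)
    for node in destinations:
        cluster = cluster_of.get(node)
        if cluster is None:
            # Node is connected directly to a switch.
            step1.append(node)
            continue
        leader = leader_of[cluster]
        if leader not in step1:
            # Assumption: the leader relays even if it is not a destination.
            step1.append(leader)
        if node != leader:
            step2[leader].append(node)
    return step1, dict(step2)


# Hypothetical topology: one workstation and two clusters, A and B.
cluster_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
leader_of = {"A": "a1", "B": "b1"}
step1, step2 = split_multicast(["w1", "a2", "a3", "b1"], cluster_of, leader_of)
print(step1)  # ['w1', 'a1', 'b1']
print(step2)  # {'a1': ['a2', 'a3']}
```

In this sketch, a sequential scheme would finish every send in step1 before any leader starts forwarding to its step2 list, while an overlapped scheme would let a leader begin forwarding as soon as it has received the message.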