This paper proposes a Barrier Tree for Meshes (BTM) to minimize the barrier
synchronization latency for two-dimensional (2D) meshes. The proposed BTM
scheme has two distinguishing features. First, the synchronization tree is
4-ary. The synchronization latency of the BTM scheme is asymptotically Thet
a (log(4) n), while that of the fastest scheme reported in the literature i
s bounded between Omega (log(3) n) and O(n(1/2)), where n is the number of
member nodes. Second, nonmember nodes are neither involved in the construct
ion of a BTM nor actively participate in the synchronization operations, wh
ich avoids interference among different process groups during synchronizati
on. This not only results in low setup overhead, but also reduces the synch
ronization latency. The low setup overhead is particularly effective for th
e dynamic process model provided in MPI-2. Extensive simulation study shows
that, for up to 64 x 64 meshes, the BTM scheme results in about 40 similar
to 70 percent shorter synchronization latency and is more scalable than co
nventional schemes.