This paper describes the design and implementation of Cohesion, a
multi-threaded Distributed Shared Memory (DSM) system that provides high
programming flexibility, masks latency, and supports load balancing.
Cohesion offers a parallel programming environment very similar to that of
a multiprocessor system. Threads can be created recursively in this
environment, and users are not required to manage thread locations.
Instead of supporting a shared variable model, Cohesion provides a global
shared address space among all nodes in the system. The space is further
divided into three regions (release, conventional, and object-based
memory), each governed by a different consistency protocol. In this paper,
the design issues of an ordinary thread system, such as thread management,
load balancing, and synchronization, are reconsidered in conjunction with
the memory management provided by the DSM system. Several real
applications have
been used to evaluate the performance of the system. The results show that
multi-threading usually outperforms single-threading because network
latency can be masked by overlapping communication and computation.
However, the gain depends on program behavior and on the number of threads
executed on each node in the system.
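
The latency-masking effect described above can be illustrated with a small, self-contained sketch. It is not Cohesion code and does not use the Cohesion API: the names remote_fetch, work, run_single_threaded, and run_multi_threaded, as well as the 50 ms latency value, are hypothetical and chosen only for illustration. A remote page fetch is simulated by a sleep, so that while one thread waits on "communication" the other threads on the same node can proceed.

```python
import threading
import time

# Assumed round-trip time of one remote page fetch, in seconds (illustrative).
REMOTE_LATENCY = 0.05


def remote_fetch(page):
    """Hypothetical stand-in for a DSM page fault served by another node."""
    time.sleep(REMOTE_LATENCY)  # the calling thread blocks; others may run
    return list(range(page * 100, page * 100 + 100))


def work(page, results):
    """Fetch one page (communication), then reduce it (computation)."""
    data = remote_fetch(page)
    results[page] = sum(data)


def run_single_threaded(pages):
    """Process pages one after another; every fetch latency is fully exposed."""
    results = {}
    start = time.perf_counter()
    for page in pages:
        work(page, results)
    return results, time.perf_counter() - start


def run_multi_threaded(pages):
    """Process each page in its own thread; fetch latencies overlap."""
    results = {}
    start = time.perf_counter()
    threads = [threading.Thread(target=work, args=(page, results))
               for page in pages]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    pages = list(range(4))
    _, t_single = run_single_threaded(pages)
    _, t_multi = run_multi_threaded(pages)
    print(f"single-threaded: {t_single:.3f} s")
    print(f"multi-threaded:  {t_multi:.3f} s")
```

In this toy setting the single-threaded run pays the simulated latency once per page, whereas the multi-threaded run pays it roughly once in total. The sketch deliberately ignores thread-management overhead and contention, which is one reason the actual gain depends on program behavior and on the number of threads per node.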