Orca is a portable, object-based distributed shared memory (DSM) system. This article studies and evaluates the design choices made in the Orca system and compares Orca with other DSMs. The article gives a quantitative analysis of Orca's coherence protocol (based on write-updates with function shipping), the totally ordered group communication protocol, the strategy for object placement, and the all-software, user-space architecture. Performance measurements for 10 parallel applications illustrate the trade-offs made in the design of Orca and show that essentially the right design decisions have been made. A write-update protocol with function shipping is effective for Orca, especially since it is used in combination with techniques that avoid replicating objects that have a low read/write ratio. The overhead of totally ordered group communication on application performance is low. The Orca system is able to make near-optimal decisions for object placement and replication. In addition, the article compares the performance of Orca with that of a page-based DSM (TreadMarks) and another object-based DSM (CRL). It also analyzes the communication overhead of the DSMs for several applications. All performance measurements are done on a 32-node Pentium Pro cluster with Myrinet and Fast Ethernet networks. The results show that the Orca programs send fewer messages and less data than the TreadMarks and CRL programs and obtain better speedups.