We introduce two new asynchronous PRAM models which allow significant
latencies for accessing global memory. In both models, accessing globa
l memory takes L time units where L is a fixed parameter, but the mode
ls provide two different mechanisms to help hide this latency. The Del
ay PRAM (D-PRAM) allows reads and writes to be issued before earlier r
eads and writes are completed; the Block PRAM (B-PRAM) allows a block
of L contiguous locations in global memory to be read or written in O(
L) time units. For both models we develop work-optimal randomized algo
rithms that solve a Certified Write-All Problem (CWA) of size n with e
xpected O(n) work using up to (n log L)I(L log n) processors. This is
a fundamental problem since it can be used as a synchronization primit
ive for n parallel instructions. If the D-PRAM has some restrictions o
n the asynchrony allowed we can use our CWA solution to simulate any n
-processor CRCW PRAM program on a restricted D-PRAM with memory latenc
y L, using O(n) expected work per parallel step, and using up to (n lo
g L)I(L log n) D-PRAM processors. We prove a matching lower bound whic
h shows that our CWA solution is optimal in terms of expected work. Ou
r algorithms work both for models where the latency L to access global
memory is fixed, and for models where the latency can vary probabilis
tically. (C) 1994 Academic Press, Inc.