To scale up to high-end configurations, shared-memory multiprocessors are evolving towards Non-Uniform Memory Access (NUMA) architectures. In this paper, we address the central problem of load balancing during parallel query execution in NUMA multiprocessors. We first show that an execution model for NUMA should not use data partitioning (as shared-nothing systems do) but should instead exploit efficient shared-memory strategies like Synchronous Pipelining (SP). However, SP has problems in NUMA, especially with skewed data. We therefore propose a new execution strategy that solves these problems. The basic idea is to allow partial materialization of intermediate results and to make them progressively public, i.e., available for processing by any processor, as needed to avoid processor idle times. Hence, we call this strategy Progressive Sharing (PS). We conducted a performance comparison using an implementation of SP and PS on a 72-processor KSR1 computer, with many queries and large relations. With no skew, both SP and PS achieve linear speed-up. With skew, however, the impact on SP performance is severe, while it is insignificant on PS. Finally, we show that, in NUMA, PS is also beneficial when executing several pipeline chains concurrently.
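The core idea of Progressive Sharing can be illustrated with a small simulation. This is a hypothetical sketch, not the paper's implementation: each processor first consumes its private partition of intermediate tuples, and any materialized tuples beyond a threshold (`publish_threshold`, an assumed parameter) are made public so that idle processors can take them instead of sitting idle under skew.

```python
from collections import deque

def progressive_sharing(partitions, publish_threshold=4):
    """Simulate Progressive Sharing load balancing (hypothetical sketch).

    Each processor first consumes its own private partition of
    intermediate tuples; the portion of a partition beyond
    `publish_threshold` is materialized publicly so any idle
    processor may process it.
    """
    private = [deque(p) for p in partitions]
    public = deque()
    # Progressively publish the excess of each private buffer.
    for q in private:
        while len(q) > publish_threshold:
            public.append(q.pop())
    processed = [0] * len(private)
    # Round-robin loop as a stand-in for parallel execution steps.
    active = True
    while active:
        active = False
        for pid, q in enumerate(private):
            if q:
                q.popleft()            # consume a private tuple
                processed[pid] += 1
                active = True
            elif public:
                public.popleft()       # idle processor takes public work
                processed[pid] += 1
                active = True
    return processed
```

With a skewed input such as `progressive_sharing([list(range(10)), [0], [0], [0]], publish_threshold=2)`, the heavily loaded processor keeps only two tuples private and publishes the rest, so the per-processor counts come out nearly equal instead of mirroring the skew.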