This paper deals with load balancing in distributed memory parallel da
tabase computers. In such an environment, a relation is typically part
itioned and distributed across a set of processing nodes. An efficient
load balancing strategy, therefore, is the determining factor for the
performance improvement of such systems. We introduce the notion of c
ell as the unit of data partition and data movement, and devise a gree
dy algorithm for the initial data distribution and an iterative method
for subsequent rebalancing. A directory is used to map a cell to a pr
ocessing node, thus, two logically consecutive cells can be assigned i
ndependently. We also develop an analytical model to compare our schem
e to two known methods. The results indicate that the proposed techniq
ue exhibits significant rebalancing costs reduction (in terms of respo
nse time and throughput) over existing approaches. We show that, for a
given set of system parameters, the maximum possible tuple distributi
on imbalance that the initial data distribution algorithm and the subs
equent data rebalancing algorithm may cause. We then attain a mechanis
m to determine the appropriate number of cells that considers both the
processing overhead and the maximum possible data partition imbalance
.