DATA PARTITIONING FOR MULTICOMPUTER DATABASE-SYSTEMS - A CELL-BASED APPROACH

Citation
Ka. Hua et al., DATA PARTITIONING FOR MULTICOMPUTER DATABASE-SYSTEMS - A CELL-BASED APPROACH, Information systems, 18(5), 1993, pp. 329-342
Citations number
16
Categorie Soggetti
System Science","Information Science & Library Science","Computer Applications & Cybernetics
Journal title
ISSN journal
03064379
Volume
18
Issue
5
Year of publication
1993
Pages
329 - 342
Database
ISI
SICI code
0306-4379(1993)18:5<329:DPFMD->2.0.ZU;2-U
Abstract
This paper deals with load balancing in distributed memory parallel da tabase computers. In such an environment, a relation is typically part itioned and distributed across a set of processing nodes. An efficient load balancing strategy, therefore, is the determining factor for the performance improvement of such systems. We introduce the notion of c ell as the unit of data partition and data movement, and devise a gree dy algorithm for the initial data distribution and an iterative method for subsequent rebalancing. A directory is used to map a cell to a pr ocessing node, thus, two logically consecutive cells can be assigned i ndependently. We also develop an analytical model to compare our schem e to two known methods. The results indicate that the proposed techniq ue exhibits significant rebalancing costs reduction (in terms of respo nse time and throughput) over existing approaches. We show that, for a given set of system parameters, the maximum possible tuple distributi on imbalance that the initial data distribution algorithm and the subs equent data rebalancing algorithm may cause. We then attain a mechanis m to determine the appropriate number of cells that considers both the processing overhead and the maximum possible data partition imbalance .