EFFICIENTLY EXTENDIBLE MAPPINGS FOR BALANCED DATA DISTRIBUTION

Citation
Dm. Choy et al., EFFICIENTLY EXTENDIBLE MAPPINGS FOR BALANCED DATA DISTRIBUTION, Algorithmica, 16(2), 1996, pp. 215-232
Citations number
7
Categorie Soggetti
Computer Sciences",Mathematics,Mathematics,"Computer Science Software Graphycs Programming
Journal title
ISSN journal
01784617
Volume
16
Issue
2
Year of publication
1996
Pages
215 - 232
Database
ISI
SICI code
0178-4617(1996)16:2<215:EEMFBD>2.0.ZU;2-D
Abstract
In data storage applications, a large collection of consecutively numb ered data ''buckets'' are often mapped to a relatively small collectio n of consecutively numbered storage ''bins.'' For example, in parallel database applications, buckets correspond to hash buckets of data and bins correspond to database nodes. In disk array applications, bucket s correspond to logical tracks and bins correspond to physical disks i n an army Measures of the ''goodness'' of a mapping method include: (1 ) The time (number of operations) needed to compute the mapping. (2) T he storage needed to store a representation of the mapping. (3) The ba lance of the mapping, i.e., the extent to which all bins receive the s ame number of buckets. (4) The cost of relocation, that is, the number of buckets that must be relocated to a new bin if a new mapping is ne eded due to an expansion of the number of bins or the number of bucket s. One contribution of this paper is to give a new mapping method, the Interval-Round-Robin (IRR) method. The IRR method has optimal balance and relocation cost, and its time complexity and storage requirements compare favorably with known methods. Specifically, if m is the numbe r of times that the number of bins and/or buckets has increased, then the time complexity is O(log m) and the storage is O(m(2)). Another co ntribution of the paper is to identify the concept of a history-indepe ndent mapping, meaning informally that the mapping does not ''remember '' the past history of expansions to the number of buckets and bins, b ut only the current number of buckets and bins. Thus, such mappings re quire very little information to be stored. Assuming that balance and relocation are optimal, we prove that history-independent mappings are possible if the number of buckets is fixed (so only the number of bin s can increase), but not possible if the number df bins and buckets ca n both increase.