A model is proposed that allocates tables of a relational database to
the sites of a distributed system in order that the total cost of exec
uting a given collection of join queries is minimized. This model is p
resented in the convenient form of an integer linear program. Each ind
ividual query specifies that several logically distinct data sets, or
relations, are to be amalgamated and presented to the particular user
that issued the request. Performing this task requires the utilization
of limited system resources; both processors, and the communications
facilities that interconnect them, may be used. An optimal strategy fo
r executing a single query is, therefore, defined to be one that minim
izes a weighted sum of the costs of computation, and those of informat
ion interchange, incurred during the computation. One particular model
, appearing in [1], conforms to this philosophy, and so forms the basi
s for further investigations. The total cost of executing an entire gr
oup of such queries depends upon the way in which the relevant informa
tion is allocated to the sites of the network. Several copies of any p
articular relation may be dispersed across the network; the replicatio
n of data increases its availability, and potentially decreases the co
sts of answering the given requests. However, only limited storage cap
acities are available, and increased replication commands greater over
heads in maintaining consistency. An optimization program is developed
to design a data allocation plan that achieves a minimal total cost f
or the execution of a given group of requests, while maintaining restr
aints on the levels of data replication considered permissible.