The next-generation astronomy digital archives will cover most of the sky at fine resolution in many wavelengths, from X-rays through ultraviolet and optical to infrared. The archives will be stored at diverse geographical locations. One of the first of these projects, the Sloan Digital Sky Survey (SDSS), is creating a 5-wavelength catalog over 10,000 square degrees of the sky (see http://www.sdss.org/). The 200 million objects in the multi-terabyte database will have mostly numerical attributes in a 100+ dimensional space. Points in this space have highly correlated distributions.
The archive will enable astronomers to explore the data interactively. Data access will be aided by multidimensional spatial and attribute indices. The data will be partitioned in many ways. Small tag objects consisting of the most popular attributes will accelerate frequent searches. Splitting the data among multiple servers will allow parallel, scalable I/O and parallel data analysis. Hashing techniques will allow efficient clustering and pairwise comparison algorithms that should parallelize well. Randomly sampled subsets will allow debugging otherwise large queries at the desktop. Central servers will operate a data pump to support sweep searches touching most of the data. The anticipated queries will require special operators related to angular distances and complex similarity tests of object properties, such as shapes, colors, velocity vectors, or temporal behaviors. These issues pose interesting data management challenges.
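
As an illustration of the kind of angular-distance operator such queries might need, the following is a minimal sketch and not the archive's actual implementation. It assumes positions are given as right ascension and declination in degrees, and the function names `angular_separation_deg` and `within_radius`, as well as the `(id, ra, dec)` tuple layout, are hypothetical choices made for this example.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle angle in degrees between two sky positions (degrees in)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def within_radius(objects, ra0, dec0, radius_deg):
    """Return the (id, ra, dec) tuples lying within radius_deg of (ra0, dec0).

    This is a linear scan over an assumed tuple layout; a real archive
    would presumably answer such a query through a spatial index.
    """
    return [obj for obj in objects
            if angular_separation_deg(obj[1], obj[2], ra0, dec0) <= radius_deg]
```

A cone search over a small in-memory sample would then be `within_radius(sample, 180.0, 0.0, 0.5)`, where `sample` is a list of such tuples. The linear scan is only meant to show the operator itself; it says nothing about how the indices described above would evaluate it.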