Designing and mining multi-terabyte astronomy archives: The Sloan Digital Sky Survey

Citation
A.S. Szalay et al., Designing and mining multi-terabyte astronomy archives: The Sloan Digital Sky Survey, SIGMOD RECORD, 29(2), 2000, pp. 451-462
Number of citations
17
Subject Categories
Computer Science & Engineering
Journal title
SIGMOD RECORD
Journal ISSN
0163-5808
Volume
29
Issue
2
Year of publication
2000
Pages
451 - 462
Database
ISI
SICI code
0163-5808(200006)29:2<451:DAMMAA>2.0.ZU;2-A
Abstract
The next-generation astronomy digital archives will cover most of the sky at fine resolution in many wavelengths, from X-rays, through ultraviolet, optical, and infrared. The archives will be stored at diverse geographical locations. One of the first of these projects, the Sloan Digital Sky Survey (SDSS) is creating a 5-wavelength catalog over 10,000 square degrees of the sky (see http://www.sdss.org/). The 200 million objects in the multi-terabyte database will have mostly numerical attributes in a 100+ dimensional space. Points in this space have highly correlated distributions. The archive will enable astronomers to explore the data interactively. Data access will be aided by multidimensional spatial and attribute indices. The data will be partitioned in many ways. Small tag objects consisting of the most popular attributes will accelerate frequent searches. Splitting the data among multiple servers will allow parallel, scalable I/O and parallel data analysis. Hashing techniques will allow efficient clustering, and pair-wise comparison algorithms that should parallelize nicely. Randomly sampled subsets will allow debugging otherwise large queries at the desktop. Central servers will operate a data pump to support sweep searches touching most of the data. The anticipated queries will require special operators related to angular distances and complex similarity tests of object properties, like shapes, colors, velocity vectors, or temporal behaviors. These issues pose interesting data management challenges.
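As an illustration of the kind of angular-distance operator the abstract mentions, the sketch below computes the separation between two sky positions with the haversine formula and uses it for a simple cone search over small "tag" records. It is only a minimal sketch under stated assumptions: the function names, the (ra, dec) tuple layout, and the linear scan are hypothetical and are not taken from the SDSS archive design, which the abstract says relies on spatial indices rather than a full scan.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions.

    Inputs are in degrees. Uses the haversine formula, which stays
    numerically stable for small separations.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp against rounding error before taking the square root / asin.
    a = min(1.0, max(0.0, a))
    return math.degrees(2.0 * math.asin(math.sqrt(a)))


def cone_search(tags, ra0, dec0, radius_deg):
    """Return the tag records within radius_deg of (ra0, dec0).

    `tags` is an iterable of (object_id, ra, dec) tuples: a hypothetical
    stand-in for compact tag objects holding only popular attributes.
    This is a plain linear scan, not an indexed lookup.
    """
    return [
        t for t in tags
        if angular_separation_deg(ra0, dec0, t[1], t[2]) <= radius_deg
    ]


# Hypothetical sample data: (object_id, ra_deg, dec_deg)
sample_tags = [
    (1, 10.0, 20.0),
    (2, 10.1, 20.05),
    (3, 200.0, -5.0),
]
nearby = cone_search(sample_tags, 10.0, 20.0, 0.5)
```

Here `nearby` holds the records for objects 1 and 2. A production archive would restrict the candidates with a spatial index first and apply the exact distance test only to the survivors.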