The large amount of data collected today is quickly overwhelming researchers' abilities to interpret the data and discover interesting patterns. Knowledge discovery and data mining systems have the potential to automate the interpretation process, but these approaches frequently rely on computationally expensive algorithms. In particular, scientific discovery systems focus on richer data representations, sometimes without regard for scalability. This research investigates approaches for scaling one knowledge discovery and data mining system, SUBDUE, using parallel and distributed resources. SUBDUE has been used to discover interesting and repetitive concepts in graph-based databases from a variety of domains, but it requires a substantial amount of processing time. Experiments that demonstrate the scalability of parallel versions of SUBDUE are performed using CAD circuit databases, satellite images, and artificially generated databases, and potential achievements and obstacles are discussed. © 2001 Academic Press.