Extensible parallel query processing for exploratory geoscientific data mining

Citation
E.C. Shek et al., Extensible parallel query processing for exploratory geoscientific data mining, DATA M K D, 5(4), 2001, pp. 277-304
Number of citations
25
Subject categories
AI Robotics and Automatic Control
Journal title
DATA MINING AND KNOWLEDGE DISCOVERY
Journal ISSN
1384-5810
Volume
5
Issue
4
Year of publication
2001
Pages
277 - 304
Database
ISI
SICI code
1384-5810(2001)5:4<277:EPQPFE>2.0.ZU;2-6
Abstract
Exploratory data mining and analysis requires a computing environment which provides facilities for the user-friendly expression and rapid execution of "scientific queries." In this paper, we address research issues in the parallelization of scientific queries containing complex user-defined operations. In a parallel query execution environment, parallelizing a query execution plan involves determining how input data streams to evaluators implementing logical operations can be divided to be processed by clones of the same evaluator in parallel. We introduced the concept of a "relevance window" that characterizes data lineage and data partitioning opportunities available for a user-defined evaluator. In addition, we developed a query parallelization framework by extending relational parallel query optimization algorithms to allow the parallelization characteristics of user-defined evaluators to guide the process of query parallelization in an extensible query processing environment. We demonstrated the utility of our system by performing experiments mining cyclonic activity, blocking events, and the upward wave-energy propagation features from several observational and model simulation datasets.
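Illustrative sketch
The abstract does not define the relevance window formally, so the following is only one possible reading, not the authors' method. It assumes a hypothetical evaluator (a centered moving average) whose output at position i depends only on inputs within half_width positions of i. Under that assumption, the input stream can be divided among clones of the evaluator as long as each clone also receives the neighbouring values that fall inside the window. All function and parameter names below are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def moving_average(values, half_width):
    """Hypothetical user-defined evaluator: centered moving average."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def partition_with_window(n, num_clones, half_width):
    """Split range(n) into contiguous output ranges.

    Each entry is (out_start, out_stop, in_start, in_stop): the clone
    produces outputs for [out_start, out_stop) but reads the wider input
    range [in_start, in_stop), padded by the assumed relevance window.
    """
    size = -(-n // num_clones)  # ceiling division
    parts = []
    for start in range(0, n, size):
        stop = min(n, start + size)
        parts.append((start, stop,
                      max(0, start - half_width),
                      min(n, stop + half_width)))
    return parts


def run_clone(values, part, half_width):
    """Run one clone on its padded input and keep only its own outputs."""
    start, stop, in_lo, in_hi = part
    local = moving_average(values[in_lo:in_hi], half_width)
    return local[start - in_lo:stop - in_lo]


def parallel_evaluate(values, num_clones, half_width):
    """Evaluate with several clones and concatenate results in order."""
    if not values:
        return []
    parts = partition_with_window(len(values), num_clones, half_width)
    with ThreadPoolExecutor(max_workers=num_clones) as pool:
        chunks = pool.map(lambda p: run_clone(values, p, half_width), parts)
        return [x for chunk in chunks for x in chunk]
```

With this padding, parallel_evaluate(data, 4, 3) returns the same list as moving_average(data, 3) for any input list, which is the property a partitioning scheme of this kind has to preserve. Threads are used only to keep the sketch short; a real parallel query engine would distribute clones across processes or nodes.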