A SURVEY OF PARALLEL EXECUTION STRATEGIES FOR TRANSITIVE CLOSURE AND LOGIC PROGRAMS

Citation
F. Cacace et al., A SURVEY OF PARALLEL EXECUTION STRATEGIES FOR TRANSITIVE CLOSURE AND LOGIC PROGRAMS, DISTRIBUTED AND PARALLEL DATABASES, 1(4), 1993, pp. 337-382
Citations number
70
Categorie Soggetti
Computer Sciences, Special Topics","Computer Science Theory & Methods","Computer Science Information Systems
ISSN journal
09268782
Volume
1
Issue
4
Year of publication
1993
Pages
337 - 382
Database
ISI
SICI code
0926-8782(1993)1:4<337:ASOPES>2.0.ZU;2-G
Abstract
An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architect ures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribut e data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive c losure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive q ueries are complex; at the same time, their regular structure is parti cularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of re cursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is se parated into two distinct subareas, one focused on the transitive clos ure of Relational Algebra expressions, the other one focused on optimi zation of more general Datalog queries. Though the subareas seem radic ally different because of the approach and formalism used, they have m any common features. This is not surprising, because most typical Data log queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog pr ograms. We then review sequential methods for evaluating transitive cl osure, distinguishing iterative and direct methods. We address the par allelization of these methods, by discussing various forms of parallel ization. Data fragmentation plays an important role in obtaining paral lel execution; we describe hash-based and semantic fragmentation. Fina lly, we consider Datalog queries, and present general methods for para llel rule execution; we recognize the similarities between these metho ds and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance o f methods.