PROBVIEW - A FLEXIBLE PROBABILISTIC DATABASE SYSTEM

Citation
Lvs. Lakshmanan et al., PROBVIEW - A FLEXIBLE PROBABILISTIC DATABASE SYSTEM, ACM transactions on database systems, 22(3), 1997, pp. 419-469
Citations number
36
Categorie Soggetti
Computer Sciences","Computer Science Information Systems","Computer Science Software Graphycs Programming
ISSN journal
03625915
Volume
22
Issue
3
Year of publication
1997
Pages
419 - 469
Database
ISI
SICI code
0362-5915(1997)22:3<419:P-AFPD>2.0.ZU;2-8
Abstract
Probability theory is mathematically the best understood paradigm for modeling and manipulating uncertain information. Probabilities of comp lex events can be computed from those of basic events on which they de pend, using any of a number of strategies. Which strategy is appropria te depends very much on the known interdependencies among the events i nvolved. Previous work on probabilistic databases has assumed a fixed and restrictive combination strategy (e.g., assuming all events are pa irwise independent). In this article, we characterize, using postulate s, whole classes of strategies for conjunction, disjunction, and negat ion, meaningful from the viewpoint of probability theory. (1) We propo se a probabilistic relational data model and a generic probabilistic r elational algebra that neatly captures various strategies satisfying t he postulates, within a single unified framework. (2) We show that as long as the chosen strategies can be computed in polynomial time, quer ies in the positive fragment of the probabilistic relational algebra h ave essentially the same data complexity as classical relational algeb ra. (3) We establish various containments and equivalences between alg ebraic expressions, similar in spirit to those in classical algebra. ( 4) We develop algorithms for maintaining materialized probabilistic vi ews. (5) Based on these ideas, we have developed a prototype probabili stic database system called ProbView on top of Dbase V.0. We validate our complexity results with experiments and show that rewriting certai n types of queries to other equivalent forms often yields substantial savings.