Minimization of tree pattern queries

Citation
S. Amer-yahia et al., Minimization of tree pattern queries, SIG RECORD, 30(2), 2001, pp. 497-508
Citations number
19
Categorie Soggetti
Computer Science & Engineering
Journal title
SIGMOD RECORD
ISSN journal
01635808 → ACNP
Volume
30
Issue
2
Year of publication
2001
Pages
497 - 508
Database
ISI
SICI code
0163-5808(200106)30:2<497:MOTPQ>2.0.ZU;2-C
Abstract
Tree patterns form a natural basis to query tree-structured data such as XM L and LDAP. Since the efficiency of tree pattern matching against a tree-st ructured database depends on the size of the pattern, it is essential to id entify and eliminate redundant nodes in the pattern and do so as quickly as possible. In this paper, we study tree pattern minimization both in the ab sence and in the presence of integrity constraints (ICs) on the underlying tree-structured database. When no ICs are considered, we call the process of minimizing a tree patter n, constraint-independent minimization. We develop a polynomial time algori thm called CIM for this purpose. CIM's efficiency stems from two key proper ties: (i) a node cannot be redundant unless its children are, and (ii) the order of elimination of redundant nodes is immaterial. When ICs are conside red for minimization, we refer to it as constraint-dependent minimization. For tree-structured databases, required child/descendant and type co-occurr ence ICs are very natural. Under such ICs, we show that the minimal equival ent query is unique. We show the surprising result that the algorithm obtai ned by first augmenting the tree pattern using ICs, and then applying CIM, always finds the unique minimal equivalent query; we refer to this algorith m as ACIM. While ACIM is also polynomial time, it can be expensive in pract ice because of its inherent non-locality. We then present a fast algorithm, CDM, that identifies and eliminates local redundancies due to ICs, based o n propagating "information labels" up the tree pattern. CDM can be applied prior to ACIM for improving the minimization efficiency. We complement our analytical results with an experimental study that shows the effectiveness of our tree pattern minimization techniques.