On supporting containment queries in relational database management systems

Citation
C. Zhang et al., On supporting containment queries in relational database management systems, SIGMOD RECORD, 30(2), 2001, pp. 425-436
Number of citations
40
Subject Categories
Computer Science & Engineering
Journal title
SIGMOD RECORD
ISSN journal
0163-5808
Volume
30
Issue
2
Year of publication
2001
Pages
425 - 436
Database
ISI
SICI code
0163-5808(200106)30:2<425:OSCQIR>2.0.ZU;2-3
Abstract
Virtually all proposals for querying XML include a class of query we term "containment queries". It is also clear that in the foreseeable future, a substantial amount of XML data will be stored in relational database systems. This raises the question of how to support these containment queries. The inverted list technology that underlies much of Information Retrieval is well-suited to these queries, but should we implement this technology (a) in a separate loosely-coupled IR engine, or (b) using the native tables and query execution machinery of the RDBMS? With option (b), more than twenty years of work on RDBMS query optimization, query execution, scalability, and concurrency control and recovery immediately extend to the queries and structures that implement these new operations. But all this will be irrelevant if the performance of option (b) lags that of (a) by too much. In this paper, we explore some performance implications of both options using native implementations in two commercial relational database systems and in a special purpose inverted list engine. Our performance study shows that while RDBMSs are generally poorly suited for such queries, under certain conditions they can outperform an inverted list engine. Our analysis further identifies two significant causes that differentiate the performance of the IR and RDBMS implementations: the join algorithms employed and the hardware cache utilization. Our results suggest that contrary to most expectations, with some modifications, a native implementation in an RDBMS can support this class of query much more efficiently.