Virtually all proposals for querying XML include a class of query we term
"containment queries". It is also clear that in the foreseeable future, a
substantial amount of XML data will be stored in relational database systems.
This raises the question of how to support these containment queries. The
inverted list technology that underlies much of Information Retrieval (IR) is
well-suited to these queries, but should we implement this technology (a) in
a separate loosely-coupled IR engine, or (b) using the native tables and
query execution machinery of the RDBMS? With option (b), more than twenty
years of work on RDBMS query optimization, query execution, scalability, and
concurrency control and recovery immediately extend to the queries and
structures that implement these new operations. But all this will be irrelevant
if the performance of option (b) lags that of (a) by too much. In this
paper, we explore some performance implications of both options using native
implementations in two commercial relational database systems and in a
special-purpose inverted list engine. Our performance study shows that while
RDBMSs are generally poorly suited for such queries, under certain conditions
they can outperform an inverted list engine. Our analysis further identifies
two significant causes that differentiate the performance of the IR and
RDBMS implementations: the join algorithms employed and the hardware cache
utilization. Our results suggest that, contrary to most expectations, with
some modifications, a native implementation in an RDBMS can support this
class of query much more efficiently.
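
To make option (b) concrete, the following is a minimal sketch of a containment query evaluated as a join over inverted lists stored in ordinary relational tables. It is illustrative only: the table names (elements, texts), the position-interval encoding, and the helpers make_demo_db and containing are assumptions introduced here, not a schema or API taken from the text above, and SQLite stands in for the commercial systems.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory index (hypothetical schema, hand-numbered positions)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per element occurrence: the token positions of its tags.
        CREATE TABLE elements (term TEXT, docno INTEGER,
                               begin_pos INTEGER, end_pos INTEGER);
        -- One row per word occurrence: its token position in the document.
        CREATE TABLE texts (term TEXT, docno INTEGER, pos INTEGER);
        """
    )
    # Doc 1: <title> containment queries </title> <para> relational queries </para>
    # Doc 2: <title> storage </title> <para> containment </para>
    conn.executemany(
        "INSERT INTO elements VALUES (?, ?, ?, ?)",
        [("title", 1, 1, 4), ("para", 1, 5, 8),
         ("title", 2, 1, 3), ("para", 2, 4, 6)],
    )
    conn.executemany(
        "INSERT INTO texts VALUES (?, ?, ?)",
        [("containment", 1, 2), ("queries", 1, 3),
         ("relational", 1, 6), ("queries", 1, 7),
         ("storage", 2, 2), ("containment", 2, 5)],
    )
    return conn


def containing(conn, element, word):
    """Return the docnos in which some `element` contains `word`.

    Containment is tested as interval inclusion: the word's position must
    fall strictly between the element's begin and end positions.
    """
    rows = conn.execute(
        """
        SELECT DISTINCT e.docno
        FROM elements AS e
        JOIN texts AS t
          ON e.docno = t.docno
         AND e.begin_pos < t.pos
         AND t.pos < e.end_pos
        WHERE e.term = ? AND t.term = ?
        ORDER BY e.docno
        """,
        (element, word),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = make_demo_db()
    print(containing(db, "title", "containment"))
    print(containing(db, "para", "containment"))
```

The join condition is an inequality on positions rather than a plain equality, so how the engine evaluates this join matters a great deal for cost, which is consistent with join algorithms being named above as one of the two differentiating factors.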