Monitoring XML data on the Web

Citation
B. Nguyen et al., Monitoring XML data on the Web, SIGMOD RECORD, 30(2), 2001, pp. 437-448
Number of citations
21
Subject categories
Computer Science & Engineering
Journal title
SIGMOD RECORD
Journal ISSN
0163-5808
Volume
30
Issue
2
Year of publication
2001
Pages
437 - 448
Database
ISI
SICI code
0163-5808(200106)30:2<437:MXDOTW>2.0.ZU;2-1
Abstract
We consider the monitoring of a flow of incoming documents. More precisely, we present the monitoring used in a very large warehouse built from XML documents found on the Web. The flow of documents consists of XML pages (which are warehoused) and HTML pages (which are not). Our contributions are the following: (1) a subscription language that specifies the monitoring of pages when they are fetched, the periodic evaluation of continuous queries, and the production of XML reports; (2) a description of the architecture of the system we implemented, which makes it possible to monitor a flow of millions of pages per day with millions of subscriptions on a single PC and scales up by using more machines; (3) a new algorithm for processing alerts that can be used in a wider context. We support monitoring at the page level (e.g., discovery of a new page within a certain semantic domain) as well as at the element level (e.g., insertion of a new electronic product in a catalog). This work is part of the Xyleme system. Xyleme is developed on a cluster of PCs under Linux with CORBA communications. The part of the system described in this paper has been implemented. We mention first experiments.