Current World-Wide Web technologies concentrate on presenting
documents to human readers. Although HTML identifies structures
within a document, it does not allow the semantic content of document
sections to be specified explicitly. We investigate a small extension
to HTML which allows parts of a document to be mapped onto an
underlying database schema. This allows automatic identification and
extraction of key information from a web using standard database
techniques. Such "lightweight" databases may span servers, with
searches being performed at client- or server-side. We have applied
this approach to generating "flattened" versions of hypertext
documents suitable for printing.
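
As a rough illustration of mapping document parts onto schema fields, the following sketch extracts annotated text from two pages and runs a simple query over the resulting records. It is not the extension investigated here: the attribute name data-field, the field names title and author, the sample pages, and the helper extract_record are all assumptions made for the example.

```python
from html.parser import HTMLParser

# Hypothetical attribute naming the schema field an element maps to.
# This stands in for the HTML extension, whose real syntax is not shown here.
FIELD_ATTR = "data-field"


class FieldExtractor(HTMLParser):
    """Collect the text of annotated elements into a single record."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._stack = []  # (tag, field name or None) for each open element

    def handle_starttag(self, tag, attrs):
        self._stack.append((tag, dict(attrs).get(FIELD_ATTR)))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Attribute text to the innermost enclosing annotated element.
        for _tag, field in reversed(self._stack):
            if field:
                self.record[field] = self.record.get(field, "") + data
                break


def extract_record(html):
    """Return a dict of field name -> whitespace-normalised text for one page."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return {name: " ".join(text.split()) for name, text in parser.record.items()}


# Illustrative pages only; in practice these could come from different servers.
PAGES = [
    '<h1 data-field="title">Intro to Widgets</h1>'
    '<p>By <span data-field="author">A. Smith</span></p>',
    '<h1 data-field="title">Advanced Widgets</h1>'
    '<p>By <span data-field="author">B. Jones</span></p>',
]

records = [extract_record(page) for page in PAGES]

# A database-style selection over the extracted records.
by_smith = [r["title"] for r in records if r.get("author") == "A. Smith"]
```

Each page yields one flat record here, so the collection behaves like a single table. A real schema would presumably need to handle repeated or nested fields, which this sketch does not attempt.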