We propose regular expression types as a foundation for XML processing lang
uages. Regular expression types are a natural generalization of Document Ty
pe Definitions (DTDs), describing structures in XML documents using regular
expression operators (i.e., *, ?, I, etc.) and supporting a simple but pow
erful notion of subtyping.
The decision problem for the subtype relation is EXPTIME-hard, but it can b
e checked quite efficiently in many cases of practical interest. The subtyp
ing algorithm developed here is a variant of Aiken and Murphy's set-inclusi
on constraint solver, to which are added several optimizations and two new
properties: (1) our algorithm is provably complete, and (2) it allows a use
ful "subtagging" relation between nodes with different labels in XML trees.