The YPA project is building a system to make the information in
classified directories more accessible. BT's Yellow Pages®¹ provides an
example of a classified database for which this work would be useful.
There are two reasons for doing this: (i) directories like Yellow Pages
contain much useful but hard-to-access information, especially in the
free text in semi-display advertisements; (ii) more generally, the
project is a demonstrator for the exploitation of semi-structured data:
data that is less systematic than database entries or logical clauses,
but more systematic than free text because it has been marked up, for
display or some other purpose. Accessing the directory source data file
requires both natural language processing (for softening the interface
to the system, and separately for analysing natural-language-like
constructs in the data) and information retrieval techniques, which are
assisted by shallow knowledge. Deep world knowledge is impractical.
The project seeks to get maximum effect from conveniently simplified
approximations of standard natural language processing and knowledge
representation. The paper gives an overview of the system, and illustrates
its style with points about how the source data file is analysed.
The YPA requires further development, but already demonstrates the
effectiveness of shallow processing applied to semi-structured data.