FROM TEXT TO HYPERTEXT BY INDEXING

Citation
A. Salminen et al., FROM TEXT TO HYPERTEXT BY INDEXING, ACM Transactions on Information Systems, 13(1), 1995, pp. 69-99
Number of citations
46
Subject Categories
Information Science & Library Science; Computer Science, Information Systems
Journal ISSN
1046-8188
Volume
13
Issue
1
Year of publication
1995
Pages
69 - 99
Database
ISI
SICI code
1046-8188(1995)13:1<69:FTTHBI>2.0.ZU;2-1
Abstract
A model is presented for converting a collection of documents to hypertext by means of indexing. The documents are assumed to be semistructured, i.e., their text is a hierarchy of parts, and some of the parts consist of natural language. The model is intended as a framework for specifying hypertextual reading capabilities for specific application areas and for developing new automated tools for the conversion of semistructured text to hypertext. In the model, two well-known paradigms, formal grammars and document indexing, are combined. The structure of the source text is defined by a schema that is a constrained context-free grammar. The hierarchic structure of the source may thus be modeled by a parse tree for the grammar. The effect of indexing is described by grammar transformations. The new grammar, called an indexing schema, is associated with a new parse tree where some text parts are index elements. The indexing schema may hide some parts of the original documents or the structure of some parts. For information retrieval, parts of the indexed text are considered to be nodes of a hypergraph. In the hypergraph-based information access, the navigation capabilities of hypertext systems are combined with the querying capabilities of information retrieval systems.
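The abstract describes the model only in outline. The following Python sketch is a toy illustration under our own assumptions, not the authors' formalism: the schema is a plain dict from nonterminal to right-hand side (a trailing `*` marks repetition), and the paper's grammar constraints are not enforced. All names (`SCHEMA`, `Entry`, `Headword`, `Keyword`, `Body`, `Note`, `indexing_schema`, `apply_indexing`, `build_hypergraph`, `query`, `navigate`) are hypothetical. The same transformation is applied to the grammar and to each parse tree: hidden parts are dropped and indexed parts are relabeled as index elements. We further assume one hyperedge per index term, connecting every top-level entry that carries that term, so that `query` models retrieval and `navigate` models link following.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical source schema: nonterminal -> right-hand side symbols.
# A trailing "*" marks a repeatable symbol; "TEXT" stands for natural language.
SCHEMA = {
    "Entry": ["Headword", "Keyword*", "Body", "Note"],
    "Note": ["Keyword*", "TEXT"],
}

def indexing_schema(schema, hide, index):
    """Transform the grammar: drop hidden symbols, prefix indexed ones with 'Index:'."""
    def rename(sym):
        base = sym.rstrip("*")
        suffix = sym[len(base):]
        return ("Index:" + base if base in index else base) + suffix
    return {lhs: [rename(s) for s in rhs if s.rstrip("*") not in hide]
            for lhs, rhs in schema.items() if lhs not in hide}

@dataclass
class Part:
    """A parse-tree node: a grammar symbol plus leaf text and/or child parts."""
    label: str
    text: str = ""
    children: list = field(default_factory=list)

def apply_indexing(part, hide, index):
    """Apply the same transformation to a parse tree; returns None for hidden parts."""
    if part.label in hide:
        return None
    kids = [k for k in (apply_indexing(c, hide, index) for c in part.children)
            if k is not None]
    label = "Index:" + part.label if part.label in index else part.label
    return Part(label, part.text, kids)

def index_terms(part):
    """Yield the lower-cased text of every index element in a tree."""
    if part.label.startswith("Index:"):
        yield part.text.lower()
    for child in part.children:
        yield from index_terms(child)

def build_hypergraph(trees):
    """Assumed construction: one hyperedge per index term over the entries containing it."""
    edges = defaultdict(set)
    for node_id, tree in trees.items():
        for term in index_terms(tree):
            edges[term].add(node_id)
    return dict(edges)

def query(edges, *terms):
    """Retrieval-style access: nodes lying in the hyperedges of all given terms."""
    sets = [edges.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def navigate(edges, node):
    """Hypertext-style access: nodes sharing at least one hyperedge with `node`."""
    return {n for members in edges.values() if node in members for n in members} - {node}

# Usage: three hypothetical entries; the Note part is hidden, Keyword parts are indexed.
def entry(headword, keywords, note_keywords=()):
    return Part("Entry", children=[
        Part("Headword", headword),
        *[Part("Keyword", k) for k in keywords],
        Part("Body", "..."),
        Part("Note", children=[Part("Keyword", k) for k in note_keywords]),
    ])

source = {
    "d1": entry("Parsing", ["grammar", "parsing"], note_keywords=["draft"]),
    "d2": entry("Indexing", ["grammar", "indexing"]),
    "d3": entry("Hypertext", ["hypertext", "indexing"]),
}
hide, index = {"Note"}, {"Keyword"}
indexed = {k: apply_indexing(t, hide, index) for k, t in source.items()}
edges = build_hypergraph(indexed)
print(indexing_schema(SCHEMA, hide, index))  # {'Entry': ['Headword', 'Index:Keyword*', 'Body']}
print(query(edges, "grammar", "indexing"))   # {'d2'}
print(sorted(navigate(edges, "d2")))         # ['d1', 'd3']
```

Because the keyword "draft" occurs only inside the hidden Note part, it never becomes a hyperedge, which mirrors the abstract's point that an indexing schema may hide parts of the original documents.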