AN EXTENDED VECTOR-PROCESSING SCHEME FOR SEARCHING INFORMATION IN HYPERTEXT SYSTEMS

Authors
Citation
J. Savoy, AN EXTENDED VECTOR-PROCESSING SCHEME FOR SEARCHING INFORMATION IN HYPERTEXT SYSTEMS, Information Processing & Management, 32(2), 1996, pp. 155-170
Number of citations
43
Subject Categories
Information Science & Library Science; Computer Science, Information Systems
ISSN journal
0306-4573
Volume
32
Issue
2
Year of publication
1996
Pages
155 - 170
Database
ISI
SICI code
0306-4573(1996)32:2<155:AEVSFS>2.0.ZU;2-C
Abstract
When searching information in a hypertext is limited to navigation, it is not an easy task, especially when the number of nodes and/or links becomes very large. A query-based access mechanism must be therefore provided to complement the navigational tools inherent in hypertext systems. Most mechanisms currently proposed are based on conventional information retrieval models which consider documents as independent entities, and ignore hypertext links. To promote the use of other information retrieval mechanisms adapted to hypertext systems, this study attempts to respond to the following questions: (1) How can we integrate information given by hypertext links into an information retrieval scheme? (2) Are these hypertext links (and link semantics) clues to the enhancement of retrieval effectiveness? (3) If so, how can we use them? Two solutions are: (a) using a default weight function based on link type or assigning the same strength to all link types; or (b) using a specific weight for each particular link, i.e. the level of association or a similarity measure. This study proposes an extended vector-processing scheme which extracts additional information from hypertext links to enhance retrieval effectiveness. To carry out our investigations, we have built a hypertext based on two medium-size collections, the CACM and the CISI collection. The hypergraph is composed of explicit links (bibliographic references), computed links based on bibliographic information (bibliographic coupling, cocitation), or on hypertext links established according to document representatives (nearest neighbor).
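The abstract names the link types and the two weighting options but gives no formulas, so the following Python sketch is only an illustration under stated assumptions, not the scheme evaluated in the paper. It derives bibliographic coupling and cocitation strengths from a citation graph, then adds a fraction of each linked document's initial retrieval score to its neighbors. The function names, the additive update, and the damping parameter `alpha` are all hypothetical.

```python
from collections import defaultdict


def bibliographic_coupling(cites):
    """Count shared references for each pair of documents.

    cites maps a document id to the set of document ids it references.
    Returns {(a, b): shared_reference_count} with a < b, nonzero pairs only.
    """
    strength = {}
    docs = sorted(cites)
    for i, a in enumerate(docs):
        for b in docs[i + 1:]:
            shared = len(cites[a] & cites[b])
            if shared:
                strength[(a, b)] = shared
    return strength


def cocitation(cites):
    """Count, for each pair of documents, how many documents cite both."""
    strength = defaultdict(int)
    for refs in cites.values():
        ordered = sorted(refs)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                strength[(a, b)] += 1
    return dict(strength)


def extended_scores(base, links, alpha=0.5):
    """Assumed update: score(d) = base(d) + alpha * sum(w * base(neighbor)).

    base maps a document id to its initial (link-free) retrieval score.
    links maps (a, b) to a link weight and is treated as undirected.
    alpha is a hypothetical damping factor, not a value from the paper.
    """
    scores = dict(base)
    for (a, b), w in links.items():
        scores[a] = scores.get(a, 0.0) + alpha * w * base.get(b, 0.0)
        scores[b] = scores.get(b, 0.0) + alpha * w * base.get(a, 0.0)
    return scores


# Toy citation graph: d1 and d2 both reference d3 and d4.
cites = {"d1": {"d3", "d4"}, "d2": {"d3", "d4"}, "d3": set(), "d4": set()}
coupling = bibliographic_coupling(cites)
cocited = cocitation(cites)

# d2 does not match the query on its own but is coupled to d1, which does.
base = {"d1": 0.8, "d2": 0.0, "d3": 0.1, "d4": 0.0}
rescored = extended_scores(base, coupling, alpha=0.25)
```

Passing a constant weight for every link corresponds to option (a) in the abstract, while passing the coupling or cocitation counts (or a similarity measure) corresponds to option (b). In this toy example `d2` receives a nonzero score only through its coupling link to `d1`.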