Term co-occurrence in Internet queries: An analysis of the Excite data base

Authors
D. Wolfram
Citation
D. Wolfram, Term co-occurrence in Internet queries: An analysis of the Excite data base, CAN J INF L, 24(2-3), 1999, pp. 12-33
Number of citations
20
Subject Categories
Library & Information Science
Journal title
CANADIAN JOURNAL OF INFORMATION AND LIBRARY SCIENCE-REVUE CANADIENNE DES SCIENCES DE L INFORMATION ET DE BIBLIOTHECONOMIE
Journal ISSN
1195096X
Volume
24
Issue
2-3
Year of publication
1999
Pages
12 - 33
Database
ISI
SICI code
1195-096X(199906/09)24:2-3<12:TCIIQA>2.0.ZU;2-T
Abstract
Unique queries submitted to the Excite search engine were analyzed for empirical regularities in the co-occurrence of search terms. The distribution of the frequency of term pair occurrences was fitted to three models to determine whether the pattern of term usage followed a Zipfian distribution. Relatively poor fits were obtained, leading the author to conclude that the distribution is not Zipfian. Two simulation models were developed, based on empirical distributions of term co-occurrences and of terms submitted per query, to determine whether binary dependence was evident for specific query terms and for combinations of term sizes. A strong binary dependence relationship was observed for specific co-occurring terms. An analysis of co-occurrences based on term sizes revealed that the simulation model underestimated the co-occurrences of both less frequently used terms and highly used terms. The implications for Web-based IR system searching and design are discussed.
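As a rough illustration of the kind of analysis the abstract describes, the sketch below counts unordered term pairs within queries and estimates a log-log rank-frequency slope, a common first check for Zipfian behaviour. It is not the author's method: the abstract does not name the three fitted models, and the function names `pair_counts` and `zipf_slope`, the whitespace tokenization, and the toy queries are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
import math


def pair_counts(queries):
    """Count unordered term pairs that co-occur within each query.

    Assumption: terms are split on whitespace and lowercased, and a
    term repeated inside one query is counted once.
    """
    counts = Counter()
    for query in queries:
        terms = sorted(set(query.lower().split()))
        counts.update(combinations(terms, 2))
    return counts


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is what a classic Zipfian distribution would give;
    this is only a rough diagnostic, not a goodness-of-fit test.
    """
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy queries invented for illustration; not taken from the Excite data.
queries = [
    "free music download",
    "music download",
    "free music",
    "new york hotels",
    "new york",
]
counts = pair_counts(queries)
slope = zipf_slope(counts.values())
```

On real query logs, a formal fit would compare observed and expected pair frequencies with a goodness-of-fit statistic rather than rely on the slope alone.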