Search engines and indices were created to help people find information amongst the rapidly increasing number of World Wide Web (WWW) pages. Search engines automatically visit and index pages so that they can return good matches for their users' queries. How this indexing is done varies from engine to engine, and the details are usually secret, although the strategy is sometimes made public in general terms. A search engine's aim is to return relevant pages quickly. The author of a Web page, on the other hand, has a vested interest in it rating highly, for appropriate queries, on as many search engines as possible. Some authors have an interest in their page rating well for a great many types of query indeed: spamming has come to the Web. We treat modelling the workings of WWW search engines as an inductive inference problem. A training set of data is collected, being pages returned in response to typical queries. Decision trees are used as the model class for the search engines' selection criteria, although this is not to say that search engines actually contain decision trees. A machine learning program is used to infer a decision tree for each search engine, an information-theory criterion being used to direct the inference and to prevent over-fitting. (C) 1998 Published by Elsevier Science B.V. All rights reserved.
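The abstract does not specify the inference procedure, but to make the idea concrete, the following is a minimal ID3-style sketch of decision-tree induction driven by an information-gain splitting criterion. Everything here is hypothetical illustration: the feature names (term_in_title, term_in_url, high_term_freq), the toy training data, and the min_gain stopping threshold, which is only a crude stand-in for the paper's information-theory control of over-fitting, not the authors' actual criterion.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, feature):
    """Entropy reduction from splitting on a boolean feature."""
    n = len(examples)
    split = {True: [], False: []}
    for ex, y in zip(examples, labels):
        split[bool(ex[feature])].append(y)
    remainder = sum(len(part) / n * entropy(part)
                    for part in split.values() if part)
    return entropy(labels) - remainder

def grow_tree(examples, labels, features, min_gain=0.1):
    """Recursively grow a tree. Stopping when no split is informative
    enough (min_gain) is a crude guard against over-fitting."""
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = max(features, key=lambda f: information_gain(examples, labels, f))
    if information_gain(examples, labels, best) < min_gain:
        return Counter(labels).most_common(1)[0][0]
    node = {"feature": best, "branches": {}}
    rest = [f for f in features if f != best]
    for value in (True, False):
        sub = [(ex, y) for ex, y in zip(examples, labels)
               if bool(ex[best]) == value]
        if not sub:
            node["branches"][value] = Counter(labels).most_common(1)[0][0]
        else:
            sub_ex, sub_y = zip(*sub)
            node["branches"][value] = grow_tree(
                list(sub_ex), list(sub_y), rest, min_gain)
    return node

# Hypothetical training set: page features, labelled 1 if the engine
# returned the page highly ranked for the query, else 0.
pages = [
    {"term_in_title": True,  "term_in_url": True,  "high_term_freq": True},
    {"term_in_title": True,  "term_in_url": False, "high_term_freq": False},
    {"term_in_title": False, "term_in_url": True,  "high_term_freq": False},
    {"term_in_title": False, "term_in_url": False, "high_term_freq": False},
]
returned = [1, 1, 0, 0]

tree = grow_tree(pages, returned,
                 ["term_in_title", "term_in_url", "high_term_freq"])
print(tree)  # {'feature': 'term_in_title', 'branches': {True: 1, False: 0}}
```

One tree would be inferred per search engine from its own training set; the inferred tree models the engine's selection behaviour without any claim that the engine itself contains such a tree.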