We show that it is possible to collect data that are useful for
collaborative filtering (CF) using an autonomous Web spider. In CF,
entities are recommended to a new user based on the stated preferences of
other, similar users. We describe a CF spider that collects lists of
semantically related entities from the Web. These lists can then be used by
existing CF algorithms by encoding them as 'pseudo-users'. Importantly, the
spider can collect useful data without pre-programmed knowledge about the
format of particular pages or particular sites. Instead, the CF spider uses
commercial Web-search engines to find pages likely to contain lists in the
domain of interest, and then applies previously proposed heuristics to
extract lists from these pages. We show that data collected by this spider
are nearly as effective for CF as data collected from real users, and more
effective than data collected by two plausible hand-programmed spiders. In
some cases, autonomously spidered data can also be combined with actual
user data to improve performance.
(C) 2000 Published by Elsevier Science B.V. All rights reserved.
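
The 'pseudo-user' encoding can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the placeholder entity labels, and the cosine-weighted neighbour vote used as the CF algorithm are all assumptions chosen for illustration. Each extracted list is treated as a user who likes every entity on that list, and the resulting profiles are pooled with a real user profile.

```python
from collections import defaultdict
from math import sqrt


def lists_to_pseudo_users(lists):
    """Encode each extracted list as a pseudo-user who likes every entity on it.

    The "pseudo_<i>" naming is a hypothetical convention, not from the paper.
    """
    return {f"pseudo_{i}": set(entities) for i, entities in enumerate(lists)}


def recommend(profiles, liked, k=3):
    """Rank unseen entities by similarity-weighted votes from all profiles.

    Stand-in CF algorithm (an assumption): cosine similarity between binary
    preference sets, summed over every profile that overlaps the new user.
    """
    liked = set(liked)
    scores = defaultdict(float)
    for items in profiles.values():
        overlap = len(items & liked)
        if overlap == 0:
            continue
        sim = overlap / sqrt(len(items) * len(liked))
        for item in items - liked:
            scores[item] += sim
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Placeholder data: three spidered lists and one real user profile.
spidered_lists = [["a", "b", "c"], ["b", "d"], ["c", "a", "e"]]
profiles = lists_to_pseudo_users(spidered_lists)
profiles.update({"user_1": {"a", "e"}})  # combine with actual user data

print(recommend(profiles, ["a"]))  # → ['e', 'c', 'b']
```

Because pseudo-users and real users share one representation, an existing CF algorithm can consume either source, or both together, without modification.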