Document filtering is increasingly deployed in Web environments to reduce information overload on users. We formulate online information filtering as a reinforcement learning problem, i.e., TD(0). The goal is to learn user profiles that best represent information needs and thus maximize the expected value of user relevance feedback. A method is then presented that acquires reinforcement signals automatically by estimating users' implicit feedback from direct observations of their browsing behaviors. This "learning by observation" approach is contrasted with conventional relevance feedback methods, which require explicit user feedback. Field tests were performed in which 10 users read a total of 18,750 HTML documents over 45 days. Compared with existing document filtering techniques, the proposed learning method showed superior performance in information quality and in speed of adaptation to user preferences in online filtering.
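To illustrate the kind of formulation the abstract describes, the following is a minimal sketch of a TD(0) update applied to a user profile, assuming the profile and documents are represented as term-weight vectors and the value of a document is their dot product. The function name `td0_update`, the learning rate `alpha`, and the discount `gamma` are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def td0_update(profile, doc_vec, next_doc_vec, reward,
               alpha=0.1, gamma=0.9):
    """One TD(0) step: treat the dot product of the user profile with a
    document's term vector as the value estimate, and nudge the profile
    toward the reward inferred from (implicit) relevance feedback."""
    v = profile @ doc_vec            # value of the current document
    v_next = profile @ next_doc_vec  # value of the next document read
    td_error = reward + gamma * v_next - v
    # The gradient of v with respect to the profile is doc_vec itself,
    # so the profile moves along doc_vec scaled by the TD error.
    return profile + alpha * td_error * doc_vec

# Toy usage: a 4-term vocabulary; reward estimated from browsing behavior
profile = np.zeros(4)
doc = np.array([1.0, 0.0, 1.0, 0.0])
nxt = np.array([0.0, 1.0, 0.0, 1.0])
profile = td0_update(profile, doc, nxt, reward=1.0)
```

Starting from a zero profile, a positive reward shifts weight onto the terms of the document just read, which is the basic mechanism by which repeated feedback shapes the learned profile.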