The appropriateness of evaluation criteria and measures has been a
subject of debate and a vital concern in the information retrieval
evaluation literature. A study was conducted to investigate the
appropriateness of 20 measures, representing four major evaluation
criteria, for evaluating interactive information retrieval
performance. Among the 20
measures studied were the two best-known relevance-based measures of
effectiveness, recall and precision. The user's judgment of
information retrieval success was used as the devised criterion
measure with which all 20 other measures were to be correlated. A
sample of 40 end-users from an academic environment, each with an
individual information problem, was observed interacting with six
professional intermediaries who searched large operational systems on
their behalf. Quantitative data
consisting of values for all measures studied and verbal data
containing users' reasons for assigning certain values to selected
measures were collected. Statistical analysis of the quantitative data
showed that precision, one of the most important traditional measures
of effectiveness, is not significantly correlated with the user's
judgment of success. Users appear to be more concerned with absolute
recall than with precision, although absolute recall was not directly
tested in the
study. Four related measures of recall and precision were found to be
significantly correlated with success. Among these are the user's
satisfaction with the completeness of search results and the user's
satisfaction with the precision of the search. This article explores
possible explanations for this outcome through content analysis of
users' verbal data. The
analysis shows that high precision does not always mean high quality
(relevancy, completeness, etc.) to users, because users' expectations
differ. The user's purpose in obtaining information is suggested to be
the primary cause of the high concern for recall. Implications for
research and practice are discussed.
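
For readers less familiar with the two relevance-based measures named above, the following is a minimal sketch of how precision and recall are conventionally computed for a single search, and of how a measure could be correlated with users' success ratings. The abstract does not say which correlation statistic the study used, so the Pearson coefficient here is an assumption, and the function names and the sample figures are hypothetical illustrations rather than data from the study.

```python
from math import sqrt


def precision(relevant_retrieved, retrieved):
    """Fraction of retrieved items that are relevant."""
    return relevant_retrieved / retrieved if retrieved else 0.0


def recall(relevant_retrieved, relevant_total):
    """Fraction of all relevant items that were retrieved."""
    return relevant_retrieved / relevant_total if relevant_total else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient (an assumed choice of statistic)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (dev_x * dev_y)


# Hypothetical searches, NOT data from the study:
# (relevant items retrieved, items retrieved, relevant items in total,
#  user's success rating on an invented 1-5 scale)
searches = [
    (8, 10, 20, 3),
    (15, 40, 20, 5),
    (5, 5, 20, 2),
    (12, 30, 20, 4),
]

precisions = [precision(rr, ret) for rr, ret, _, _ in searches]
recalls = [recall(rr, total) for rr, _, total, _ in searches]
ratings = [rating for _, _, _, rating in searches]

print("precision vs. success:", pearson(precisions, ratings))
print("recall vs. success:", pearson(recalls, ratings))
```

Recall as defined here requires knowing the total number of relevant items, which is rarely available in large operational systems; this is one reason user-reported measures such as satisfaction with completeness are of interest.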