Empirical studies of retrieval performance have shown a tendency for
Precision to decline as Recall increases. This article examines the
nature of the relationship between Precision and Recall. The
relationships between Recall and the number of documents retrieved,
between Precision and the number of documents retrieved, and between
Precision and Recall are described in the context of different
assumptions about retrieval performance. It is demonstrated that a
tradeoff between Recall and Precision is unavoidable whenever retrieval
performance is consistently better than retrieval at random. More
generally, for the Precision-Recall trade-off to be avoided as the
total number of documents retrieved increases, retrieval performance
must be equal to or better than overall retrieval performance up to
that point. Examination of the mathematical relationship between
Precision and Recall shows that a quadratic Recall curve can resemble
empirical Recall-Precision behavior if transformed into a tangent
parabola. With very large databases and/or systems with limited
retrieval capabilities there can be advantages to retrieval in two
stages: Initial retrieval emphasizing high Recall, followed by more
detailed searching of the initially retrieved set, can be used to
improve both Recall and Precision simultaneously. Even so, a tradeoff
between Precision and Recall remains.
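
The relationship described above can be illustrated with a minimal sketch. It is not taken from the article: the ranked list, the collection size, and the names `precision_recall_curve`, `RANKING`, and `GENERALITY` are all assumptions chosen for illustration. The sketch computes Recall and Precision after each retrieved document for a hypothetical system that places relevant documents nearer the top than random retrieval would.

```python
from typing import List, Tuple


def precision_recall_curve(
    ranked_relevance: List[bool], total_relevant: int
) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each document in a ranked list."""
    hits = 0
    curve = []
    for n, is_relevant in enumerate(ranked_relevance, start=1):
        hits += int(is_relevant)
        # Recall: share of all relevant documents retrieved so far.
        # Precision: share of retrieved documents that are relevant.
        curve.append((hits / total_relevant, hits / n))
    return curve


# Hypothetical ranking (an assumption, not data from the article):
# 10 documents, 4 relevant, with relevant items concentrated near the top.
RANKING = [True, True, False, True, False, False, True, False, False, False]
TOTAL_RELEVANT = sum(RANKING)

# Proportion of relevant documents in the collection, which is the
# expected precision of retrieval at random at any cutoff.
GENERALITY = TOTAL_RELEVANT / len(RANKING)

CURVE = precision_recall_curve(RANKING, TOTAL_RELEVANT)

if __name__ == "__main__":
    for n, (recall, precision) in enumerate(CURVE, start=1):
        print(f"n={n:2d}  recall={recall:.2f}  precision={precision:.2f}")
```

In this toy ranking, Precision rises only at steps where the newly retrieved document is relevant, that is, where performance at that step is at least as good as overall Precision so far. Once every document has been retrieved, Precision has fallen to the proportion of relevant documents in the collection, the level expected from random retrieval.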