The TREC benchmarking exercise for information retrieval (IR) experiments
has provided a forum and an opportunity for IR researchers to evaluate the
performance of their approaches to the IR task and has resulted in
improvements in IR effectiveness. Typically, retrieval performance has been
measured in terms of precision and recall, and comparisons between different
IR approaches have been based on these measures. These measures are in turn
dependent on the so-called "pool depth" used to discover relevant documents.
Whereas there is evidence to suggest that the pool depth used for TREC
evaluations adequately identifies the relevant documents in the entire test
data collection, we consider how it affects the evaluations of individual
systems. The data used comes from the Sixth TREC conference, TREC-6. By
fitting appropriate regression models we explore whether different pool
depths confer advantages or disadvantages on different retrieval systems
when they are compared. As a consequence of this model fitting, a pair of
measures for each retrieval run, both related to precision and recall,
emerges. For each system, these give an extrapolation of the number of
relevant documents the system would have been deemed to have retrieved if an
indefinitely large pool depth had been used, and a measure of the
sensitivity of each system to pool depth. We concur that, even on the basis
of analyses of individual systems, the pool depth of 100 used by TREC is
adequate.
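
To make the dependence on pool depth concrete, the following is a minimal sketch of pooling, not the regression analysis described above. It assumes the usual pooling convention: only documents appearing in the top-d results of at least one run are assessed, so a relevant document outside the pool counts as not relevant. The run names, document identifiers, "true" relevance set and function names are all hypothetical toy values chosen for illustration; none of them come from TREC-6.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Return the union of the top-`depth` documents of every run."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def judged_relevant_retrieved(
    runs: Dict[str, List[str]], truly_relevant: Set[str], depth: int
) -> Dict[str, int]:
    """Per run, count retrieved documents that are judged relevant when
    only documents in the depth-`depth` pool are assessed."""
    judged_relevant = build_pool(runs, depth) & truly_relevant
    return {
        name: len(set(ranking) & judged_relevant)
        for name, ranking in runs.items()
    }


# Hypothetical toy data: two ranked runs and an assumed "true" relevance set.
runs = {
    "run_A": ["d1", "d2", "d3", "d4", "d5", "d6"],
    "run_B": ["d7", "d1", "d8", "d9", "d5", "d3"],
}
truly_relevant = {"d1", "d3", "d5", "d6", "d9"}

# Count judged-relevant retrieved documents per run at several pool depths.
counts_by_depth = {
    depth: judged_relevant_retrieved(runs, truly_relevant, depth)
    for depth in (1, 2, 4, 6)
}
for depth, counts in counts_by_depth.items():
    print(depth, counts)
```

In this toy setting the counts never decrease as the pool deepens, and the two runs gain judged-relevant documents at different rates. Per-run curves of this kind are what a regression against pool depth could summarise by a limiting value and a sensitivity; the form of such a model is left open here.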