The complexity of geophysical systems makes modelling them a formidable task, and in many cases research is still at the phenomenological stage. In earthquake physics, long timescales and the lack of any natural laboratory restrict research to the retrospective analysis of data. Such 'fishing expedition' approaches lead to an optimal selection of data, that is, a selection that favours the sought-for result, albeit not always consciously. This introduces substantial biases, which can make simple statistical fluctuations look like significant anomalies requiring fundamental explanations. This paper identifies three strategies for discriminating real effects from artefacts generated by retrospective analysis. The first attempts to identify each optimal choice ab initio and to account for it. Unfortunately, a satisfactory solution can be achieved only in particular cases. The second strategy acknowledges this difficulty, as well as the unavoidable existence of bias, and classifies all 'anomalous' observations as artefacts unless their retrospective probability of occurrence is exceedingly low (for instance, beyond six standard deviations). However, such a strategy is also likely to reject some scientifically important anomalies. The third strategy relies on two separate steps, learning and validation, performed on effectively independent sets of data. This approach appears preferable for small samples, such as are frequently encountered in geophysics, but the requirement for forward validation implies long waiting times before credible conclusions can be reached. A practical application to pattern recognition, the prototype of retrospective 'fishing expeditions', is presented, illustrating that valid conclusions are hard to find.
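To give a sense of scale for the second strategy, the sketch below computes the tail probability behind a "six standard deviations" threshold and how quickly a looser threshold produces false alarms when many retrospective tests are tried. It is a minimal illustration, not the paper's procedure: it assumes a Gaussian test statistic, a one-sided tail, and independent trials, none of which the abstract specifies, and the function names are hypothetical.

```python
import math


def gaussian_tail(k: float) -> float:
    """One-sided probability P(Z > k) for a standard normal Z (Gaussian assumption)."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def family_wise_false_alarm(k: float, n_trials: int) -> float:
    """Probability that at least one of n_trials independent pure-noise tests
    exceeds k standard deviations, i.e. 1 - (1 - p)^n computed stably."""
    p = gaussian_tail(k)
    return -math.expm1(n_trials * math.log1p(-p))


if __name__ == "__main__":
    for k in (2.0, 3.0, 6.0):
        print(f"{k} sigma: single-test p = {gaussian_tail(k):.3e}, "
              f"false alarm over 100 tries = {family_wise_false_alarm(k, 100):.3e}")
```

A "two-sigma anomaly" found after trying about a hundred candidate selections is, under these assumptions, more likely than not to be a fluctuation, whereas a six-sigma threshold keeps the family-wise false-alarm rate tiny even for very many tries. That margin is what makes the strategy robust, and also what makes it reject weaker but real anomalies.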
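The third strategy, learning followed by validation on independent data, can be illustrated with a small simulation. It is a hypothetical sketch, not the pattern-recognition method used in the paper: every candidate "precursor" and the target series are pure Gaussian noise, the learning step retrospectively picks the candidate most correlated with the target, and the validation step measures that same candidate on data not used for the choice. The sample sizes and function names are illustrative assumptions.

```python
import random
import statistics


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fishing_then_validate(n_candidates=200, n_learn=30, n_valid=30, seed=0):
    """Return (r_learn, r_valid) for the candidate chosen on the learning set."""
    rng = random.Random(seed)
    n = n_learn + n_valid
    # Pure noise: no candidate has any real relation to the target.
    target = [rng.gauss(0.0, 1.0) for _ in range(n)]
    candidates = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n_candidates)]
    # Learning step: retrospective "optimal selection" of the best-looking candidate.
    best = max(candidates,
               key=lambda c: abs(correlation(c[:n_learn], target[:n_learn])))
    r_learn = correlation(best[:n_learn], target[:n_learn])
    # Validation step: the same candidate on independent, unseen data.
    r_valid = correlation(best[n_learn:], target[n_learn:])
    return r_learn, r_valid


if __name__ == "__main__":
    for seed in range(5):
        r_learn, r_valid = fishing_then_validate(seed=seed)
        print(f"seed {seed}: learning r = {r_learn:+.2f}, validation r = {r_valid:+.2f}")
```

In typical runs the selected candidate shows a sizeable correlation on the learning set and collapses towards zero on the validation set, which is the signature of a retrospective artefact. With small samples the validation set must itself be collected going forward, which is why this strategy implies long waiting times.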