Motivation: To illustrate an intuitive and statistically valid method
for combining independent sources of evidence that yields a p-value for
the complete evidence, and to apply it to the problem of detecting
simultaneous matches to multiple patterns in sequence homology searches.

Results: In sequence analysis, two or more (approximately) independent
measures of the membership of a sequence (or sequence region) in some
class are often available. We would like to estimate the likelihood
of the sequence being a member of the class in view of all the available
evidence. An example is estimating the significance of the observed
match of a macromolecular sequence (DNA or protein) to a set of patterns
(motifs) that characterize a biological sequence family. An intuitive
way to do this is to express each piece of evidence as a p-value,
and then use the product of these p-values as the measure of membership
in the family. We derive a formula and algorithm (QFAST) for calculating
the statistical distribution of the product of n independent p-values.
We demonstrate that sorting sequences by this p-value effectively
combines the information present in multiple motifs, leading to highly
accurate and sensitive sequence homology searches.
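
To make the idea of a p-value for a product of p-values concrete, here is a minimal Python sketch. It relies on the standard closed form for the product x of n independent p-values, each uniform on [0, 1] under the null hypothesis: P(product <= x) = x * sum_{i=0}^{n-1} (-ln x)^i / i!. The function name `combined_p_value` is hypothetical, and this direct evaluation is an illustration of the distribution only; it should not be read as the QFAST algorithm itself, whose details are not given in this abstract.

```python
import math


def combined_p_value(p_values):
    """Return P(product of n independent Uniform(0,1) variables <= observed product).

    Illustrative sketch only (hypothetical helper, not the QFAST implementation).
    Assumes the input p-values are independent and uniform under the null.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 <= p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in p_values):
        return 0.0  # the product is 0, so the combined p-value is 0

    # Work with t = -ln(product) to avoid underflow in the product itself.
    t = -sum(math.log(p) for p in p_values)

    # Accumulate x * sum_{i=0}^{n-1} t^i / i!, where x = exp(-t).
    term = math.exp(-t)  # i = 0 term
    total = term
    for i in range(1, len(p_values)):
        term *= t / i  # turns x * t^(i-1)/(i-1)! into x * t^i / i!
        total += term

    return min(total, 1.0)  # guard against rounding slightly above 1
```

For a single p-value the function returns that p-value unchanged. For several p-values the result is larger than the raw product, which is why the raw product cannot itself be read as a p-value and needs this correction before sequences are ranked by it.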