COMBINING EVIDENCE USING P-VALUES - APPLICATION TO SEQUENCE HOMOLOGY SEARCHES

Citation
T.L. Bailey and M. Gribskov, COMBINING EVIDENCE USING P-VALUES - APPLICATION TO SEQUENCE HOMOLOGY SEARCHES, BIOINFORMATICS, 14(1), 1998, pp. 48-54
Number of citations
15
Subject categories
Computer Science Interdisciplinary Applications; Biology Miscellaneous; Biochemical Research Methods
Journal title
BIOINFORMATICS
Journal ISSN
1367-4803
Volume
14
Issue
1
Year of publication
1998
Pages
48 - 54
Database
ISI
SICI code
1367-4803(1998)14:1<48:CEUP-A>2.0.ZU;2-K
Abstract
Motivation: To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches.
Results: In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
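The idea of turning a product of independent p-values into a single p-value can be illustrated with a short sketch. It assumes each p-value is uniformly distributed on (0, 1] under the null hypothesis and uses the standard closed form for the distribution of a product of n independent uniform variables, P(product <= x) = x * sum_{i=0}^{n-1} (-ln x)^i / i!. The function name `combined_p_value` is hypothetical, and this is not presented as the QFAST algorithm itself, whose details are not given in this record.

```python
import math


def combined_p_value(p_values):
    """Return the p-value of the product of n independent p-values.

    Assumes each input is Uniform(0, 1] under the null hypothesis and
    applies P(product <= x) = x * sum_{i=0}^{n-1} (-ln x)^i / i!.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    # Sum logarithms rather than multiplying the p-values directly.
    log_x = sum(math.log(p) for p in p_values)  # ln(product), always <= 0
    t = -log_x

    # Accumulate sum_{i=0}^{n-1} t^i / i! term by term.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= t / i
        total += term

    return math.exp(log_x) * total


if __name__ == "__main__":
    # Two motif matches, each with p = 0.05.
    print(combined_p_value([0.05, 0.05]))
```

For two p-values of 0.05 the product is 0.0025, but the combined p-value is about 0.0175, because a small product is easier to reach by chance when several values are multiplied. With a single p-value the function returns that value unchanged.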