
COMBINING EVIDENCE USING P-VALUES - APPLICATION TO SEQUENCE HOMOLOGY SEARCHES

Authors

BAILEY TL GRIBSKOV M

Citation

T.L. Bailey and M. Gribskov, COMBINING EVIDENCE USING P-VALUES - APPLICATION TO SEQUENCE HOMOLOGY SEARCHES, BIOINFORMATICS, 14(1), 1998, pp. 48-54

Citations number

Subject categories

Computer Science Interdisciplinary Applications; Biology Miscellaneous; Biochemical Research Methods

Journal title

BIOINFORMATICS

ISSN journal

1367-4803

Volume

14

Issue

1

Year of publication

1998

Pages

48 - 54

Database

ISI

SICI code

1367-4803(1998)14:1<48:CEUP-A>2.0.ZU;2-K

Abstract

Motivation: To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches. Results: In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
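
The abstract does not reproduce the formula or the QFAST algorithm, so the following is only a minimal sketch of the idea, not the authors' implementation. It relies on a standard result: if n p-values are independent and uniformly distributed on (0, 1] under the null hypothesis, the probability that their product is at most p equals p * sum_{i=0}^{n-1} (-ln p)^i / i!. The function name combined_p_value and its input checks are assumptions made for illustration.

```python
import math


def combined_p_value(p_values):
    """Return P(product of n independent uniform p-values <= observed product).

    Illustrative sketch only: uses the closed form
        p * sum_{i=0}^{n-1} (-ln p)^i / i!
    where p is the observed product. The name and validation are
    hypothetical and not taken from the paper.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("each p-value must lie in (0, 1]")

    n = len(p_values)
    # Sum logarithms rather than multiplying, to limit underflow.
    log_product = sum(math.log(p) for p in p_values)
    x = -log_product  # x = -ln(product), always >= 0

    # Accumulate sum_{i=0}^{n-1} x^i / i! one term at a time.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term

    # Clamp to 1.0 to absorb floating-point rounding.
    return min(1.0, math.exp(log_product) * total)
```

With a single p-value the function returns that p-value unchanged. With two p-values of 0.1 each, the product is 0.01 but the combined p-value is 0.01 * (1 + ln 100), roughly 0.056, which shows why the raw product cannot itself be read as a p-value.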