SOME USEFUL STATISTICAL PROPERTIES OF POSITION-WEIGHT MATRICES

Authors
Claverie JM
Citation
Jm. Claverie, SOME USEFUL STATISTICAL PROPERTIES OF POSITION-WEIGHT MATRICES, Computers & chemistry, 18(3), 1994, pp. 287-294
Citations number
13
Subject categories
Computer Application, Chemistry & Engineering; Chemistry; Computer Science Interdisciplinary Applications
Journal title
Computers & chemistry
ISSN journal
00978485
Volume
18
Issue
3
Year of publication
1994
Pages
287 - 294
Database
ISI
SICI code
0097-8485(1994)18:3<287:SUSPOP>2.0.ZU;2-6
Abstract
Position-weight matrices (or profiles) are simple mathematical objects traditionally used to capture the information about local sequence patterns (or motifs) characteristic of a given structure or function. Although weight matrices can lead to fast database scanning algorithms, their usage has been limited by the lack of a reliable method to assess the statistical significance of the matching scores. In this article I first review three different computation schemes for designing weight matrices from a block alignment of any (small or large) number of sequences. I then show that, for patterns spanning 10 positions or more, the best scores expected from matching random sequences are distributed according to the extreme value (Gumbel) distribution. The threshold of statistical significance assessed from this distribution perfectly delineates the range of scores characterizing "true positive" sequences (biologically significant matches). This result allows weight matrices to be used to scan an entire protein database for patterns in a highly sensitive way. MODEST (MOtif DEsign and Search Tools), a suite of programs in Unix/C, implements these statistical improvements and is available upon E-mail request (jmc(a)ncbi.nlm.nih.gov).
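
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the MODEST code and does not claim to reproduce any of the three computation schemes reviewed in the article. Everything specific here is an assumption made for illustration: the pseudocount log-odds weighting, the uniform amino-acid background, the toy block alignment, the function names, and the method-of-moments fit of the Gumbel parameters. The sketch builds a weight matrix from a block alignment, collects the best score obtained on each of many random sequences, fits an extreme value distribution to those best scores, and derives a significance threshold from it.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pwm(block, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Log-odds weight matrix from equal-length aligned sequences.

    Assumption: add-one style pseudocounts and a uniform background.
    """
    width = len(block[0])
    background = 1.0 / len(alphabet)
    pwm = []
    for pos in range(width):
        column = [seq[pos] for seq in block]
        total = len(column) + pseudocount * len(alphabet)
        pwm.append({
            a: math.log(((column.count(a) + pseudocount) / total) / background)
            for a in alphabet
        })
    return pwm

def best_score(pwm, sequence):
    """Highest matrix score over all windows of the sequence."""
    width = len(pwm)
    return max(
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )

def fit_gumbel(scores):
    """Method-of-moments estimates of Gumbel location (mu) and scale (beta)."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean - 0.5772156649 * beta  # Euler-Mascheroni constant
    return mu, beta

def gumbel_p_value(score, mu, beta):
    """Probability that the best score on a random sequence is >= score."""
    return 1.0 - math.exp(-math.exp(-(score - mu) / beta))

def gumbel_threshold(p, mu, beta):
    """Score whose upper-tail probability under the fitted Gumbel is p."""
    return mu - beta * math.log(-math.log(1.0 - p))

# Toy block alignment (hypothetical data), 10 positions wide.
block = ["ACDEFGHIKL", "ACDEFGHIKM", "ACDQFGHIKL", "SCDEFGHIKL"]
pwm = build_pwm(block)

# Best scores on random sequences (uniform composition, length 200).
rng = random.Random(0)
random_best = [
    best_score(pwm, "".join(rng.choice(AMINO_ACIDS) for _ in range(200)))
    for _ in range(500)
]
mu, beta = fit_gumbel(random_best)
threshold = gumbel_threshold(0.01, mu, beta)
consensus_score = best_score(pwm, "ACDEFGHIKL")
print(f"mu={mu:.2f} beta={beta:.2f} threshold(p=0.01)={threshold:.2f} "
      f"consensus={consensus_score:.2f}")
```

In a real database scan, the random sequences would more plausibly follow the observed residue composition and length distribution of the database, and a maximum-likelihood fit could replace the method of moments; both choices are left open here.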