
SOME USEFUL STATISTICAL PROPERTIES OF POSITION-WEIGHT MATRICES

Authors

CLAVERIE JM

Citation

Jm. Claverie, SOME USEFUL STATISTICAL PROPERTIES OF POSITION-WEIGHT MATRICES, Computers & chemistry, 18(3), 1994, pp. 287-294

Citations number

Categorie Soggetti

Computer Application, Chemistry & Engineering; Chemistry; Computer Science, Interdisciplinary Applications

Journal title

Computers & chemistry

ISSN journal

00978485

Volume

18

Issue

3

Year of publication

1994

Pages

287 - 294

Database

ISI

SICI code

0097-8485(1994)18:3<287:SUSPOP>2.0.ZU;2-6

Abstract

Position-weight matrices (or profiles) are simple mathematical objects traditionally used to capture the information about local sequence patterns (or motifs) characteristic of a given structure or function. Although weight matrices can lead to fast database scanning algorithms, their usage has been limited by the lack of a reliable method to assess the statistical significance of the matching scores. In this article I first review three different computational schemes for designing weight matrices from a block alignment of any (small or large) number of sequences. I then show that, for patterns spanning 10 positions or more, the best scores expected from matching random sequences are distributed according to the extreme value (Gumbel) distribution. The threshold of statistical significance assessed from this distribution perfectly delineates the range of scores characterizing "true positive" sequences (biologically significant matches). This result allows weight matrices to be used to scan an entire protein database for patterns in a highly sensitive way. MODEST (MOtif DEsign and Search Tools), a suite of programs in Unix/C, implements these statistical improvements and is available upon E-mail request (jmc(a)ncbi.nlm.nih.gov).
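The abstract does not specify the three construction schemes or the MODEST code, so the following is only a minimal Python sketch of the general workflow it describes, under stated assumptions. It uses a log-odds matrix with a uniform residue background and one pseudocount per residue (one common construction, not necessarily any of the schemes reviewed in the article), scores ungapped windows, and fits a Gumbel law by the method of moments to the best scores of simulated random sequences. All function names, the toy block, the random-sequence length and the sample size are hypothetical and are not part of MODEST.

```python
import math
import random

# Assumed alphabet: the 20 standard amino acids (one-letter codes).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
EULER_GAMMA = 0.5772156649015329


def build_pwm(block, pseudocount=1.0, background=None):
    """Log-odds weight matrix from an ungapped block alignment (hypothetical helper).

    block: equal-length aligned sequences, one per row.
    Returns one dict per motif position mapping residue -> log-odds weight.
    """
    if background is None:  # assumption: uniform residue background
        background = {a: 1.0 / len(ALPHABET) for a in ALPHABET}
    total = len(block) + pseudocount * len(ALPHABET)
    pwm = []
    for j in range(len(block[0])):
        counts = {a: 0 for a in ALPHABET}
        for seq in block:
            counts[seq[j]] += 1
        pwm.append({a: math.log((counts[a] + pseudocount) / total / background[a])
                    for a in ALPHABET})
    return pwm


def best_score(pwm, seq):
    """Highest ungapped score over all windows of width len(pwm) in seq."""
    w = len(pwm)
    return max(sum(pwm[j][seq[i + j]] for j in range(w))
               for i in range(len(seq) - w + 1))


def fit_gumbel(samples):
    """Method-of-moments fit of a Gumbel (maximum) law: returns (mu, beta)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi  # Var = pi^2 * beta^2 / 6
    mu = mean - EULER_GAMMA * beta         # Mean = mu + gamma * beta
    return mu, beta


def gumbel_pvalue(score, mu, beta):
    """P(best random score >= score) = 1 - exp(-exp(-(score - mu) / beta))."""
    return -math.expm1(-math.exp(-(score - mu) / beta))


def gumbel_threshold(p, mu, beta):
    """Score whose Gumbel upper-tail probability equals p."""
    return mu - beta * math.log(-math.log(1.0 - p))


if __name__ == "__main__":
    # Hypothetical toy block of four aligned 10-residue motifs (invented data).
    block = ["GHLKVWDAGC", "GHLRVWDSGC", "GNLKIWDAGC", "GHLKVFDAGC"]
    pwm = build_pwm(block)

    # Best scores of uniform random sequences (length 200 is an arbitrary choice).
    rng = random.Random(1)
    randoms = ["".join(rng.choice(ALPHABET) for _ in range(200)) for _ in range(300)]
    mu, beta = fit_gumbel([best_score(pwm, s) for s in randoms])

    cutoff = gumbel_threshold(0.01, mu, beta)
    query = "MSTAGHLKVWDAGCPLE"
    s = best_score(pwm, query)
    print(f"mu={mu:.2f} beta={beta:.2f} cutoff(p=0.01)={cutoff:.2f}")
    print(f"query best score={s:.2f} p={gumbel_pvalue(s, mu, beta):.2e}")
```

The method of moments is used here only for brevity; maximum-likelihood fitting is a common alternative. The toy motif has width 10 because the abstract reports the Gumbel behaviour for patterns spanning 10 positions or more.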