Motivation: Database searching algorithms for proteins use scoring matrices that are based on average protein properties and are therefore dominated by globular proteins. However, because the transmembrane regions of a protein lie in an environment distinctly different from that of globular proteins, one would expect generalized substitution matrices to be inappropriate for transmembrane regions.
Results: We present the PHAT (predicted hydrophobic and transmembrane) matrix, which significantly outperforms generalized matrices and a previously published transmembrane matrix in searches with transmembrane queries. We conclude that a better matrix can be constructed by using background frequencies characteristic of the twilight zone, where low-scoring true positives have scores indistinguishable from high-scoring false positives, rather than the amino acid frequencies of the database. The PHAT matrix may help improve the accuracy of sequence alignments and evolutionary trees of membrane proteins.