A. Coghlan et al., Representation of amino acids as five-bit or three-bit patterns for filtering protein databases, BIOINFORMAT, 17(8), 2001, pp. 676-685
Motivation: We propose representing amino acids by bit-patterns so they may
be used in a filter algorithm for similarity searches over protein
databases, to rapidly eliminate non-homologous regions of database
sequences. The filter algorithm would be based on dynamic programming
optimization. It would have the advantage over previous filter algorithms
that its substitution scoring function distinguishes between conservative
and non-conservative amino acid substitutions.
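
To make the idea of a bit-pattern substitution score concrete, here is a minimal sketch. It assumes that the similarity of two residues is taken as the number of matching bits (the pattern width minus the Hamming distance), which the abstract does not state. The codes in EXAMPLE_CODES and the function names are hypothetical and are not the patterns reported in the paper.

```python
# Hypothetical five-bit codes, invented for illustration only.
# They are NOT the patterns found by the authors.
EXAMPLE_CODES = {
    "I": 0b00001,
    "L": 0b00011,
    "V": 0b00101,
    "D": 0b11000,
    "E": 0b11010,
}


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def substitution_score(x: str, y: str, codes=EXAMPLE_CODES, n_bits: int = 5) -> int:
    """Assumed score: count of matching bits, so similar codes score higher."""
    return n_bits - hamming(codes[x], codes[y])


# A conservative substitution (I -> L) differs in one bit,
# a non-conservative one (I -> D) differs in three.
conservative = substitution_score("I", "L")
non_conservative = substitution_score("I", "D")
```

Under this assumption a graded score falls out of a single XOR and bit count, which is the kind of cheap operation a database filter would need.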
Results: Simulated annealing was used to search for the best five-bit or
three-bit patterns to represent amino acids, where similar amino acids were
given similar bit-patterns. The similarity between amino acids was estimated
from the BLOSUM45 matrix. Representing amino acids by these five-bit and
three-bit patterns, the Escherichia coli PhoE precursor and the
bacteriophage PA2 LC precursor were aligned. The alignments were nearly the
same as those obtained when BLOSUM45 was used to score substitutions.
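
The simulated annealing search can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the cost function (squared gap between the matching-bit count and a target similarity), the single-bit-flip move, the geometric cooling schedule, and all names are hypothetical. The toy target similarities are made up for a four-letter alphabet and are not BLOSUM45 values.

```python
import math
import random


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def cost(codes, sim, n_bits):
    """Assumed objective: sum of squared gaps between the matching-bit
    count of each pair and its target similarity."""
    total = 0.0
    keys = sorted(codes)
    for i, x in enumerate(keys):
        for y in keys[i + 1:]:
            bit_sim = n_bits - hamming(codes[x], codes[y])
            total += (bit_sim - sim[frozenset((x, y))]) ** 2
    return total


def anneal(sim, letters, n_bits=3, steps=5000, t0=2.0, cooling=0.999, seed=0):
    """Search for bit-patterns whose pairwise similarity tracks sim."""
    rng = random.Random(seed)
    letters = list(letters)
    codes = {a: rng.randrange(2 ** n_bits) for a in letters}
    current = cost(codes, sim, n_bits)
    best, best_cost = dict(codes), current
    t = t0
    for _ in range(steps):
        a = rng.choice(letters)
        old = codes[a]
        codes[a] = old ^ (1 << rng.randrange(n_bits))  # flip one random bit
        new = cost(codes, sim, n_bits)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if new <= current or rng.random() < math.exp((current - new) / t):
            current = new
            if new < best_cost:
                best, best_cost = dict(codes), new
        else:
            codes[a] = old  # reject the move
        t *= cooling  # geometric cooling (an assumption)
    return best, best_cost


# Toy target on a 0..3 scale (made-up values, NOT taken from BLOSUM45):
# I and L are similar, D and E are similar, cross pairs are dissimilar.
toy_sim = {
    frozenset("IL"): 3, frozenset("DE"): 3,
    frozenset("ID"): 0, frozenset("IE"): 0,
    frozenset("LD"): 0, frozenset("LE"): 0,
}
best, best_cost = anneal(toy_sim, "ILDE")
```

With only eight three-bit patterns available for twenty amino acids, several residues must share a pattern, so the sketch deliberately allows duplicate codes. To use a real substitution matrix, its scores would first have to be rescaled to the range of the matching-bit count.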