Representation of amino acids as five-bit or three-bit patterns for filtering protein databases

Citation
A. Coghlan et al., Representation of amino acids as five-bit or three-bit patterns for filtering protein databases, BIOINFORMAT, 17(8), 2001, pp. 676-685
Number of citations
29
Subject categories
Multidisciplinary
Journal title
BIOINFORMATICS
Journal ISSN
1367-4803
Volume
17
Issue
8
Year of publication
2001
Pages
676 - 685
Database
ISI
SICI code
1367-4803(200108)17:8<676:ROAAAF>2.0.ZU;2-R
Abstract
Motivation: We propose representing amino acids by bit-patterns so they may be used in a filter algorithm for similarity searches over protein databases, to rapidly eliminate non-homologous regions of database sequences. The filter algorithm would be based on dynamic programming optimization. It would have the advantage over previous filter algorithms that its substitution scoring function distinguishes between conservative and non-conservative amino acid substitutions.
Results: Simulated annealing was used to search for the best five-bit or three-bit patterns to represent amino acids, where similar amino acids were given similar bit-patterns. The similarity between amino acids was estimated from the BLOSUM45 matrix. Representing amino acids by these five-bit and three-bit patterns, the Escherichia coli PhoE precursor and the bacteriophage PA2 LC precursor were aligned. The alignments were nearly the same as those obtained when BLOSUM45 was used to score substitutions.
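Illustration: the idea that similar amino acids receive similar bit-patterns can be sketched by comparing patterns bit by bit. The abstract does not list the patterns found by simulated annealing, nor does it state the exact scoring function, so everything below is an assumption: the five-bit codes in EXAMPLE_CODES are hypothetical placeholders, and the score (number of matching bits, i.e. pattern length minus Hamming distance) is one plausible choice, not necessarily the one used by the authors.

```python
# Hypothetical five-bit codes for a few amino acids. These are
# illustrative placeholders, NOT the patterns reported in the paper.
EXAMPLE_CODES = {
    "L": 0b00000,
    "I": 0b00001,
    "V": 0b00011,
    "D": 0b11100,
    "E": 0b11110,
}

N_BITS = 5  # pattern length; a three-bit scheme would use 3


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two patterns differ."""
    return bin(a ^ b).count("1")


def similarity(x: str, y: str, codes=EXAMPLE_CODES, n_bits=N_BITS) -> int:
    """Assumed substitution score: the number of matching bits."""
    return n_bits - hamming(codes[x], codes[y])


if __name__ == "__main__":
    # A conservative substitution (L/I) scores higher than a
    # non-conservative one (L/D) under these placeholder codes.
    print(similarity("L", "I"), similarity("L", "D"))
```

Because the comparison reduces to an XOR and a bit count, a score of this kind is cheap to evaluate, which fits the stated goal of a fast filter that still separates conservative from non-conservative substitutions.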