S.J. Cordwell and I. Humphery-Smith, EVALUATION OF ALGORITHMS USED FOR CROSS-SPECIES PROTEOME CHARACTERIZATION, Electrophoresis, 18(8), 1997, pp. 1410-1417
The ability to effectively search databases for the identification of
protein spots from two-dimensional electrophoresis gels has become an
essential step in the study of microbial proteomes. A variety of
analytical techniques are currently being employed during protein
characterisation. A number of algorithms used to search databases,
accessible via the World Wide Web, depend upon information concerning
N- and C-terminal microsequence, amino acid composition, and
peptide-mass fingerprinting. The effectiveness of nine such algorithms,
as well as COMBINED
(software developed in this laboratory for identifying proteins across
species boundaries), was examined. Fifty-four ribosomal proteins from
the Mycoplasma genitalium genome, and 72 aminoacyl-tRNA synthetases
from the Haemophilus influenzae, M. genitalium and Methanococcus
jannaschii genomes were chosen for study. These proteins were selected
because they represent a wide range of sequence identities across species
boundaries (22.7-100% identity), as detected by standard sequence
alignment tools. Such sequence variation allowed for a statistical
comparison of algorithm success measured against published sequence
identity.
The ability of analytical techniques used in protein characterisation
and associated database query programs to detect identity at the
functional group level was examined for proteins with low levels of
homology at the gene/protein sequence level. The significance of these
theoretical data manipulations provided the means to predict the utility of
data acquired experimentally for non-sequence-dependent software in
proteome analysis. The data obtained also predicted that 'sequence
tagging' of peptide fingerprints would need to be accompanied by at least
11-20 residues of amino acid sequence for it to be widely used for
protein characterisation across species boundaries.
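
As a rough illustration of what an amino acid composition search might
look like, the sketch below ranks database entries by how closely their
mole-percent composition matches that of a query protein. It is a
minimal sketch only: the scoring function (sum of squared differences)
and the names composition, composition_distance and rank_candidates are
assumptions made for this example, and it is not the method used by
COMBINED or by any of the nine algorithms evaluated in the paper.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the mole-percent composition of seq over the 20 standard residues."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def composition_distance(comp_a, comp_b):
    """Sum of squared mole-percent differences (assumed score; lower is closer)."""
    return sum((comp_a[aa] - comp_b[aa]) ** 2 for aa in AMINO_ACIDS)


def rank_candidates(query_seq, database):
    """Rank database entries (name -> sequence) by distance to the query.

    Returns a list of (distance, name) tuples, best match first.
    """
    query = composition(query_seq)
    scored = [
        (composition_distance(query, composition(seq)), name)
        for name, seq in database.items()
    ]
    return sorted(scored)
```

Calling rank_candidates with a query sequence and a dictionary of
candidate sequences returns the candidates ordered from most to least
similar in composition. Because composition ignores residue order, such
a score can still match proteins whose sequences have diverged, which
is why composition data is of interest for cross-species work.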