Motivation: An automatic sequence searching method (ProtEST) is described
which constructs multiple protein sequence alignments from protein sequences
and translated expressed sequence tags (ESTs). ProtEST is more effective
than a simple TBLASTN search of the query against the EST database, as the
sequences are automatically clustered, assembled, made non-redundant,
checked for sequence errors, translated into protein and then aligned and
displayed.
Results: A ProtEST search found a non-redundant, translated, error- and
length-corrected EST sequence for >58% of sequences when single sequences
from 1407 Pfam-A seed alignments were used as the probe. The resulting
alignments of translated EST sequences contained, on average, >10
sequences. In a cross-validated test of protein secondary structure
prediction, alignments from the new procedure led to an improvement of 3.4%
in average Q3 prediction accuracy over single sequences.
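
For readers unfamiliar with the Q3 measure, the following is a minimal
sketch of how a per-sequence Q3 score is commonly computed: the percentage
of residues whose predicted three-state label (helix H, strand E, coil C)
matches the observed label. The function name q3_accuracy and the example
strings are illustrative assumptions; they are not taken from ProtEST or
from the benchmark reported above.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the predicted
    secondary-structure state (H, E or C) equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    # Count positions where the two three-state strings agree.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up eight-residue example: one of the eight states is mispredicted.
example_score = q3_accuracy("HHHEECCC", "HHHEEECC")
```

An average Q3 over a test set would then be the mean of these per-sequence
scores (or a residue-weighted equivalent), which is the kind of quantity
the 3.4% improvement refers to.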