A de novo sequencing program for proteins is described that uses tandem MS
data from electron capture dissociation and collisionally activated
dissociation of electrosprayed protein ions. Computer automation is used to
convert the fragment ion mass values derived from these spectra into the
most probable protein sequence, without distinguishing Leu/Ile. Minimal
human input is necessary for the data reduction and interpretation. No
extra chemistry is necessary to distinguish N- and C-terminal fragments in
the mass spectra, as this is determined from the electron capture
dissociation data. With parts-per-million mass accuracy (now available by
using higher field Fourier transform MS instruments), the complete
sequences of ubiquitin (8.6 kDa) and melittin (2.8 kDa) were predicted
correctly by the program. The data available also provided 91% of the
cytochrome c (12.4 kDa) sequence (essentially complete except for the
tandem MS-resistant region K13-V20 that contains the cyclic heme).
Uncorrected mass values from a 6-T instrument still gave 86% of the
sequence for ubiquitin, except for distinguishing Gln/Lys. Extensive
sequencing of larger proteins should be possible by applying the algorithm
to pieces of approximately 10 kDa, such as products of limited proteolysis.
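
The role of mass accuracy in separating Gln from Lys, and the reason Leu/Ile
cannot be separated at all, can be illustrated with a minimal sketch. It is
not the program described here: the function names, the tolerance model
(a relative error applied to both fragment masses in each difference), and
the example ladder are assumptions made for illustration. Only the
monoisotopic residue masses are standard values. The sketch reads a ladder
of same-terminus fragment masses and lists the residues compatible with
each mass difference.

```python
# Standard monoisotopic residue masses (Da). Leu and Ile are isomers with
# identical mass, so they share one entry and can never be told apart here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def candidates(delta, fragment_mass, ppm):
    """Residues whose mass matches `delta` within the allowed error.

    Assumption: each of the two fragment masses carries a relative error
    of `ppm`, so the difference is allowed twice that absolute error.
    """
    tol = 2.0 * fragment_mass * ppm * 1e-6
    return sorted(r for r, m in RESIDUE_MASS.items() if abs(m - delta) <= tol)


def read_ladder(masses, ppm):
    """Candidate residues for each consecutive pair in a fragment ladder."""
    ordered = sorted(masses)
    return [candidates(hi - lo, hi, ppm) for lo, hi in zip(ordered, ordered[1:])]


def make_ladder(residues, start_mass):
    """Hypothetical helper: build an ideal ladder from known residues."""
    ladder = [start_mass]
    for r in residues:
        ladder.append(ladder[-1] + RESIDUE_MASS[r])
    return ladder


# Made-up example: four residues added to a 5000 Da fragment.
ladder = make_ladder(["G", "L/I", "K", "Q"], start_mass=5000.0)
print(read_ladder(ladder, ppm=1))    # tight tolerance
print(read_ladder(ladder, ppm=20))   # loose tolerance
```

Gln and Lys differ by about 0.036 Da, which is roughly 4 ppm of an 8.6-kDa
fragment mass. With the tight tolerance the sketch resolves the two; with
the loose one it reports both as candidates at those positions.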