The rules that govern the dynamics of protein characterisation by
peptide-mass fingerprinting (PMF) were investigated through multiple
interrogations of a nonredundant protein database. This was achieved by
analysing the efficiency of identifying each entry in the entire database
via perfect in silico digestion with a series of 20 pseudo-endoproteinases,
each cutting at the carboxy-terminal side of one amino acid residue, and
with the multiple cutters trypsin, chymotrypsin and Glu-C. The distribution
of peptide fragment masses generated by endoproteinase digestion was
examined with a view to designing better approaches to protein
characterisation by PMF. On average, and for both common and rare cutters,
the combination of approximately two fragments was sufficient to identify
most database entries. However, the rare cutters left more entries
unidentified in the database. Total coverage of the entire database could
not be achieved with one enzymatic cutter alone, nor when all 23 cutters
were used together. Peptide fragments of > 5000 Da had little effect on the
ability of PMF to correctly characterise database entries, while those of
low mass (near 350 Da in the case of trypsin) were found to be of most
utility. The most frequently occurring fragments were also found in this
lower mass region. The maximum size of uncut database entries (those not
containing a specific amino acid residue) ranged from 52 908 Da to
258 314 Da, while the number of database entries that a single cutter
failed to identify varied from 10 865 (8.4%) to 23 290 (18.1%). PMF is
likely to be a mainstay of any high-throughput protein screening strategy
for large-scale proteome analysis. A better understanding of the merits and
limitations of this technique will allow researchers to optimise their
protein characterisation procedures.
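
As an illustration of the perfect in silico digestion described above, the following minimal Python sketch cuts a sequence on the carboxy-terminal side of every occurrence of the chosen residue(s) and computes the mass of each resulting fragment. It is a sketch under stated assumptions, not the procedure used in the study: the function names `digest` and `peptide_mass` are hypothetical, monoisotopic residue masses are assumed (the abstract does not say whether monoisotopic or average masses were used), and trypsin is simplified to cutting after every K or R with no proline exception.

```python
# Monoisotopic residue masses in Da (assumption: the source does not state
# which mass type was used).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def digest(sequence, cut_after):
    """Perfect digestion: cut after every residue found in `cut_after`.

    No missed cleavages are modelled. A sequence that lacks the residue
    is returned as a single uncut fragment.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in cut_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing fragment after the last cleavage site.
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Sum of residue masses plus one water."""
    return sum(MONO[residue] for residue in peptide) + WATER


if __name__ == "__main__":
    toy = "AKGRC"  # toy sequence, not a real database entry
    for fragment in digest(toy, "KR"):  # simplified trypsin
        print(fragment, round(peptide_mass(fragment), 4))
    # A pseudo-endoproteinase cutting after W leaves this entry uncut.
    print(digest(toy, "W"))
```

A single-residue pseudo-endoproteinase corresponds to passing one letter as `cut_after`, and an entry lacking that residue comes back as one uncut fragment, which is the situation behind the "uncut database entries" reported above.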