Motivation: Biological databases, with their rapidly expanding contents,
are indispensable tools in the quest to understand more about biological
function. However, a serious user of a database that comprises a large
collection of data, collected over a long period, will likely be struck by
the inconsistency in reporting individual items of data. This paper takes a
critical look at the Protein Data Bank (PDB) to explore the seriousness of
the problem in one particular data set and to explore the implications for
those actively engaged in comparative analysis of these data.
Results: Averaged over the complete corpus, the stereochemical quality of
atomic models has, in the past few years, moved towards ideal values. At
the same time, there are inconsistencies in how data are reported. Water
content is not reported consistently and the percent of data collected when
reporting the high-resolution shell varies, detracting from the value of
resolution as a yardstick for assessing the quality of a structure. A more
detailed analysis of these inconsistencies is hampered by the lack of
machine-readable experimental data. To the user of macromolecular structure
data, this suggests that structural details beyond the standard quality
measures of resolution and R value should be considered when using
coordinate sets for further derivation or in inferring biological function.
To the curators of the PDB, this suggests the need to capture more of the
experimental data associated with the experiment in a way that permits
straightforward parsing.