Motivation: Traditionally, for packing calculations people have collected
atoms together into a number of distinct 'types'. These, in fact, often
represent a heavy atom and its associated hydrogens (i.e. a united atom).
Also, atom typing is usually done according to basic chemistry, giving rise
to 20-30 protein atom types, such as carbonyl carbons, methyl groups, and
hydroxyl groups. No one has yet investigated how similar in packing these
chemically derived types are. Here we address this question in detail,
using Voronoi volume calculations on a set of high-resolution crystal
structures.
Results: We perform a rigorous clustering analysis with cross-validation on
tens of thousands of atom volumes and attempt to compile them into types
based purely on packing. From our analysis, we are able to determine a
'minimal' set of 18 atom types that most efficiently represent the spectrum
of packing in proteins. Furthermore, we are able to uncover a number of
inconsistencies in traditional chemical typing schemes, where differently
typed atoms have almost the same effective size. In particular, we find
that tetrahedral carbons with two hydrogens are almost identical in size to
many aromatic carbons with a single hydrogen.