A fast computer program, FOLDTRAJ, to generate plausible random protein structures is reported. All-atom proteins are made directly in continuous three-dimensional space starting from primary sequence with an N-to-C directed build-up method. The method uses a novel pipelined residue addition approach in which the leading edge of the protein is constructed three residues at a time for optimal protein geometry, including the placement of cis proline. Build-up methods represent a classic N-body problem, expected to scale as N^2. When proteins become more collapsed, build-up methods are susceptible to backtracking problems, which can scale exponentially with the number of residues required to back out of a trapped walk. We have provided solutions to both these problems, using a multiway binary tree that makes the N-body problem of bump-checking scale as N log N, and speeding up backtracking by varying the number of tries before backtracking based on available conformational space. FOLDTRAJ is independent of energy potentials, other than those implicit in the geometrical properties derived by statistical studies of known structures, and in atomic van der Waals radii. WHAT_CHECK shows that the program generates chirally and physically valid proteins with all bond lengths, angles and dihedrals within allowable tolerances. Random structures built using sequences from PDB files 1SEM, 2HPR, and 1RTP typically have 5-15% alpha-helical content (according to DSSP) and on the order of 20% beta-strand/extended content. Ensembles of random structures are compared with polymer theory and with experimentally determined fluorescence resonance energy transfer distances. Reasonably sized structure ensembles do sample most of the conformational space available to proteins. The method is also capable of protein reconstruction using C alpha-C alpha direction vectors, and it compares favorably with methods that reconstruct protein backbones based on alpha-carbon coordinates, having an average backbone and C beta root mean square deviation of 0.63 Angstrom for nine different protein folds. Proteins 2000;39:112-131. (C) 2000 Wiley-Liss, Inc.
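
To illustrate why a tree-based bump check avoids the N^2 cost of comparing every new atom against all earlier ones, the following is a minimal sketch only. It assumes a plain three-dimensional k-d tree as a stand-in for the multiway binary tree mentioned in the abstract, whose actual layout is not described here. The names `KDTree`, `has_bump`, `brute_force_bump` and `grow_chain` are hypothetical, and the 3.8 Angstrom spacing and 3.0 Angstrom clash threshold are illustrative values rather than FOLDTRAJ parameters. The toy chain builder gives up instead of backtracking.

```python
import math
import random


def dist_sq(a, b):
    """Squared Euclidean distance between two 3-D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class _Node:
    __slots__ = ("point", "axis", "left", "right")

    def __init__(self, point, axis):
        self.point = point
        self.axis = axis  # splitting axis: 0 = x, 1 = y, 2 = z
        self.left = None
        self.right = None


class KDTree:
    """Incremental, unbalanced k-d tree (assumed stand-in, not FOLDTRAJ's tree)."""

    def __init__(self):
        self.root = None

    def insert(self, point):
        if self.root is None:
            self.root = _Node(point, 0)
            return
        node = self.root
        while True:
            go_left = point[node.axis] < node.point[node.axis]
            child = node.left if go_left else node.right
            if child is None:
                new = _Node(point, (node.axis + 1) % 3)
                if go_left:
                    node.left = new
                else:
                    node.right = new
                return
            node = child

    def has_bump(self, point, min_dist):
        """True if any stored point lies closer than min_dist to point."""
        min_sq = min_dist * min_dist
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            if dist_sq(point, node.point) < min_sq:
                return True
            delta = point[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if delta < 0 else (node.right, node.left)
            if near is not None:
                stack.append(near)
            # The far half-space can only hold a clash if the splitting
            # plane itself is closer than min_dist.
            if far is not None and abs(delta) < min_dist:
                stack.append(far)
        return False


def brute_force_bump(points, point, min_dist):
    """Reference check that scans every stored point."""
    return any(dist_sq(point, p) < min_dist * min_dist for p in points)


def grow_chain(n, bond=3.8, min_dist=3.0, max_tries=100, seed=0):
    """Toy N-to-C build-up of n points with bump-checking and no backtracking."""
    rng = random.Random(seed)
    tree = KDTree()
    chain = [(0.0, 0.0, 0.0)]
    tree.insert(chain[0])
    while len(chain) < n:
        prev = chain[-1]
        for _ in range(max_tries):
            v = [rng.gauss(0.0, 1.0) for _ in range(3)]
            norm = math.sqrt(sum(c * c for c in v))
            cand = tuple(p + bond * c / norm for p, c in zip(prev, v))
            if not tree.has_bump(cand, min_dist):
                break
        else:
            raise RuntimeError("trapped walk: a real builder would backtrack here")
        chain.append(cand)
        tree.insert(cand)
    return chain
```

Each query visits only the branches whose splitting plane lies within the clash distance, so for reasonably spread-out points the cost per added atom is roughly logarithmic in the number of atoms already placed. Because this sketch never rebalances the tree, a badly ordered insertion sequence can degrade it toward a linear scan.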