We introduce a quantitative framework for assessing the generation of crossovers
in DNA shuffling experiments. The approach uses free energy calculations
and complete sequence information to model the annealing process. Statistics
obtained for the annealing events are then combined with a reassembly
algorithm to infer crossover allocation in the reassembled sequences. The
fraction of reassembled sequences containing zero, one, two, or more crossovers
and the probability that a given nucleotide position in a reassembled
sequence is the site of a crossover event are estimated. Comparisons of the
predictions against experimental data for five example systems demonstrate
good agreement despite the fact that no adjustable parameters are used. An
in silico case study of a set of 12 subtilases examines the effect of fragmentation
length, annealing temperature, sequence identity, and number of shuffled
sequences on the number, type, and distribution of crossovers. A computational
verification of crossover aggregation in regions of near-perfect
sequence identity and the presence of synergistic reassembly in family DNA
shuffling is obtained.