We describe a number of improvements to the CAP sequence assembly
program. These improvements include the development of methods for
solving the problem caused by simple repetitive sequences, for
automatically editing fragment alignments and consensus sequences, and
for identifying chimeric fragments. The improved program (CAP2)
assembled each of seven data sets, six of which contain repetitive
sequences of very strong similarity, into a single sequence. As an
example, CAP2 assembled a set of 1467 fragments into a single sequence
of 73,328 bp that has only eight differences from the original
sequence. The effects of fragment length, coverage, and error rate on
the performance of CAP2 were evaluated using artificial data sets.
(C) 1996 Academic Press, Inc.