Results: We have produced a computer program, named sim3, that solves
the following computational problem. Two DNA sequences are given,
where the shorter sequence is very similar to some contiguous region
of the longer sequence. Sim3 determines such a similar region of the
longer sequence, and then computes an optimal set of single-nucleotide
changes (i.e., insertions, deletions or substitutions) that will
convert the shorter sequence to that region. Thus, the alignment
scoring scheme is designed to model sequencing errors, rather than
evolutionary processes. The program can align a 100 kb sequence to a
1 megabase sequence in a few seconds on a workstation, provided that
there are very few differences between the shorter sequence and some
region in the longer sequence. The program has been used to assemble
sequence data for the Genomes Division at the National Center for
Biotechnology Information.
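
The problem stated above can be illustrated with a minimal sketch. It is not sim3's algorithm: it is a textbook quadratic-time dynamic program with unit costs, far too slow for the sequence lengths quoted above, and the function name fit_edit_distance and its return convention are assumptions made only for this illustration. It finds a region of the longer sequence that needs the fewest single-nucleotide changes and reports that count.

```python
def fit_edit_distance(short, long_seq):
    """Return (edits, start, end) such that long_seq[start:end] is a region
    reachable from `short` with the minimum number of single-nucleotide
    insertions, deletions or substitutions (unit cost each).

    Hypothetical helper for illustration; O(len(short) * len(long_seq)) time.
    """
    n = len(long_seq)
    # prev[j] = (cost, start) for converting the current prefix of `short`
    # into a region of long_seq that ends at position j.
    # The empty prefix matches an empty region at any position for free.
    prev = [(0, j) for j in range(n + 1)]
    for i in range(1, len(short) + 1):
        # Converting short[:i] into an empty region costs i deletions.
        cur = [(i, 0)]
        for j in range(1, n + 1):
            mismatch = short[i - 1] != long_seq[j - 1]
            substitute = (prev[j - 1][0] + mismatch, prev[j - 1][1])
            delete = (prev[j][0] + 1, prev[j][1])      # drop short[i-1]
            insert = (cur[j - 1][0] + 1, cur[j - 1][1])  # add long_seq[j-1]
            cur.append(min(substitute, delete, insert))
        prev = cur
    # The region may end anywhere in the longer sequence.
    end = min(range(n + 1), key=lambda j: prev[j][0])
    edits, start = prev[end]
    return edits, start, end


print(fit_edit_distance("ACGT", "TTTACGTTTT"))  # (0, 3, 7)
```

Because every change costs one unit regardless of the bases involved, the score counts errors rather than weighting them by evolutionary plausibility, which matches the error-modelling goal described above.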
Availability: A version of sim3 for UNIX machines can be obtained by
anonymous ftp from ncbi.nlm.nih.gov, in the pub/sim3 directory.
Contact: For portable versions for Macs and PCs, contact
zjing@sunset.nlm.nih.gov.