Integration of retrovirus DNA is a specific process catalyzed by the
integrase protein acting to join the viral substrate DNA (att) sequences of
about 10 bases at the ends of the long terminal repeat (LTR) to various
sites in the host target cell DNA. Although the interaction is sequence
specific, the att sequences of different retroviruses are largely unrelated
to one another and usually differ between the two ends of the viral DNA. To
define substrate sequence specificity, we designed an "in vitro evolution"
scheme to select an optimal substrate sequence by competitive integration
in vitro from a large pool of partially randomized substrates. Integrated
substrates are enriched by PCR amplification and then regenerated and
subjected to subsequent cycles of selection and enrichment. Using this
approach, we obtained the optimal substrate sequence of 5'-ACGACAACA-3' for
avian sarcoma-leukosis virus (ASLV) and 5'-AACA(A/C)AGCA-3' for human
immunodeficiency virus type 1, which differed from those found at both ends
of the viral DNA. Clonal analysis of the integration products showed that
ASLV integrase can use a wide variety of substrate sequences in vitro,
although the consensus sequence was identical to the selected sequence. By
a competition assay, the selected nucleotide at position 4 improved the in
vitro integration efficiency over that of the wild-type sequence. Viral
mutants bearing the optimal sequence replicated at wild-type levels, with
the exception of some mutations disrupting the U5 RNA secondary structure
important for reverse transcription, which were significantly impaired.
Thus, maximizing the efficiency of integration may not be of major
importance for efficient retrovirus replication.
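
The logic of repeated selection and enrichment can be illustrated with a toy
numerical model. The sketch below is not the experimental protocol: the 4-base
template, the preferred sequence, the doping level, and the efficiency
function (`toy_efficiency`, which halves efficiency per mismatch) are all
hypothetical choices made only for illustration, and the regeneration step is
not modeled. It shows how a sequence that starts as a minor member of a
partially randomized pool can come to dominate after several competitive
cycles.

```python
from itertools import product

BASES = "ACGT"

def doped_pool(template, keep=0.7):
    """Frequencies for a partially randomized pool: each position keeps the
    template base with probability `keep`, else one of the other three bases."""
    pool = {}
    for combo in product(BASES, repeat=len(template)):
        freq = 1.0
        for base, ref in zip(combo, template):
            freq *= keep if base == ref else (1 - keep) / 3
        pool["".join(combo)] = freq
    return pool

def toy_efficiency(seq, optimum, penalty=0.5):
    """Hypothetical integration efficiency: multiplied by `penalty` for every
    mismatch to `optimum`. An assumption, not a measured relationship."""
    mismatches = sum(a != b for a, b in zip(seq, optimum))
    return penalty ** mismatches

def select_and_enrich(pool, optimum):
    """One cycle: weight each sequence by its efficiency (competitive
    selection), then renormalize to frequencies (standing in for PCR)."""
    weighted = {s: f * toy_efficiency(s, optimum) for s, f in pool.items()}
    total = sum(weighted.values())
    return {s: w / total for s, w in weighted.items()}

template = "ACAA"  # hypothetical starting sequence (most common in the pool)
optimum = "ACGA"   # hypothetical sequence preferred by the toy enzyme

start_pool = doped_pool(template)
pool = start_pool
for cycle in range(6):
    pool = select_and_enrich(pool, optimum)

best_before = max(start_pool, key=start_pool.get)
best_after = max(pool, key=pool.get)
print(best_before, "->", best_after)
```

In this model the template sequence is the most frequent member of the
starting pool, while the preferred sequence gains a fixed relative advantage
each cycle and eventually overtakes it. Real selections also depend on
sampling noise and amplification bias, which the sketch ignores.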