Jh. Blusch et al., IDENTIFICATION OF ENDOGENOUS RETROVIRAL SEQUENCES BASED ON MODULAR ORGANIZATION - PROVIRAL STRUCTURE AT THE SSAV1 LOCUS, Genomics, 43(1), 1997, pp. 52-61
The current genome sequencing projects reveal megabases of unknown
genomic sequences. About 1% of these sequences can be expected to be
of retroviral origin. These are often severely deleted or mutated.
Therefore, identification of the retroviral origin of these sequences
can be very difficult due to the absence of convincing overall sequence
similarity. There are also many copies of solo-LTRs (long terminal
repeats) distributed throughout genomic sequences. LTR and envelope
sequences in general are among the most divergent parts of the
retroviral genome and thus especially hard to detect in mutated
endogenous sequences. We took advantage of the fact that these
retroviral sections contain short highly conserved sequence regions
providing retroviral hallmarks even after loss of overall similarity.
We defined several sequence elements and peptide motifs within LTR and
Env sequences and used these elements to construct models for LTRs and
Env proteins of mammalian C-type retroviruses. We then used this
strategy to identify successfully the hitherto missing LTRs and an
env-like region in the S71 human retroviral sequence. Our approach
provides a new strategy for identifying remotely related retroviral
sequences in genomic DNA (especially human DNA), of potential
significance for the interpretation of genomic sequences obtained from
the current large-scale sequencing projects. (C) 1997 Academic Press.
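
The idea of recognizing a region by an ordered set of short conserved elements, rather than by overall similarity, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the element names, the patterns, the gap limits, and the function `find_module_hits` are hypothetical placeholders, not the elements or models defined in the paper, and the nearest-match chaining rule is one simple choice among many.

```python
import re
from typing import List, Tuple

# Hypothetical toy model: (element name, regex pattern, max gap to the previous element).
# These patterns are placeholders, NOT the LTR/Env elements defined by the authors.
TOY_MODEL: List[Tuple[str, str, int]] = [
    ("element_A", "TATAAA", 0),       # gap value is unused for the first element
    ("element_B", "AATAAA", 40),
    ("element_C", "TG[ACGT]CA", 60),
]

Hit = List[Tuple[str, int, int]]  # one (name, start, end) entry per element, in order


def find_module_hits(seq: str, model: List[Tuple[str, str, int]]) -> List[Hit]:
    """Find places where all model elements occur in order within their gap limits.

    For each occurrence of the first element, the nearest downstream match of
    every following element is taken; the chain is dropped if any gap is too large.
    """
    seq = seq.upper()
    hits: List[Hit] = []
    first_name, first_pat, _ = model[0]
    # A lookahead lets overlapping occurrences of the first element be found.
    for m in re.finditer(f"(?=({first_pat}))", seq):
        chain: Hit = [(first_name, m.start(1), m.end(1))]
        pos = m.end(1)
        for name, pat, max_gap in model[1:]:
            nxt = re.compile(pat).search(seq, pos)
            if nxt is None or nxt.start() - pos > max_gap:
                chain = []
                break
            chain.append((name, nxt.start(), nxt.end()))
            pos = nxt.end()
        if chain:
            hits.append(chain)
    return hits


if __name__ == "__main__":
    # Synthetic sequence: the sequence between the elements is arbitrary filler.
    demo = "GGG" + "TATAAA" + "C" * 10 + "AATAAA" + "C" * 5 + "TGACA" + "GG"
    for hit in find_module_hits(demo, TOY_MODEL):
        print(hit)
```

Because only the short elements and their order and spacing are checked, the filler between them can diverge freely without affecting the result, which is the property the abstract relies on for heavily mutated sequences.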