A random sequencing approach for the analysis of the Trypanosoma cruzi genome: General structure, large gene and repetitive DNA families, and gene discovery
F. Aguero et al., A random sequencing approach for the analysis of the Trypanosoma cruzi genome: General structure, large gene and repetitive DNA families, and gene discovery, GENOME RES, 10(12), 2000, pp. 1996-2005
A random sequence survey of the genome of Trypanosoma cruzi, the agent of Chagas disease, was performed and 11,459 genomic sequences were obtained, resulting in ~4.3 Mb of readable sequences or ~10% of the
parasite haploid genome. The estimated total GC content was 50.9%, with a high representation of A and T di- and trinucleotide repeats. Out of the estimated 5000 parasite genes, 947 putative new genes were identified. Another
1723 sequences corresponded to genes detected previously in T. cruzi through expressed sequence tag analysis. A further 7735 sequences had no matches in the database, but the presence of open reading frames that passed Fickett's test
suggests that some might contain coding DNA. The survey was highly redundant, with ~35% of the sequences included in a few large sequence families. Some of these families code for protein families present in dozens of copies, including retrotransposons and proteins essential for parasite survival.
Other sequence families include repetitive DNA present in thousands of copies per haploid genome. Some families in the latter group are new, parasite-specific, repetitive DNAs. These results suggest that T. cruzi could constitute an interesting model to analyze gene and genome evolution due to its plasticity in terms of sequence amplification and divergence. Additional information can be found at http://www.iib.unsam.edu.ar/tcruzi.gss.html.
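
The two summary figures in the abstract, pooled GC content and the fraction of the haploid genome sampled, rest on simple arithmetic that can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the function names `gc_content` and `implied_genome_size_mb` and the toy reads are hypothetical, and ambiguous bases such as N are assumed to be excluded from the GC calculation.

```python
from typing import Iterable


def gc_content(reads: Iterable[str]) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T), pooled over all reads.

    Assumption: ambiguous calls such as N are ignored rather than counted.
    """
    gc = 0
    at = 0
    for read in reads:
        seq = read.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("no unambiguous bases in input")
    return gc / total


def implied_genome_size_mb(readable_mb: float, fraction_sampled: float) -> float:
    """Haploid genome size implied by a survey covering a given fraction of it."""
    if not 0 < fraction_sampled <= 1:
        raise ValueError("fraction_sampled must be in (0, 1]")
    return readable_mb / fraction_sampled


# Toy reads for illustration only; these are not data from the survey.
toy_reads = ["ATGCGC", "TTAAGGCC", "NNACGT"]
print(f"pooled GC fraction: {gc_content(toy_reads):.3f}")

# Figures quoted in the abstract: ~4.3 Mb of readable sequence, ~10% of the genome.
print(f"implied haploid genome size: {implied_genome_size_mb(4.3, 0.10):.1f} Mb")
```

With the abstract's approximate figures, the second call gives a haploid genome of roughly 43 Mb. Because both inputs are rounded estimates, that value should be read as an order-of-magnitude check rather than a measurement.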