W.B. Strong and R.G. Nelson, Preliminary profile of the Cryptosporidium parvum genome: an expressed sequence tag and genome survey sequence analysis, Molecular and Biochemical Parasitology, 107(1), 2000, pp. 1-32
Cryptosporidium parvum is a protozoan enteropathogen that infects humans and animals and causes a pronounced diarrheal disease that can be life-threatening in immunocompromised hosts. No specific chemo- or immunotherapies exist to treat cryptosporidiosis, and little molecular information is available to guide the development of such therapies. To accelerate gene discovery and identify genes encoding potential drug and vaccine targets, we constructed sporozoite cDNA and genomic DNA sequencing libraries from the Iowa isolate of
C. parvum and determined approximately 2000 sequence tags by single-pass sequencing of random clones. Together, the 567 expressed sequence tags (ESTs) and 1507 genome survey sequences (GSSs) totaled one megabase (1 Mb) of unique genomic sequence, indicating that approximately 10% of the 10.4-Mb C. parvum
genome has been sequence-tagged in this gene discovery expedition. The tags were used to search the public nucleic acid and protein databases via BLAST analyses, and 180 ESTs (32%) and 277 GSSs (18%) exhibited similarity with
database sequences at smallest sum probabilities P(N) ≤ 10^-8. Some tags encoded proteins with clear therapeutic potential, including S-adenosylhomocysteine hydrolase, histone deacetylase, polyketide/fatty-acid synthases, various cyclophilins, a thrombospondin-related cysteine-rich protein, and ATP-binding cassette transporters. Several anonymous ESTs encoded proteins predicted to contain signal peptides or multiple transmembrane segments, suggesting that they were destined for membrane-bound compartments, the cell surface, or extracellular secretion. One hundred four simple sequence repeats were identified within the nonredundant sequence tag collection, with (TAA)n/(TTA)n (n ≥ 6) and (TA)n/(AT)n (n ≥ 10) being the most prevalent, occurring 40 and 15 times, respectively. Various cellular RNAs and their genes were also identified, including the small
and large ribosomal RNAs, five tRNAs, the U2 small nuclear RNA, and the small and large virus-like, double-stranded RNAs. This investigation has demonstrated that survey sequencing is an efficient procedure for gene discovery
and genome characterization, and it has identified and sequence-tagged many C. parvum genes encoding potential therapeutic targets. © 2000 Elsevier Science B.V. All rights reserved.
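The simple sequence repeat criteria in the abstract ((TAA)n/(TTA)n with n ≥ 6; (TA)n/(AT)n with n ≥ 10) can be illustrated with a short scan. This is a minimal Python sketch under stated assumptions, not the authors' method: the abstract does not name the search tool, so the sketch assumes perfect (uninterrupted) repeats found by a regular-expression scan of one strand, and the function name find_repeat_loci is hypothetical. Listing a motif together with its reverse complement (TAA/TTA) or its alternate phase (TA/AT) lets one locus be counted once, whichever strand or starting phase a read happens to show.

```python
import re


def find_repeat_loci(seq, motifs, min_copies):
    """Return (start, end) spans of perfect tandem repeats of any motif in
    `motifs` having at least `min_copies` uninterrupted copies.

    Assumption (hypothetical helper, not from the paper): `motifs` lists
    equivalent spellings of one repeat class, e.g. a motif and its reverse
    complement ("TAA", "TTA") or two phases ("TA", "AT").
    """
    # One alternative per spelling, e.g. (?:TAA){6,}|(?:TTA){6,}
    pattern = re.compile(
        "|".join(f"(?:{re.escape(motif)}){{{min_copies},}}" for motif in motifs)
    )
    # finditer yields non-overlapping, leftmost matches; each quantifier is
    # greedy, so one uninterrupted run is reported as a single locus.
    return [(hit.start(), hit.end()) for hit in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    demo = "GC" + "TAA" * 6 + "GC" + "TA" * 10 + "GC"
    print(find_repeat_loci(demo, ("TAA", "TTA"), 6))
    print(find_repeat_loci(demo, ("TA", "AT"), 10))
```

For example, find_repeat_loci("GC" + "TAA" * 6 + "GC", ("TAA", "TTA"), 6) returns [(2, 20)]. Interrupted or imperfect repeats would slip past this exact-match scan; that is the main trade-off of the regex approach compared with a dedicated tandem-repeat finder.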