M.D. Adams et al., INITIAL ASSESSMENT OF HUMAN GENE DIVERSITY AND EXPRESSION PATTERNS BASED UPON 83-MILLION NUCLEOTIDES OF CDNA SEQUENCE, Nature, 377, 1995, pp. 3
In an effort to identify new genes and analyse their expression
patterns, 174,472 partial complementary DNA sequences (expressed
sequence tags (ESTs)), totalling more than 52 million nucleotides of
human DNA sequence, have been generated from 300 cDNA libraries
constructed from 37 distinct organs and tissues. These ESTs have been
combined with an additional 118,406 ESTs from the database dbEST, for
a total of 83 million nucleotides, and treated as a shotgun sequence
assembly project. The assembly process yielded 29,599 distinct
tentative human consensus (THC) sequences and 58,384 non-overlapping
ESTs. Of these 87,983 distinct sequences, 10,214 further characterize
previously known genes based on statistically significant similarity
to sequences in the available databases; the remainder identify
previously unknown genes. Thirty tissues were sampled by over 1,000
ESTs each; only eight genes were matched by ESTs from all 30 tissues,
and 227 genes were represented in 20 or more of the tissues sampled
with more than 1,000 ESTs. Approximately 40% of identified human genes
appear to be associated with basic energy metabolism, cell structure,
homeostasis and cell division, 22% with RNA and protein synthesis and
processing, and 12% with cell signalling and communication.