Brute force estimation of the number of human genes using EST clustering as a measure

Authors

Davison, DB; Burke, JF

Citation

D.B. Davison and J.F. Burke, Brute force estimation of the number of human genes using EST clustering as a measure, IBM J RES, 45(3-4), 2001, pp. 439-447

Citations number

Subject Categories

Multidisciplinary; Computer Science & Engineering

Journal title

IBM JOURNAL OF RESEARCH AND DEVELOPMENT

ISSN journal

0018-8646

Volume

45

Issue

3-4

Year of publication

2001

Pages

439 - 447

Database

ISI

SICI code

0018-8646(200105/07)45:3-4<439:BFEOTN>2.0.ZU;2-Q

Abstract

A current question of considerable interest to both the medical and nonmedical communities concerns the number of human transcription units (which, for the purposes of this paper, are "genes") and proteins. Even with the recent announcement of the completion of the draft sequence of the human genome, it is still extremely difficult to predict the number of genes present in the genome. There are several methods for gene prediction, all involving computational tools. One way to approach this question, involving both computation and experiment, is to look at copies of fragments of messenger ribonucleic acid (mRNA) called expressed sequence tags (ESTs). mRNA is produced only when a gene is expressed, or transcribed, into RNA; by clustering mRNA fragments, we can try to reconstruct the expressed gene. While the final result is a very rough representation of the "true expressed transcript," the resulting count is probably within 20% of the real number. Here, we review the issues involved in EST clustering and present an estimate of the total number of human genes. Our results to date indicate that there are some 70,000 transcription units, with an average of 1.2 different transcripts per transcription unit. Thus, we estimate the total number of human proteins to be at least 85,000. The total number of proteins will be higher because of post-translational modification.
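The abstract describes two steps: grouping EST fragments into clusters that each approximate one expressed gene, then scaling the number of transcription units by the mean number of transcripts per unit. The Python sketch below illustrates both steps under stated assumptions and is not the authors' pipeline. It assumes two ESTs overlap if they share an exact k-mer (the default k = 8, the function names `cluster_ests` and `estimate_transcripts`, and the toy sequences are all hypothetical), merges overlaps transitively with a union-find structure (single-linkage clustering), and deliberately ignores reverse complements, sequencing errors, and repeats, all of which real EST clustering has to handle.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest used to merge ESTs into clusters transitively."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_ests(ests: list[str], k: int = 8) -> list[list[int]]:
    """Group ESTs sharing an exact k-mer, directly or via a chain (single linkage).

    Assumption: a shared k-mer stands in for real overlap evidence; actual
    pipelines use alignment-based overlap tests and check both strands.
    """
    uf = UnionFind(len(ests))
    first_seen: dict[str, int] = {}  # k-mer -> index of first EST containing it
    for i, seq in enumerate(ests):
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if kmer in first_seen:
                uf.union(first_seen[kmer], i)
            else:
                first_seen[kmer] = i
    clusters: dict[int, list[int]] = defaultdict(list)
    for i in range(len(ests)):
        clusters[uf.find(i)].append(i)
    return sorted(clusters.values())


def estimate_transcripts(units: int, transcripts_per_unit: float) -> int:
    """Scale a count of transcription units by the mean transcripts per unit."""
    return round(units * transcripts_per_unit)


if __name__ == "__main__":
    # Toy ESTs: 0 and 1 share the 8-mer "TTGACCAG"; 2 shares nothing.
    ests = ["ACGTACGTTTGACCAG", "TTGACCAGGCATTACG", "GGGGCCCCAAAATTTT"]
    print(cluster_ests(ests))                # [[0, 1], [2]]
    print(estimate_transcripts(70000, 1.2))  # 84000
```

With the abstract's rounded figures, 70,000 units times 1.2 transcripts per unit gives 84,000, close to its stated lower bound of 85,000; the small gap presumably reflects rounding of the two inputs. The single-linkage design also shows a known weakness of the approach: a single chimeric or repeat-containing EST can merge two distinct genes into one cluster and lower the count.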