Variation in the estimates of the number of genes encoded by the human genome
(28,000-120,000) attests to the difficulty of systematically identifying
human genes. Sequencing of human chromosome 22 (Chr22) provided the first
comprehensive, unbiased view of an entire human chromosome, and intensive
analysis of this sequence identified 545 genes and 134 pseudogenes that had
similarity or identity to known proteins and/or ESTs and that were listed in
the gene annotation (http://www.sanger.ac.uk/HGP/Chr22). This analysis
yielded an estimate of approximately 36,000 functional expressed genes in the
human genome (and 9000 pseudogenes). However, a key uncertainty in this
estimate was that hundreds of additional genes beyond those annotated in the
Chr22 sequence are predicted by the gene prediction program Genscan, an
unknown number of which might represent additional expressed genes. To
determine what fraction of these "predicted novel genes" (PNGs) represents
expressed human genes, we used a sensitive RT-PCR assay to detect predicted
transcripts in 17 tissues and one cell line. Our results indicate that at
least 5000-9000 additional human genes that lack similarity to known genes or
proteins exist in the human genome, increasing baseline gene estimates to
approximately 41,000-45,000.
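
The revised range follows from adding the additional genes to the baseline estimate. A minimal sketch of that arithmetic is given below; the constant and function names are hypothetical, chosen only for illustration, and the code uses only the figures quoted above.

```python
# Figures quoted in the abstract; the names are illustrative assumptions.
BASELINE_GENES = 36_000             # estimate derived from annotated Chr22 genes
ADDITIONAL_GENES = (5_000, 9_000)   # lower-bound range of additional genes


def revised_estimate(baseline, additional_range):
    """Add the low and high ends of the additional-gene range to the baseline."""
    low, high = additional_range
    return baseline + low, baseline + high


if __name__ == "__main__":
    low, high = revised_estimate(BASELINE_GENES, ADDITIONAL_GENES)
    print(f"Revised gene estimate: {low:,}-{high:,}")
```

Because the additional-gene range is stated as a minimum ("at least"), the result is best read as a revised baseline rather than an upper limit.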