Completion of the human genome sequence provides evidence for a gene count
with a lower bound of 30,000-40,000. Significant protein complexity may
derive in part from multiple transcript isoforms. Recent EST-based studies
have revealed that alternate transcription, including alternative splicing,
polyadenylation and transcription start sites, occurs within at least 30-40%
of human genes. Surveys of transcript forms have yet to integrate the
genomic context, expression, frequency, and contribution to protein
diversity of isoform
variation. Here we determine the degree to which protein-coding diversity
may be influenced by alternate expression of transcripts, using exhaustive
manual confirmation of genome sequence annotation and comparison with
available transcript data to accurately associate skipped-exon isoforms with
genomic sequence. Relative expression levels of transcripts are estimated
from EST database representation. The rigorous in silico method accurately
identifies
exon skipping using verified genome sequence. 545 genes have been studied
in this first hand-curated assessment of exon skipping on chromosome 22.
Combining manual assessment with software screening of exon boundaries
provides a highly accurate and internally consistent indication of skipping
frequency. 57 of 62 exon skipping events occur in the protein coding
regions of
52 genes. A single gene (FBXO7) expresses an exon repetition. 59% of
highly represented multi-exon genes are likely to express exon-skipped
isoforms
in ratios that vary from 1:1 to 1:>100. The proportion of all transcripts
corresponding to multi-exon genes that exhibit an exon skip is estimated
to be 5%.
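
The kind of comparison described above, associating transcript exons with
annotated genomic exons and using EST representation to estimate relative
expression, can be illustrated with a minimal Python sketch. It is not the
method used in this study. It assumes that exons are given as (start, end)
genomic coordinates and that transcript exons match annotated exons exactly;
the function names skipped_exons and count_skip_support are hypothetical.

```python
def exon_positions(annotated, transcript):
    """Indices of annotated exons that also appear in the transcript.

    Assumption: exons are (start, end) tuples and match exactly; real
    alignments would need tolerance at exon boundaries.
    """
    observed = set(transcript)
    return [i for i, exon in enumerate(annotated) if exon in observed]


def skipped_exons(annotated, transcript):
    """Annotated exons missing from the transcript but flanked on both
    sides by exons that the transcript does contain."""
    present = exon_positions(annotated, transcript)
    if len(present) < 2:
        return []
    kept = set(present)
    return [annotated[i]
            for i in range(present[0] + 1, present[-1])
            if i not in kept]


def count_skip_support(annotated, ests, target):
    """Return (skip, include) counts over ESTs that span the target exon.

    An EST is counted only if it contains annotated exons on both sides
    of the target; ESTs that start or end at the target are ignored.
    """
    t = annotated.index(target)
    skip = include = 0
    for est in ests:
        present = exon_positions(annotated, est)
        if not present or not (present[0] < t < present[-1]):
            continue  # uninformative for this exon
        if t in present:
            include += 1
        else:
            skip += 1
    return skip, include


# Hypothetical four-exon gene and four ESTs (illustrative coordinates only).
gene = [(100, 200), (300, 400), (500, 600), (700, 800)]
ests = [
    [(100, 200), (300, 400), (500, 600)],
    [(100, 200), (500, 600)],
    [(100, 200), (300, 400), (500, 600), (700, 800)],
    [(300, 400), (500, 600)],
]
skip, include = count_skip_support(gene, ests, (300, 400))
```

With these made-up inputs, one EST supports skipping of the second exon and
two support its inclusion, giving a skipped-to-included ratio of 1:2. The
fourth EST is ignored because it does not extend upstream of that exon.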