Y. Suzuki et al., Identification and characterization of the potential promoter regions of 1031 kinds of human genes, GENOME RES, 11(5), 2001, pp. 677-684
To understand the mechanism of transcriptional regulation, it is essential
to identify and characterize the promoter, which is located proximal to the
mRNA start site. To identify promoters in large volumes of genomic
sequence, we used mRNA start sites determined by large-scale sequencing of
cDNA libraries constructed by the "oligo-capping" method. We aligned the
mRNA start sites with the genomic sequences and retrieved the adjacent
sequences as potential promoter regions (PPRs) for 1031 genes. The PPR
sequences were searched to determine the frequencies of major promoter
elements. Among the 1031 PPRs, 329 (32%) contained TATA boxes, 872 (85%)
contained initiators (Inr), 999 (97%) contained GC boxes, and 663 (64%)
contained CAAT boxes. Furthermore, 493 (48%) of the PPRs were located in
CpG islands. This frequency of CpG islands was reduced in TATA(+)/Inr(+)
PPRs and in the PPRs of ubiquitously expressed genes. In the PPRs of the
CGM2, DRA, and TM30pl genes, which showed highly colon-specific expression
patterns, the consensus sequences of E boxes were commonly observed. The
PPRs were also useful for exploring promoter SNPs.
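
The element search summarized above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the consensus strings in `MOTIFS`, the forward-strand-only scan, and the helper names (`iupac_to_regex`, `element_frequencies`, `gc_content`, `cpg_obs_exp`) are assumptions chosen for illustration, and the abstract does not state which consensus definitions or CpG island criteria were actually used. The CpG helper computes the commonly used observed/expected CpG ratio, which is one ingredient of typical CpG island definitions.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

# Illustrative consensus sequences (assumptions, not taken from the paper).
MOTIFS = {
    "TATA box": "TATAWA",
    "Inr": "YYANWYY",
    "GC box": "GGGCGG",
    "CAAT box": "CCAAT",
}


def iupac_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def element_frequencies(pprs):
    """For each motif, return (number of PPRs containing it, percentage).

    `pprs` maps a gene name to its PPR sequence. Only the forward strand
    is scanned, which is a simplification.
    """
    patterns = {name: iupac_to_regex(c) for name, c in MOTIFS.items()}
    total = len(pprs)
    result = {}
    for name, pattern in patterns.items():
        count = sum(1 for seq in pprs.values() if pattern.search(seq.upper()))
        percent = 100.0 * count / total if total else 0.0
        result[name] = (count, percent)
    return result


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


if __name__ == "__main__":
    # Toy sequences, invented for demonstration only.
    toy_pprs = {
        "geneA": "GGGCGGTATAAAGCCAATCC",
        "geneB": "ACGTACGT",
    }
    for name, (count, percent) in element_frequencies(toy_pprs).items():
        print(f"{name}: {count} PPRs ({percent:.0f}%)")
```

Applied to real PPRs, the same loop would yield counts comparable in form to those reported above (for example, 329 of 1031 PPRs with a TATA box), although the exact numbers depend on the consensus definitions and on the window searched around the mRNA start site.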