The formation of mature mRNAs in vertebrates involves the cleavage and
polyadenylation of the pre-mRNA, 10-30 nt downstream of an AAUAAA or AUUAAA
signal sequence. The extensive cDNA data now available show that these
hexamers are not strictly conserved. In order to identify variant
polyadenylation signals on a large scale, we compared over 8700 human 3'
untranslated sequences to 157,775 polyadenylated expressed sequence tags
(ESTs), used as markers of actual mRNA 3' ends. About 5600 EST-supported
putative mRNA 3' ends were collected and analyzed for significant hexameric
sequences. Known polyadenylation signals were found in only 73% of the 3'
fragments. Ten single-base variants of the AAUAAA sequence were identified
with a highly significant occurrence rate, potentially representing 14.9% of
the actual polyadenylation signals. Of the mRNAs, 28.6% displayed two or
more polyadenylation sites. In these mRNAs, the poly(A) sites proximal to
the coding sequence tend to use variant signals more often, while the
3'-most site tends to use a canonical signal. The average number of ESTs
associated with each signal type suggests that variant signals (including
the common AUUAAA) are processed less efficiently than the canonical signal
and could therefore be selected for regulatory purposes. However, the
position of the site in the untranslated region may also play a role in
polyadenylation rate.
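
The hexamer search described above can be illustrated with a minimal Python
sketch. It is not the pipeline used in this study. The function names, the
0-based cleavage_pos convention, and the choice to measure distance from the
last base of the hexamer to the cleavage site are all assumptions made for
illustration. The sketch only scans the 10-30 nt window upstream of a given
3' end and does not perform the statistical test for significant hexamers.

```python
CANONICAL = "AAUAAA"
COMMON_VARIANT = "AUUAAA"


def single_base_variants(hexamer, alphabet="ACGU"):
    """Return every sequence that differs from `hexamer` at exactly one base."""
    variants = set()
    for i, base in enumerate(hexamer):
        for alt in alphabet:
            if alt != base:
                variants.add(hexamer[:i] + alt + hexamer[i + 1:])
    return variants


def find_signals(utr, cleavage_pos, min_dist=10, max_dist=30):
    """List candidate poly(A) signals upstream of a cleavage site.

    Assumptions (hypothetical conventions, not taken from the source):
    - `cleavage_pos` is the 0-based index of the first base after the cut,
      so utr[:cleavage_pos] is the sequence retained in the mature mRNA;
    - distance is counted from the base after the hexamer to the cut.
    Returns (start, hexamer, distance, kind) tuples in 5'-to-3' order.
    """
    # Accept DNA-style input (e.g. from cDNA/EST records) and convert to RNA.
    utr = utr.upper().replace("T", "U")
    # AUUAAA is itself one base away from AAUAAA; report it separately.
    variants = single_base_variants(CANONICAL) - {COMMON_VARIANT}

    hits = []
    for start in range(0, cleavage_pos - 6 + 1):
        distance = cleavage_pos - (start + 6)
        if not (min_dist <= distance <= max_dist):
            continue
        hexamer = utr[start:start + 6]
        if hexamer == CANONICAL:
            kind = "canonical"
        elif hexamer == COMMON_VARIANT:
            kind = "common variant"
        elif hexamer in variants:
            kind = "single-base variant"
        else:
            continue
        hits.append((start, hexamer, distance, kind))
    return hits
```

For example, calling find_signals("C" * 20 + "AAUAAA" + "C" * 15, 41) reports
one canonical hit at position 20, 15 nt upstream of the assumed cleavage
site. Note that the sketch flags all 17 remaining single-base variants of
AAUAAA, whereas only ten of them were found to be significant here.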