EID: the Exon-Intron Database - an exhaustive database of protein-coding intron-containing genes

Authors

Saxonov, S Daizadeh, I Fedorov, A Gilbert, W

Citation

S. Saxonov et al., EID: the Exon-Intron Database - an exhaustive database of protein-coding intron-containing genes, NUCL ACID R, 28(1), 2000, pp. 185-190

Citations number

Subject Categories

Biochemistry & Biophysics

Journal title

NUCLEIC ACIDS RESEARCH

ISSN journal

0305-1048

Volume

28

Issue

1

Year of publication

2000

Pages

185 - 190

Database

ISI

SICI code

0305-1048(20000101)28:1<185:ETED-A>2.0.ZU;2-4

Abstract

To aid studies of molecular evolution and to assist in gene prediction research, we have constructed an Exon-Intron Database (EID) in FASTA format. Currently, the database is derived from GenBank release 112, and it contains 51 289 protein-coding genes (287 209 exons) that harbor introns, along with extensive descriptions of each gene and its DNA and protein sequences, as well as splice motif information. There is 17% redundancy inherited from GenBank: a purge at the 99% identity level reduced the database to 42 460 genes (243 589 exons). We have created subdatabases of genes whose intron positions have been experimentally determined. One such database, constructed by comparing genomic and mRNA sequences, contains 11 242 genes (62 474 exons). A larger database of 22 196 genes (105 595 exons) was constructed by selecting on keywords to eliminate computer-predicted genes. By examining the two nucleotides adjacent to the intron boundary, we infer that there is a 2% rate of errors or other deviations from the standard GT...AG motif in nuclear genes. This criterion can be used to eliminate 4921 genes from the overall database. Various tools are provided to enable generation of user-specific subsets of the EID. The EID distribution can be obtained from http://mcb.harvard.edu/gilbert/EID.
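
The splice-motif criterion and the redundancy figure in the abstract can be illustrated with a minimal Python sketch. This is not the EID tooling itself, which the abstract does not describe. The function names below are hypothetical, and the sketch assumes each intron is supplied as a plain nucleotide string read 5' to 3' on the coding strand.

```python
def has_standard_splice_motif(intron: str) -> bool:
    """Return True if the intron begins with GT and ends with AG.

    Assumption: `intron` is the intron sequence only, 5'->3' on the
    coding strand, so its first and last two nucleotides are the ones
    adjacent to the intron boundaries.
    """
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def gene_passes(introns: list[str]) -> bool:
    """A gene is kept only if every one of its introns is GT...AG."""
    return all(has_standard_splice_motif(i) for i in introns)


def redundancy_fraction(total_genes: int, genes_after_purge: int) -> float:
    """Fraction of genes removed by a redundancy purge."""
    return (total_genes - genes_after_purge) / total_genes


if __name__ == "__main__":
    # Hypothetical toy introns, not taken from the EID.
    print(gene_passes(["GTAAGTTTTCAG", "gtgagcctag"]))
    print(gene_passes(["GTAAGTTTTCAG", "GCAAGTTTTCAG"]))
    # Gene counts quoted in the abstract: 51 289 before the purge, 42 460 after.
    print(round(redundancy_fraction(51289, 42460), 3))
```

With the gene counts quoted in the abstract, the purge removes 8829 of 51 289 genes, about 17.2%, which matches the stated 17% redundancy.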