EID: the Exon-Intron Database - an exhaustive database of protein-coding intron-containing genes

Citation
S. Saxonov et al., EID: the Exon-Intron Database - an exhaustive database of protein-coding intron-containing genes, NUCL ACID R, 28(1), 2000, pp. 185-190
Number of citations
19
Subject Categories
Biochemistry & Biophysics
Journal title
NUCLEIC ACIDS RESEARCH
ISSN journal
03051048
Volume
28
Issue
1
Year of publication
2000
Pages
185 - 190
Database
ISI
SICI code
0305-1048(20000101)28:1<185:ETED-A>2.0.ZU;2-4
Abstract
To aid studies of molecular evolution and to assist in gene prediction research, we have constructed an Exon-Intron Database (EID) in FASTA format. Currently, the database is derived from GenBank release 112, and it contains 51 289 protein-coding genes (287 209 exons) that harbor introns, along with extensive descriptions of each gene and its DNA and protein sequences, as well as splice motif information. There is 17% redundancy inherited from GenBank; a purge at the 99% identity level reduced the database to 42 460 genes (243 589 exons). We have created subdatabases of genes whose intron positions have been experimentally determined. One such database, constructed by comparing genomic and mRNA sequences, contains 11 242 genes (62 474 exons). A larger database of 22 196 genes (105 595 exons) was constructed by selecting on keywords to eliminate computer-predicted genes. By examining the two nucleotides adjacent to the intron boundary, we infer that there is a 2% rate of errors or other deviations from the standard GT...AG motif in nuclear genes. This criterion can be used to eliminate 4921 genes from the overall database. Various tools are provided to enable generation of user-specific subsets of the EID. The EID distribution can be obtained from http://mcb.harvard.edu/gilbert/EID.
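The GT...AG criterion and the redundancy figure in the abstract can be illustrated with a minimal Python sketch. It is not the EID tooling: the function names, the dictionary layout, and the toy sequences are hypothetical and chosen only for illustration, and it assumes each intron is available as a plain DNA string read 5' to 3'.

```python
def has_canonical_splice_motif(intron: str) -> bool:
    """Return True if the intron begins with GT and ends with AG."""
    seq = intron.upper()
    # Require at least four bases so the two dinucleotides cannot overlap.
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def filter_canonical(genes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only genes in which every intron carries the GT...AG motif."""
    return {
        name: introns
        for name, introns in genes.items()
        if all(has_canonical_splice_motif(i) for i in introns)
    }


# Hypothetical toy data, not taken from the EID.
toy_genes = {
    "geneA": ["GTAAGTCCTTTCAG", "gtgagtttttacag"],
    "geneB": ["GTAAGTCCTTTCAG", "GCAAGTCCTTTCAG"],  # second intron starts with GC
}
kept = filter_canonical(toy_genes)

# Redundancy implied by the gene counts quoted in the abstract:
# 51 289 genes before the 99% identity purge, 42 460 after it.
redundancy = (51289 - 42460) / 51289
```

Here `kept` retains only `geneA`, because one intron of `geneB` deviates from the motif, and `redundancy` evaluates to roughly 0.17, in line with the 17% stated in the abstract.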