Biological sequences integrated: A relational database approach

Citation
A. Bergholz et al., Biological sequences integrated: A relational database approach, ACT BIOTH, 49(3), 2001, pp. 145-159
Number of citations
18
Subject categories
Biology
Journal title
ACTA BIOTHEORETICA
Journal ISSN
0001-5342
Volume
49
Issue
3
Year of publication
2001
Pages
145 - 159
Database
ISI
SICI code
0001-5342(2001)49:3<145:BSIARD>2.0.ZU;2-X
Abstract
Over the last decade, the modeling and storage of biological data have been a topic of wide interest for scientists engaged in biological and biomedical research. Currently, most data is still stored in text files, which leads to data redundancy and file chaos.
In this paper we show how to use relational modeling techniques and relational database technology to model and store biological sequence data, i.e. data maintained in collections such as EMBL or SWISS-PROT, to better serve the needs of these application domains.
To this end we propose a two-step approach. First, we model the structure (and therefore the meaning) of the data using an Entity-Relationship (ER) approach. The ER model then leads to a clean design of a relational database schema for storing and retrieving the DNA and protein data extracted from various sources. Our approach provides a clean basis for building complex biological applications that are more amenable to changes and software ports than their file-based counterparts.
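As a rough illustration of the idea in the abstract, the sketch below maps two entities and one relationship onto relational tables using Python's standard sqlite3 module. It is not the schema from the paper: the table names (organism, sequence_entry), the columns, the helper functions, and the accession values are all hypothetical placeholders chosen for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a toy, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Entity: organism (hypothetical table, not from the paper)
        CREATE TABLE organism (
            organism_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL UNIQUE
        );
        -- Entity: sequence entry; the foreign key models the
        -- "entry belongs to organism" relationship of an ER diagram.
        CREATE TABLE sequence_entry (
            accession   TEXT PRIMARY KEY,
            source_db   TEXT NOT NULL,
            molecule    TEXT NOT NULL CHECK (molecule IN ('DNA', 'protein')),
            organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
            residues    TEXT NOT NULL
        );
        """
    )
    # Made-up placeholder rows, not real database entries.
    conn.execute(
        "INSERT INTO organism (organism_id, name) VALUES (1, 'Example organism')"
    )
    conn.executemany(
        "INSERT INTO sequence_entry VALUES (?, ?, ?, ?, ?)",
        [
            ("X00001", "EMBL", "DNA", 1, "ATGGCC"),
            ("P00001", "SWISS-PROT", "protein", 1, "MA"),
        ],
    )
    conn.commit()
    return conn


def sequences_for_organism(conn, name):
    """Return (accession, source_db, molecule) rows for one organism."""
    rows = conn.execute(
        """
        SELECT s.accession, s.source_db, s.molecule
        FROM sequence_entry AS s
        JOIN organism AS o ON o.organism_id = s.organism_id
        WHERE o.name = ?
        ORDER BY s.accession
        """,
        (name,),
    )
    return rows.fetchall()
```

Because the organism name is stored once and referenced by key, entries drawn from different source collections share it instead of repeating it, which is the kind of redundancy the abstract attributes to text-file storage.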