Over the last decade, the modeling and storage of biological data have
been topics of wide interest for scientists working in biological and
biomedical research. Most data are currently still stored in text files,
which leads to data redundancy and poorly organized collections of files.
In this paper we show how relational modeling techniques and relational
database technology can be used to model and store biological sequence
data, i.e. data maintained in collections such as EMBL or SWISS-PROT, so
as to better serve the needs of these application domains.
To this end we propose a two-step approach. First, we model the structure
(and therefore the meaning) of the data using an Entity-Relationship (ER)
approach. Second, we derive from the ER model a clean relational database
schema for storing and retrieving the DNA and protein data extracted from
various sources. Our approach provides a clean basis for building complex
biological applications that are more amenable to change and to software
ports than their file-based counterparts.
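
To make the step from an ER model to a relational schema concrete, the following is a minimal sketch using Python's built-in sqlite3 module. It is not the schema proposed in this paper: the entities (source database, organism, sequence entry), all table and column names, and the sample accessions and residues are hypothetical and chosen only for illustration. Each entity becomes a table and each relationship becomes a foreign key, so a source or organism name is stored once rather than repeated in every entry.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative
# assumptions, not the schema proposed in the paper.
SCHEMA = """
CREATE TABLE source_db (
    source_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE   -- e.g. 'EMBL' or 'SWISS-PROT'
);
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE seq_entry (
    entry_id    INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL,
    source_id   INTEGER NOT NULL REFERENCES source_db(source_id),
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
    seq_type    TEXT NOT NULL CHECK (seq_type IN ('DNA', 'protein')),
    residues    TEXT NOT NULL,
    UNIQUE (source_id, accession)
);
"""


def build_demo_db():
    """Create an in-memory database and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO source_db (source_id, name) "
        "VALUES (1, 'EMBL'), (2, 'SWISS-PROT')"
    )
    conn.execute(
        "INSERT INTO organism (organism_id, name) VALUES (1, 'Homo sapiens')"
    )
    # Accessions and residues below are invented sample values.
    conn.executemany(
        "INSERT INTO seq_entry "
        "(accession, source_id, organism_id, seq_type, residues) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("X00001", 1, 1, "DNA", "ATGGCC"),
            ("P00001", 2, 1, "protein", "MAK"),
        ],
    )
    conn.commit()
    return conn


def entries_for_organism(conn, organism_name):
    """Return (source, accession, type) for every entry of one organism."""
    return conn.execute(
        """
        SELECT s.name, e.accession, e.seq_type
        FROM seq_entry AS e
        JOIN source_db AS s ON s.source_id = e.source_id
        JOIN organism  AS o ON o.organism_id = e.organism_id
        WHERE o.name = ?
        ORDER BY e.accession
        """,
        (organism_name,),
    ).fetchall()
```

Calling entries_for_organism(build_demo_db(), "Homo sapiens") retrieves DNA and protein entries from both sources with a single declarative query, which with flat files would typically require a separate parser for each file format.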