R. Doelz and F. Eggenberger, A COMPRESSION MECHANISM FOR SEQUENCE DATABASES TO IMPROVE THE EFFICIENCY OF CONVENTIONAL TOOLS, Computer applications in the biosciences, 11(2), 1995, pp. 219-223
This paper describes a method to compress molecular biology databases
that are characterized by an increasing proportion of data derived
from genome projects. The performance of our tool has been tested on
various data files of the EMBL nucleotide sequence database. The best
compression ratios were achieved on EST (Expressed Sequence Tags)
data, typically derived from large-scale sequence projects. The
compression of sequence database updates was tested in combination
with the common Unix compression program 'compress'. Our tool improved
the efficiency of 'compress' on average by 16%.
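
The abstract does not say how the tool encodes sequences or how the 16% figure was computed. As a rough illustration only, the sketch below shows one way the effect of a preprocessing step on a general-purpose compressor could be measured. Everything in it is an assumption: the 2-bit nucleotide packing (`pack_2bit`, `unpack_2bit`) is a hypothetical stand-in rather than the authors' encoding, Python's `zlib` (DEFLATE) stands in for Unix 'compress' (LZW), and `relative_gain` reflects just one plausible reading of "improved the efficiency".

```python
import zlib

# Hypothetical 2-bit code for the four unambiguous nucleotides.
# Illustrative stand-in only, not the encoding described in the paper.
CODE = {"a": 0, "c": 1, "g": 2, "t": 3}
BASES = "acgt"


def pack_2bit(seq: str) -> bytes:
    """Pack a string over a/c/g/t into 4 bases per byte.

    A 4-byte big-endian length prefix records the number of bases so that
    padding in the final byte can be discarded on decoding.
    """
    out = bytearray(len(seq).to_bytes(4, "big"))
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes) -> str:
    """Invert pack_2bit, using the length prefix to drop padding."""
    n = int.from_bytes(data[:4], "big")
    seq = []
    for byte in data[4:]:
        for shift in (6, 4, 2, 0):
            seq.append(BASES[(byte >> shift) & 3])
    return "".join(seq[:n])


def relative_gain(raw: bytes, preprocessed: bytes) -> float:
    """Fractional reduction in compressed size due to preprocessing.

    Assumed definition: 1 - size(zlib(preprocessed)) / size(zlib(raw)).
    """
    baseline = len(zlib.compress(raw, 9))
    combined = len(zlib.compress(preprocessed, 9))
    return 1.0 - combined / baseline
```

For a sequence string `seq`, `relative_gain(seq.encode("ascii"), pack_2bit(seq))` gives the fractional size reduction under these assumptions; the value depends heavily on the input data and on the compressor, so it should not be expected to reproduce the figure reported in the abstract.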