A COMPRESSION MECHANISM FOR SEQUENCE DATABASES TO IMPROVE THE EFFICIENCY OF CONVENTIONAL TOOLS

Citation
R. Doelz and F. Eggenberger, A COMPRESSION MECHANISM FOR SEQUENCE DATABASES TO IMPROVE THE EFFICIENCY OF CONVENTIONAL TOOLS, Computer applications in the biosciences, 11(2), 1995, pp. 219-223
Number of citations
6
Subject Categories
Mathematical Methods, Biology & Medicine; Computer Sciences, Special Topics; Computer Science Interdisciplinary Applications; Biology Miscellaneous
ISSN journal
0266-7061
Volume
11
Issue
2
Year of publication
1995
Pages
219-223
Database
ISI
SICI code
0266-7061(1995)11:2<219:ACMFSD>2.0.ZU;2-2
Abstract
This paper describes a method to compress molecular biology databases that are characterized by an increasing proportion of data derived from genome projects. The performance of our tool has been tested on various data files of the EMBL nucleotide sequence database. The best compression ratios were achieved on EST (Expressed Sequence Tags) data, typically derived from large-scale sequence projects. The compression of sequence database updates was tested in combination with the common Unix compression program 'compress'. Our tool improved the efficiency of 'compress' on average by 16%.