Aa. Mironov et al., COMPRESSED DATA FORMAT FOR BIOPOLYMER PRIMARY AND SPATIAL STRUCTURE - DATA-RETRIEVAL TOOLS FOR COMPRESSED DATA-BANKS, Molecular biology, 28(1), 1994, pp. 127-132
An open format, CAN (Compressed Aminoacids and Nucleotides), is
presented for storing genetic information in compressed form in data
banks (DBs). The data compression principles are considered in detail,
with EMBL (nucleotide sequences), SWISSPROT (amino acid sequences), and
PDB (3D structures) as examples. A unified compressed data format
permits integration of EMBL, SWISSPROT, and PDB into a single DB. The
approach is intended to be applied to integrating GENBANK and other
analogous DBs. Another outcome of the work is a library of DB access
and retrieval procedures that provides developers of application
software with a uniform interface to biology-related DBs. The proposed
data storage scheme was recommended by the Expert Commission of the
Informatics Section of the Human Genome State Program as a standard
for DB distribution in Russia.