J. Kleffe et al., DNASTAT - a Pascal unit for the statistical analysis of DNA and protein sequences, Computer Applications in the Biosciences, 11(4), 1995, pp. 449-455
DNASTAT is a collection of Pascal routines for researchers who develop
their own application programs for statistical analysis of DNA and
protein sequences. Dynamic and file-based data structures allow users to
process sets of sequences by simple loop control without limitations
on the number of sequences and their individual sizes. This frees the
programmer from potentially error-prone tasks like dynamic memory
allocation and controlling array sizes. Sequences can be stored in
databases along with biological and statistical attributes. Individual
sequences can be accessed by column name and row number as with
spreadsheets. DNASTAT allows large sets of sequences to be processed
using a PC with standard configuration. Its small size, simplicity and
free availability make it attractive to students of mathematical
biology. Use of DNASTAT is illustrated by two sample programs that
generate a database of coding regions from the GenBank entry of the
tobacco chloroplast genome. A version of DNASTAT written in ANSI-C for
PCs and Unix workstations is also available.
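
The spreadsheet-style access described in the abstract (a file-based store, cells addressed by column name and row number, and a plain loop over all sequences) can be pictured with a minimal sketch. This is not DNASTAT's interface, which the abstract does not show, and it is written in Python rather than Pascal. The class `SequenceTable`, its methods, the helper `gc_content`, and the two sample sequences are all hypothetical names and data chosen only for illustration.

```python
import json
import tempfile


class SequenceTable:
    """Hypothetical file-backed table: one JSON record per line on disk."""

    def __init__(self, columns):
        self.columns = list(columns)
        self._file = tempfile.TemporaryFile(mode="w+b")
        # Only one byte offset per row is kept in memory, not the sequences.
        self._offsets = []

    def append(self, **values):
        """Add a row; every keyword must be a declared column name."""
        unknown = set(values) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown column(s): {sorted(unknown)}")
        self._file.seek(0, 2)  # jump to the end of the file
        self._offsets.append(self._file.tell())
        self._file.write((json.dumps(values) + "\n").encode("utf-8"))

    def get(self, row, column):
        """Return one cell, addressed by 1-based row number and column name."""
        if column not in self.columns:
            raise KeyError(column)
        if not 1 <= row <= len(self._offsets):
            raise IndexError(row)
        self._file.seek(self._offsets[row - 1])
        record = json.loads(self._file.readline().decode("utf-8"))
        return record.get(column)

    def __len__(self):
        return len(self._offsets)


def gc_content(seq):
    """Fraction of G and C bases, used here as a sample statistical attribute."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Made-up sequences, stored together with a computed attribute.
table = SequenceTable(["name", "sequence", "gc"])
for name, seq in [("geneA", "ATGGCC"), ("geneB", "ATGAAATAA")]:
    table.append(name=name, sequence=seq, gc=gc_content(seq))

# Simple loop control over all rows; no array sizes to manage.
for row in range(1, len(table) + 1):
    print(table.get(row, "name"), table.get(row, "gc"))
```

Keeping only row offsets in memory is one way to let the number and length of sequences grow with disk space rather than RAM, which is the property the abstract emphasizes for standard PCs.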