FISH (Fast Index Search for Homologous coding sequences) consists of a
database and associated software and is intended to function as a
directory of protein-coding gene sequences. The FISH index contains
descriptions of 22 361 DNA sequences from release 69.0 of the GenBank
genetic sequence database. Complete coding sequences are represented
numerically with counts of nucleotides and synonymous codons, and with
GenBank LOCUS names and short descriptions. The software permits the
database to be queried by GenBank LOCUS name, sequence length (expressed as
total number of codons), or by comparison with a DNA sequence. In the
last case, the numerical descriptions, rather than the actual DNA
sequences, are compared using simple distance measures. The FISH package can be
used to rapidly assemble lists of similar coding sequences, without
regard to functional annotation or sequence alignments. Typical search
times are well under a minute on widely available IBM-compatible micro