V.B. Strelets et al., DATA-BANK HOMOLOGY SEARCH ALGORITHM WITH LINEAR COMPUTATION COMPLEXITY, Computer applications in the biosciences, 10(3), 1994, pp. 319-322
A new algorithm for data bank homology search is proposed. The principal advantages of the new algorithm are: (i) linear computation complexity; (ii) low memory requirements; and (iii) high sensitivity to the presence of local region homology. The algorithm first calculates indicative matrices of k-tuple 'realization' in the query sequence and then searches for an appropriate number of matching k-tuples within a narrow range in database sequences. It does not require k-tuple coordinate tabulation or in-memory placement for database sequences. The algorithm is implemented in a program for execution on PC-compatible computers and was tested on the PIR and GenBank databases with good results. A few modifications designed to improve the selectivity are also discussed. As an application example, the search for homology of the mouse homeotic protein HOX 3.1 is given.
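
The abstract does not give implementation details, so the following Python sketch is only one plausible reading of the two steps it names, not the authors' program. It assumes that the 'indicative matrix' can be approximated by a set of the k-tuples present in the query, and that the 'narrow range' is a sliding window of fixed length over each database sequence. The names ktuple_set, best_window_hits, and search, and the parameters k, window, and threshold, are hypothetical.

```python
from collections import deque


def ktuple_set(query, k):
    """Collect every k-tuple that occurs in the query.

    Assumption: this set stands in for the abstract's 'indicative matrix'.
    """
    return {query[i:i + k] for i in range(len(query) - k + 1)}


def best_window_hits(query_tuples, target, k, window):
    """Scan the target once and return the largest number of k-tuple
    start positions, among any `window` consecutive positions, whose
    k-tuple also occurs in the query.

    Only the last `window` flags are kept, so no coordinates of the
    target's k-tuples are tabulated.
    """
    recent = deque()  # 0/1 flags for the most recent positions
    hits = 0          # number of 1-flags currently inside the window
    best = 0
    for i in range(len(target) - k + 1):
        flag = 1 if target[i:i + k] in query_tuples else 0
        recent.append(flag)
        hits += flag
        if len(recent) > window:
            hits -= recent.popleft()
        best = max(best, hits)
    return best


def search(query, database, k=2, window=20, threshold=5):
    """Return (name, score) for each database entry reaching the threshold.

    `database` is any iterable of (name, sequence) pairs, so sequences
    can be streamed one at a time instead of being held in memory.
    """
    tuples = ktuple_set(query, k)
    results = []
    for name, seq in database:
        score = best_window_hits(tuples, seq, k, window)
        if score >= threshold:
            results.append((name, score))
    return results
```

Under these assumptions each database sequence is read once, left to right, which is consistent with the linear complexity and low memory use claimed in the abstract. The selectivity modifications mentioned there are not modelled in this sketch.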