C. Fondrat and P. Dessen, A RAPID ACCESS MOTIF DATABASE (RAMDB) WITH A SEARCH ALGORITHM FOR THE RETRIEVAL PATTERNS IN NUCLEIC-ACIDS OR PROTEIN DATA-BANKS, Computer Applications in the Biosciences, 11(3), 1995, pp. 273-279
We present a codification structure, fully interfaced with the main
packages for biomolecular database management, together with a new
search algorithm that rapidly retrieves a sequence from a database.
This system is derived from a method previously proposed for homology
searches in databanks. That method preprocessed an entire database:
every overlapping subsequence of a given length in each sequence was
converted into a code and stored in a hash-coding file.
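The codification step described above can be illustrated with a small k-word index. This is a minimal sketch, not the RAMDB file format. It assumes an in-memory Python dict in place of the hash-coding file and a four-letter nucleotide alphabet (a 20-letter protein alphabet would work the same way). The helper names `kmer_code` and `build_index` are hypothetical.

```python
from collections import defaultdict

# Assumption: nucleotide alphabet; codes are base-len(alphabet) integers.
DNA = "ACGT"

def kmer_code(word: str, alphabet: str = DNA) -> int:
    """Convert a word of length k into an integer code (hypothetical helper)."""
    code = 0
    for ch in word:
        code = code * len(alphabet) + alphabet.index(ch)
    return code

def build_index(sequences: dict[str, str], k: int,
                alphabet: str = DNA) -> dict[int, list[tuple[str, int]]]:
    """Map each k-word code to the list of (sequence id, offset) where it occurs.

    A plain dict stands in for the hash-coding file (assumption). Sequences are
    visited in sorted id order and offsets in increasing order, so every
    postings list comes out sorted by (id, offset).
    """
    index: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for seq_id in sorted(sequences):
        seq = sequences[seq_id]
        for pos in range(len(seq) - k + 1):
            word = seq[pos:pos + k]
            # Skip words containing symbols outside the alphabet (e.g. N).
            if all(c in alphabet for c in word):
                index[kmer_code(word, alphabet)].append((seq_id, pos))
    return dict(index)
```

For example, with `{"s1": "ACGTAC", "s2": "GTACGT"}` and `k=3`, the code of `"GTA"` maps to `[("s1", 2), ("s2", 0)]`. Keeping each list sorted is what later allows occurrence lists to be combined by a linear merge.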
The new algorithm is designed to make better use of this codification.
It relies on identifying the rarest strings that characterize the
query sequence and on intersecting the sorted lists read from the
codification structure. The system applies to both nucleic acid and
protein sequences and is used to find patterns in databanks or in
large sets of sequences. A few examples of applications are given. In
addition, a comparison with existing methods shows that this new
approach speeds up the search for query patterns in large data sets.
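The search idea (rarest query words first, then intersection of sorted occurrence lists) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' algorithm. It uses an in-memory dict index, a nucleotide alphabet, exact matching only, and hypothetical names and parameters (`search`, `intersect_sorted`, `k`, `n_rare`). A compact index builder is included so the block runs on its own.

```python
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: nucleotide alphabet

def kmer_code(word, alphabet=ALPHABET):
    code = 0
    for ch in word:
        code = code * len(alphabet) + alphabet.index(ch)
    return code

def build_index(seqs, k, alphabet=ALPHABET):
    # Postings are sorted by (seq id, offset): ids sorted, offsets increasing.
    index = defaultdict(list)
    for sid in sorted(seqs):
        s = seqs[sid]
        for pos in range(len(s) - k + 1):
            w = s[pos:pos + k]
            if all(c in alphabet for c in w):
                index[kmer_code(w, alphabet)].append((sid, pos))
    return index

def intersect_sorted(a, b):
    """Merge-style intersection of two sorted lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def search(query, seqs, index, k, n_rare=2, alphabet=ALPHABET):
    """Return (seq id, start) of exact occurrences of query (hypothetical API)."""
    if len(query) < k:
        raise ValueError("query must be at least k symbols long")
    # (offset in query, postings list) for every overlapping k-word of the query.
    words = [(i, index.get(kmer_code(query[i:i + k], alphabet), []))
             for i in range(len(query) - k + 1)]
    # Rarest words first: shortest lists, so candidates shrink fastest.
    words.sort(key=lambda w: len(w[1]))
    candidates = None
    for offset, postings in words[:n_rare]:
        # Shift each hit back to the putative query start; subtracting a
        # constant offset keeps the list sorted by (seq id, start).
        starts = [(sid, pos - offset) for sid, pos in postings if pos >= offset]
        candidates = starts if candidates is None else intersect_sorted(candidates, starts)
        if not candidates:
            return []
    # The rare words only pre-filter: confirm each candidate in the sequence.
    return [(sid, p) for sid, p in candidates if seqs[sid][p:p + len(query)] == query]
```

With `seqs = {"s1": "ACGTACGTTA", "s2": "TTACGTAC"}` and `k=3`, searching for `"CGTAC"` intersects the lists of its two rarest words and returns `[("s1", 1), ("s2", 3)]`. The final verification is needed because agreeing on a few rare words does not guarantee a full match. For instance, `"GTTAC"` survives the rare-word filter at `("s1", 6)` but is rejected on comparison.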