A RAPID ACCESS MOTIF DATABASE (RAMDB) WITH A SEARCH ALGORITHM FOR THE RETRIEVAL PATTERNS IN NUCLEIC-ACIDS OR PROTEIN DATA-BANKS

Citation
C. Fondrat and P. Dessen, A RAPID ACCESS MOTIF DATABASE (RAMDB) WITH A SEARCH ALGORITHM FOR THE RETRIEVAL PATTERNS IN NUCLEIC-ACIDS OR PROTEIN DATA-BANKS, Computer applications in the biosciences, 11(3), 1995, pp. 273-279
Number of citations
25
Subject Categories
Mathematical Methods, Biology & Medicine; Computer Sciences, Special Topics; Computer Science Interdisciplinary Applications; Biology Miscellaneous
ISSN journal
0266-7061
Volume
11
Issue
3
Year of publication
1995
Pages
273 - 279
Database
ISI
SICI code
0266-7061(1995)11:3<273:ARAMD(>2.0.ZU;2-T
Abstract
We present here a codification structure, entirely interfaced with the main packages for biomolecule database management, associated with a new search algorithm to quickly retrieve a sequence in a database. This system is derived from a method previously proposed for homology search in databanks, with a preprocessed codification of an entire database in which all the overlapping subsequences of a specific length in a sequence were converted into a code and stored in a hash-coding file. The new algorithm is designed for an improved use of the codification. It is based on the recognition of the rarest strings which characterize the query sequence and on the intersection of sorted lists read in the codification structure. The system is applicable to both nucleic acid and protein sequences and is used to find patterns in databanks or large sets of sequences. A few examples of applications are given. In addition, the comparison of our method with existing ones shows that this new approach speeds up the search for query patterns in large data sets.
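The idea outlined in the abstract (index every overlapping subsequence of a fixed length, then answer a query by starting from its rarest strings and intersecting sorted position lists) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the code scheme or the layout of the hash-coding file, so an in-memory dictionary stands in for it, and the names `build_index`, `intersect_sorted`, `search` and the parameter `k` are hypothetical choices made for illustration. The sketch handles exact matches only.

```python
from collections import defaultdict


def build_index(sequences, k):
    """Map each overlapping length-k subsequence to a sorted list of
    (sequence id, offset) pairs. An in-memory dict is an assumption that
    stands in for the hash-coding file mentioned in the abstract."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - k + 1):
            # Scanning in increasing (seq_id, pos) order keeps each list sorted.
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def intersect_sorted(a, b):
    """Merge-style intersection of two sorted lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(index, query, k):
    """Return sorted (sequence id, start) pairs where query occurs exactly."""
    if len(query) < k:
        raise ValueError("query must be at least k characters long")
    lists = []
    for j in range(len(query) - k + 1):
        hits = index.get(query[j:j + k])
        if not hits:
            return []  # one string of the query is absent, so no match exists
        # Shift each hit back to the start position the query would have.
        lists.append([(s, p - j) for s, p in hits])
    # Begin with the rarest string (shortest list) to keep candidates few.
    lists.sort(key=len)
    candidates = lists[0]
    for other in lists[1:]:
        candidates = intersect_sorted(candidates, other)
        if not candidates:
            break
    return candidates


sequences = ["ACGTACGTGA", "TTACGTGACC"]
index = build_index(sequences, 3)
print(search(index, "ACGTGA", 3))  # [(0, 4), (1, 2)]
```

Starting from the shortest list bounds the size of every later intersection by the number of occurrences of the rarest string, which is the intuition behind the speed-up the abstract describes.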