CLEANUP - A FAST COMPUTER-PROGRAM FOR REMOVING REDUNDANCIES FROM NUCLEOTIDE-SEQUENCE DATABASES

Citation
G. Grillo et al., CLEANUP - A FAST COMPUTER-PROGRAM FOR REMOVING REDUNDANCIES FROM NUCLEOTIDE-SEQUENCE DATABASES, Computer applications in the biosciences, 12(1), 1996, pp. 1-8
Number of citations
10
Subject Categories
Mathematical Methods, Biology & Medicine; Computer Sciences, Special Topics; Computer Science Interdisciplinary Applications; Biology Miscellaneous
Journal ISSN
02667061
Volume
12
Issue
1
Year of publication
1996
Pages
1 - 8
Database
ISI
SICI code
0266-7061(1996)12:1<1:C-AFCF>2.0.ZU;2-Y
Abstract
A key concept in comparing sequence collections is the issue of redundancy. The production of sequence collections free from redundancy is undoubtedly very useful, both for performing statistical analyses and for accelerating extensive database searches on nucleotide sequences. Indeed, publicly available databases contain multiple entries of identical or almost identical sequences. Performing statistical analysis on such biased data greatly increases the risk of assigning high significance to non-significant patterns. In order to carry out unbiased statistical analysis, as well as more efficient database searching, it is thus necessary to analyse sequence data that have been purged of redundancy. Given that an unambiguous definition of redundancy is impracticable for biological sequence data, the present program uses a quantitative description of redundancy based on a measure of sequence similarity. A sequence is considered redundant if its degree of similarity to, and overlap with, a longer sequence in the database exceeds a threshold set by the user. In this paper we present a new algorithm based on an approximate string matching procedure, which is able to determine the overall degree of similarity between each pair of sequences contained in a nucleotide sequence database and to automatically generate nucleotide sequence collections free from redundancies.
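The redundancy criterion in the abstract (a sequence is dropped when its similarity to a longer sequence exceeds a user threshold) can be illustrated with a minimal Python sketch. It is not the CLEANUP algorithm, whose internals the abstract does not describe. As an assumption, it scores each shorter sequence against each longer one with Sellers' semi-global edit distance (a classic approximate string matching method), defines similarity as 1 minus that distance divided by the length of the shorter sequence, and only models the case where the shorter sequence lies inside the longer one (end-to-end partial overlaps are ignored). The names `best_substring_distance`, `similarity` and `remove_redundant`, and the default threshold of 0.9, are hypothetical.

```python
from typing import Dict, List


def best_substring_distance(pattern: str, text: str) -> int:
    """Minimum edit distance between `pattern` and any substring of `text`
    (Sellers' semi-global dynamic programming: free start and end in `text`)."""
    prev = [0] * (len(text) + 1)  # row 0: an empty pattern matches anywhere at cost 0
    for i in range(1, len(pattern) + 1):
        curr = [i] + [0] * len(text)  # column 0: pattern[:i] vs empty text costs i
        for j in range(1, len(text) + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # pattern character left unmatched
                curr[j - 1] + 1,     # extra text character inside the match
            )
        prev = curr
    return min(prev)  # best end position anywhere in `text`


def similarity(short: str, long: str) -> float:
    """Hypothetical score: 1 - (edits needed to place `short` inside `long`) / len(short)."""
    if not short:
        return 0.0
    return 1.0 - best_substring_distance(short, long) / len(short)


def remove_redundant(seqs: Dict[str, str], threshold: float = 0.9) -> List[str]:
    """Greedy filter: visit sequences longest first and keep one only if its
    similarity to every already-kept (longer or equal-length) sequence is
    not greater than `threshold`. Returns the identifiers that are kept."""
    order = sorted(seqs, key=lambda sid: len(seqs[sid]), reverse=True)
    kept: List[str] = []
    for sid in order:
        query = seqs[sid].upper()
        if not any(similarity(query, seqs[kid].upper()) > threshold for kid in kept):
            kept.append(sid)
    return kept


if __name__ == "__main__":
    seqs = {
        "A": "ACGTACGTTAGCCGATAGGCTAACGT",  # 26 nt
        "B": "ACGTACGTTAGCCGATAGG",         # exact prefix of A: similarity 1.0
        "C": "TTTTGGGGCCCCAAAATTTTGGGG",    # unrelated to A
    }
    print(remove_redundant(seqs, threshold=0.9))  # ['A', 'C']
```

Processing sequences longest first means each candidate is only compared with sequences at least as long as itself, which mirrors the "longer sequence" wording of the criterion; of two equal-length duplicates, the one listed first is kept. Because every pair is aligned with full dynamic programming, this sketch is quadratic in both the number and the length of the sequences, so it illustrates the criterion rather than the speed the paper's title claims.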