O. Trelles et al., COMPUTATIONAL SPACE REDUCTION AND PARALLELIZATION OF A NEW CLUSTERING APPROACH FOR LARGE GROUPS OF SEQUENCES, BIOINFORMATICS, 14(5), 1998, pp. 439-451
Number of citations
14
Subject Categories
Computer Science, Interdisciplinary Applications; Biology, Miscellaneous; Biochemical Research Methods
Motivation: The explosive growth of biological sequence databases, stimulated by genome projects, has changed the framework of several applications in biological sequence analysis. In most cases, this new scenario is characterized by studies on large sets of sequences, which calls for effective, automatic methods for clustering them. A more effective clustering of the database could then be followed by the application of common family analysis schemes to the groups so formed.

Results: In this work, we present a new strategy to reduce the computational cost of clustering large sets of sequences that are expected to contain several families. The strategy groups the sequences into families by using a dynamic threshold on a pairwise sequence similarity criterion. Routine clustering of large data sets can now be done very efficiently. The method developed here achieves a computational space reduction of about an order of magnitude over more traditional all-versus-all comparison approaches. The resulting family groupings closely reproduce already accepted biological results. Our work includes a parallel implementation for distributed memory multiprocessors, with a dynamic scheduling strategy for performance optimization.
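The abstract does not specify the similarity criterion or how the dynamic threshold is computed, so the following is only a generic sketch of why grouping sequences into families can avoid a full all-versus-all comparison. It assumes a fixed threshold, uses a hypothetical `similarity` function (Jaccard index of 3-mer sets) in place of a real alignment score, and applies greedy "leader" clustering, where each sequence is compared only against one representative per existing family. This is a swapped-in textbook technique, not the authors' algorithm.

```python
def kmer_set(seq: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Hypothetical stand-in for a pairwise alignment score:
    Jaccard index of the two sequences' k-mer sets (0.0 to 1.0)."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def leader_cluster(seqs: list[str], threshold: float) -> tuple[list[list[int]], int]:
    """Greedy leader clustering with a fixed (assumed) threshold.

    Each sequence is compared only to the first member (leader) of each
    existing family; it joins the first family whose leader is similar
    enough, otherwise it starts a new family.
    Returns the families as lists of sequence indices and the number of
    similarity evaluations performed.
    """
    families: list[list[int]] = []
    comparisons = 0
    for i, s in enumerate(seqs):
        for fam in families:
            comparisons += 1
            if similarity(s, seqs[fam[0]]) >= threshold:
                fam.append(i)
                break
        else:
            # No leader passed the threshold: this sequence founds a new family.
            families.append([i])
    return families, comparisons


if __name__ == "__main__":
    seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GAVLIPFWMG", "GAVLIPFWMA", "MKTAYIAKQE"]
    fams, n_cmp = leader_cluster(seqs, threshold=0.5)
    all_pairs = len(seqs) * (len(seqs) - 1) // 2  # all-versus-all count
    print(fams, n_cmp, all_pairs)  # [[0, 1, 4], [2, 3]] 5 10
```

With five toy sequences forming two families, the sketch needs 5 similarity evaluations instead of the 10 required for all-versus-all; the saving grows when there are many sequences but comparatively few families. Real family detection would use an alignment-based score and, per the abstract, a dynamic rather than fixed threshold.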