COMPUTATIONAL SPACE REDUCTION AND PARALLELIZATION OF A NEW CLUSTERING APPROACH FOR LARGE GROUPS OF SEQUENCES

Citation
O. Trelles et al., COMPUTATIONAL SPACE REDUCTION AND PARALLELIZATION OF A NEW CLUSTERING APPROACH FOR LARGE GROUPS OF SEQUENCES, BIOINFORMATICS, 14(5), 1998, pp. 439-451
Citations number
14
Subject Categories
Computer Science Interdisciplinary Applications; Biology Miscellaneous; Biochemical Research Methods
Journal title
BIOINFORMATICS
ISSN journal
1367-4803
Volume
14
Issue
5
Year of publication
1998
Pages
439 - 451
Database
ISI
SICI code
1367-4803(1998)14:5<439:CSRAPO>2.0.ZU;2-U
Abstract
Motivation: The explosive growth of the biological sequences databases stimulated by genome projects has modified the framework of several applications in the biological sequence analysis area. In most cases, this new scenario is characterized by studies on large sets of sequences, suggesting the need for effective and automatic methods for their clustering. A more effective clustering of the database could be followed by the application of common family analysis schemes to the groups so formed. Results: In this work, we present a new strategy to reduce the computational cost associated with the clustering of large sets of sequences which are expected to contain several families. The strategy is based on the grouping of the sequences into families by using a dynamic threshold on a pairwise sequence similarity criterion. Routine clustering of large data sets can now be done very efficiently. The method developed here achieves a computational space reduction of about an order of magnitude over more traditional ones of all-versus-all comparisons. The outcome of this approach produces family groupings that reproduce closely already accepted biological results. Our work includes a parallel implementation for distributed memory multiprocessors with a dynamic scheduling strategy for performance optimization.
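Illustrative sketch
The abstract does not spell out the algorithm, so the following Python fragment is only a rough illustration of how threshold-based family grouping can avoid all-versus-all comparisons. It is not the authors' method. Everything in it is an assumption made for the example: the greedy one-representative-per-family scheme, the `similarity` function (a difflib ratio standing in for a real alignment score), and the `threshold` callable, which is passed the pair being compared so that the cut-off could vary per pair.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; an assumption, not an alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def cluster(sequences, threshold):
    """Greedily group sequences into families.

    Each new sequence is compared only with the first member (the
    representative) of every existing family, not with every sequence.
    `threshold` is a hypothetical callable (seq, rep) -> float giving the
    cut-off for that pair. Returns (families, number_of_comparisons).
    """
    families = []      # families[i][0] is the representative of family i
    comparisons = 0
    for seq in sequences:
        for family in families:
            rep = family[0]
            comparisons += 1
            if similarity(seq, rep) >= threshold(seq, rep):
                family.append(seq)
                break
        else:
            # No representative was similar enough: start a new family.
            families.append([seq])
    return families, comparisons


# Toy data: two obvious families of three sequences each.
toy = [
    "AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATA",
    "CCCCCCCCCC", "CCCCCCCCCG", "CCCCCCCCGC",
]
families, comparisons = cluster(toy, threshold=lambda a, b: 0.8)
all_vs_all = len(toy) * (len(toy) - 1) // 2  # pairs an exhaustive scheme would score
```

On the toy data the greedy pass scores fewer pairs than the `all_vs_all` count, because sequences are matched against family representatives only. The size of the saving on real data depends on the number of families and on the threshold rule, neither of which is specified in this record.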