
COMPUTATIONAL SPACE REDUCTION AND PARALLELIZATION OF A NEW CLUSTERING APPROACH FOR LARGE GROUPS OF SEQUENCES

Authors

TRELLES O; ANDRADE MA; VALENCIA A; ZAPATA EL; CARAZO JM

Citation

O. Trelles et al., COMPUTATIONAL SPACE REDUCTION AND PARALLELIZATION OF A NEW CLUSTERING APPROACH FOR LARGE GROUPS OF SEQUENCES, BIOINFORMATICS, 14(5), 1998, pp. 439-451

Subject categories

Computer Science Interdisciplinary Applications; Biology Miscellaneous; Biochemical Research Methods

Journal title

BIOINFORMATICS

ISSN journal

1367-4803

Volume

14

Issue

5

Year of publication

1998

Pages

439 - 451

Database

ISI

SICI code

1367-4803(1998)14:5<439:CSRAPO>2.0.ZU;2-U

Abstract

Motivation: The explosive growth of the biological sequences databases stimulated by genome projects has modified the framework of several applications in the biological sequence analysis area. In most cases, this new scenario is characterized by studies on large sets of sequences, suggesting the need for effective and automatic methods for their clustering. A more effective clustering of the database could be followed by the application of common family analysis schemes to the groups so formed.

Results: In this work, we present a new strategy to reduce the computational cost associated with the clustering of large sets of sequences which are expected to contain several families. The strategy is based on the grouping of the sequences into families by using a dynamic threshold on a pairwise sequence similarity criterion. Routine clustering of large data sets can now be done very efficiently. The method developed here achieves a computational space reduction of about an order of magnitude over more traditional ones of all-versus-all comparisons. The outcome of this approach produces family groupings that reproduce closely already accepted biological results. Our work includes a parallel implementation for distributed memory multiprocessors with a dynamic scheduling strategy for performance optimization.
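
Illustrative sketch (not the authors' algorithm): the abstract does not specify how the dynamic threshold or the similarity criterion is computed, so the Python example below only shows the general idea of avoiding all-versus-all comparisons. It assumes a fixed threshold, a stand-in similarity score from the standard library's `difflib`, and a hypothetical helper `greedy_cluster` that compares each sequence only against one representative per existing group. The toy sequences are invented for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in pairwise score in [0, 1]; an assumption, not an alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(sequences, threshold=0.7):
    """Group sequences by comparing each one to group representatives only.

    The fixed threshold is a simplification; the paper describes a dynamic one.
    Returns (clusters, comparisons), where comparisons counts similarity calls.
    """
    clusters = []  # each cluster is a list; element 0 acts as its representative
    comparisons = 0
    for seq in sequences:
        for cluster in clusters:
            comparisons += 1
            if similarity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            # No representative was similar enough: start a new group.
            clusters.append([seq])
    return clusters, comparisons


# Toy data: two invented "families" of three near-identical sequences each.
sequences = [
    "MKTAYIAKQR", "MKTAYIAKQK", "MKTAYLAKQR",
    "GGSPLWWDNE", "GGSPLWWDNQ", "GGTPLWWDNE",
]
clusters, comparisons = greedy_cluster(sequences)
all_vs_all = len(sequences) * (len(sequences) - 1) // 2

print(f"groups found: {len(clusters)}")
print(f"comparisons used: {comparisons} (all-versus-all would need {all_vs_all})")
```

Comparing against representatives keeps the number of similarity evaluations tied to the number of groups rather than to the square of the number of sequences, which is the kind of saving the abstract reports; the actual reduction achieved by the published method depends on details not given in this record.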