Retrieval and on-the-fly alignment of sequence fragments from the HIV database

Authors

Gaschen, B; Kuiken, C; Korber, B; Foley, B

Citation

B. Gaschen et al., Retrieval and on-the-fly alignment of sequence fragments from the HIV database, BIOINFORMAT, 17(5), 2001, pp. 415-418

Subject categories

Multidisciplinary

Journal title

BIOINFORMATICS

ISSN journal

1367-4803

Volume

17

Issue

5

Year of publication

2001

Pages

415 - 418

Database

ISI

SICI code

1367-4803(200105)17:5<415:RAOAOS>2.0.ZU;2-Z

Abstract

Motivation: The amount of HIV-1 sequence data generated (presently around 42 000 sequences, of which more than 22 000 are from the V3 region of the viral envelope) presents a challenge for anyone working on the analysis of these data. A major problem is obtaining the region of interest from the stored sequences, which often contain but are not limited to that region. In addition, multiple alignment programs generally cannot deal with the large numbers of sequences that are available for many HIV-1 regions. We set out to provide our users with a tool that will retrieve and create an initial alignment of the HIV sequences that are available for a given genomic region.

Results: The MPAlign (Multiple Pairwise Alignment) web interface is a collection of Perl scripts that retrieves sequences from the Los Alamos HIV sequence database based on a number of search parameters. All sequences were pairwise-aligned to a model sequence using the Hidden Markov Model-based program HMMER. The HMMER model is general enough to accommodate virtually all HIV-1 sequences stored in the database. To create a multiple sequence alignment, gaps were inserted into the sequences during retrieval, so that they are aligned to one another. Retrieving and aligning the almost 560 gp120 sequences (approximately 1500 nt) stored in the database is at least 1500 times faster than a similar Clustal alignment.
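
The gap-insertion step described in the Results can be illustrated with a minimal Python sketch. It is not the MPAlign code (which the abstract describes as Perl scripts around HMMER) and it does not call HMMER. It assumes each sequence has already been pairwise-aligned to one shared, plain reference sequence rather than to a profile HMM, and that `-` marks a gap. The function name `merge_pairwise` and the toy sequences are hypothetical.

```python
def merge_pairwise(pairs):
    """Merge pairwise alignments against one shared reference into a
    multiple alignment (simplified illustration, not the MPAlign code).

    pairs: list of (ref_row, seq_row) strings of equal length, where '-'
    is a gap and every ref_row spells the same reference once gaps are
    removed.
    """
    ref = pairs[0][0].replace("-", "")
    n = len(ref)

    parsed = []
    for ref_row, seq_row in pairs:
        if len(ref_row) != len(seq_row) or ref_row.replace("-", "") != ref:
            raise ValueError("rows must be equal length and share one reference")
        inserts = [""] * (n + 1)  # residues inserted before reference position i
        matched = []              # residue (or '-') aligned to each reference position
        pos = 0
        for r, s in zip(ref_row, seq_row):
            if r == "-":
                inserts[pos] += s
            else:
                matched.append(s)
                pos += 1
        parsed.append((inserts, matched))

    # Each insertion slot must be as wide as the longest insertion seen there.
    widths = [max(len(ins[i]) for ins, _ in parsed) for i in range(n + 1)]

    rows = []
    for inserts, matched in parsed:
        out = []
        for i in range(n):
            out.append(inserts[i].ljust(widths[i], "-"))  # pad shorter insertions
            out.append(matched[i])
        out.append(inserts[n].ljust(widths[n], "-"))      # trailing insertions
        rows.append("".join(out))
    return rows


if __name__ == "__main__":
    toy = [
        ("ACGT", "A-GT"),    # deletion relative to the reference
        ("AC-GT", "ACTGT"),  # insertion relative to the reference
        ("ACGT", "ACGA"),    # substitution only
    ]
    for row in merge_pairwise(toy):
        print(row)
```

Because each sequence is aligned once to the reference and then only padded, the work in this sketch grows linearly with the number of sequences. That is the general reason a pairwise-to-model strategy can be much faster than a progressive multiple alignment; the 1500-fold figure above is the authors' measurement and is not reproduced by this toy example.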