G. Perriere and J. Thioulouse, Online tools for sequence retrieval and multivariate statistics in molecular biology, Computer Applications in the Biosciences, 12(1), 1996, pp. 63-69
We have developed a World Wide Web server for browsing sequence collections structured in the ACNUC format and for performing multivariate analyses on sequences. Both general collections (such as GenBank or EMBL) and specialized data banks (such as Hovergen and NRSub) can be accessed. The system allows complex queries to be constructed, and the result of each query, a list of sequences, is stored on the server. This list can then be reused to perform multivariate analyses on the sequences. Two example applications are presented.
The first is a correspondence analysis of codon usage in all the protein-coding genes of Haemophilus influenzae Rd, which identifies the highly expressed genes and the integral membrane proteins of this organism. The second is an ordering of 70 aligned growth hormone protein sequences by principal coordinate analysis; this method recovers the patterns of relationships among the sequences that had previously been determined with tree-building programs.
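The abstract names two ordination methods without describing how they are computed. The following is a minimal sketch, not the authors' implementation, of how each is commonly done with NumPy: correspondence analysis via the singular value decomposition of the standardized residuals of a genes x codons count table, and principal coordinate analysis (classical metric scaling) of a distance matrix between aligned sequences. The helper names (`codon_counts`, `correspondence_analysis`, `p_distance`, `principal_coordinates`), the use of all 64 codons, the p-distance with gap sites skipped, and the toy sequences are assumptions for illustration; the paper may use a different codon subset and a different distance.

```python
from itertools import product

import numpy as np

# All 64 codons in a fixed column order (assumption: the paper may drop some codons).
CODONS = ["".join(t) for t in product("TCAG", repeat=3)]


def codon_counts(cds):
    """Count codons in one coding sequence, reading in frame from position 0."""
    counts = dict.fromkeys(CODONS, 0)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        if codon in counts:  # skips codons containing N or other ambiguity codes
            counts[codon] += 1
    return [counts[c] for c in CODONS]


def correspondence_analysis(table):
    """Correspondence analysis of a contingency table (rows = genes, columns = codons).

    Returns row and column principal coordinates and the principal inertias
    (eigenvalues) of the axes, largest first.
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                 # relative frequencies
    r = P.sum(axis=1)               # row masses
    c = P.sum(axis=0)               # column masses
    if (r == 0).any() or (c == 0).any():
        raise ValueError("drop empty rows/columns (e.g. unused codons) first")
    E = np.outer(r, c)              # frequencies expected under independence
    S = (P - E) / np.sqrt(E)        # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U * sv / np.sqrt(r)[:, None]
    cols = Vt.T * sv / np.sqrt(c)[:, None]
    return rows, cols, sv ** 2


def p_distance(a, b, gap="-"):
    """Assumed distance for illustration: fraction of differing residues,
    ignoring sites where either aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def principal_coordinates(D):
    """Principal coordinate analysis of a symmetric distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    B = -0.5 * J @ (D ** 2) @ J             # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)          # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    tol = 1e-9 * np.abs(vals).max()
    keep = vals > tol                       # drop null and negative axes
    return vecs[:, keep] * np.sqrt(vals[keep]), vals


# Toy usage with made-up sequences (not data from the paper).
genes = ["ATGGCTGCTAAA", "ATGGCAGCGAAG", "ATGGCTGCAAAA"]
table = np.array([codon_counts(g) for g in genes])
table = table[:, table.sum(axis=0) > 0]     # keep codons that occur at least once
gene_coords, codon_coords, inertia = correspondence_analysis(table)

aligned = ["MKT-LLV", "MKTALLV", "MRSALIV"]
D = np.array([[p_distance(a, b) for b in aligned] for a in aligned])
seq_coords, eigenvalues = principal_coordinates(D)
```

In a codon usage study, genes that fall together at one end of the first correspondence analysis axis can then be inspected as candidates for a shared property such as high expression; the sketch only computes the coordinates and does not reproduce that interpretation. Principal coordinate analysis is used here because it accepts any distance matrix, which makes it a natural fit for distances computed from an alignment.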