High-throughput genome (HTG) and expressed sequence tag (EST) sequences are
currently the most abundant nucleotide sequence classes in the public
database. The large volume, high degree of fragmentation and lack of gene
structure annotations prevent efficient and effective searches of HTG and
EST data for protein sequence homologies by standard search methods. Here,
we briefly describe three newly developed resources that should make
discovery of interesting genes in these sequence classes easier in the
future, especially for biologists without access to a powerful local
bioinformatics environment. trEST and trGEN are regularly regenerated
databases of hypothetical
protein sequences predicted from EST and HTG sequences, respectively. Hits
is a web-based data retrieval and analysis system providing access to
precomputed matches between protein sequences (including sequences from
trEST and trGEN) and patterns and profiles from Prosite and Pfam. The three
resources can be accessed via the Hits home page (http://hits.isb-sib.ch).