J. Bingham et al., Informatics issues in large-scale sequence analysis: Elucidating the protein kinases of C. elegans, J CELL BIOC, 80(2), 2000, pp. 181-186
With the availability of the nearly complete genomic sequence of C. elegans,
the first multicellular organism to be sequenced, molecular biology has
definitely entered the postgenomic era. Annotation of the genomic sequence,
which refers to identifying the genes and other biologically relevant
sections of the genome, is an important and nontrivial next step. A
first-pass annotation will be necessarily incomplete but will drive further
biological experiments, which in turn will help to annotate the genome
better. Given the scale of the genome sequence analysis, it is clear that
the annotation should be automated as much as possible without sacrificing
the quality of analysis. In this work, we outline our approach to
identifying the protein kinases of C. elegans from the genomic sequence. We
describe new tools we have developed for analysis, management and
visualization of genomic data. By developing modular and scalable
solutions, this study has provided a framework for future analysis of the
Drosophila and human genomes. J. Cell. Biochem. 80:181-186, 2000. © 2000
Wiley-Liss, Inc.