Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining

Citation
R.D. King et al., Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining, YEAST, 17(4), 2000, pp. 283-293
Number of citations
51
Subject categories
Biotechnology & Applied Microbiology; Microbiology
Journal title
YEAST
ISSN journal
0749503X
Volume
17
Issue
4
Year of publication
2000
Pages
283 - 293
Database
ISI
SICI code
0749-503X(200012)17:4<283:APOPFC>2.0.ZU;2-2
Abstract
The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class from information only available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli. Copyright (C) 2000 John Wiley & Sons, Ltd.