EVALUATING THE EFFECTIVENESS OF SEQUENCE-ANALYSIS ALGORITHMS USING MEASURES OF RELEVANT INFORMATION

Authors
Jc. Wootton
Citation
Jc. Wootton, EVALUATING THE EFFECTIVENESS OF SEQUENCE-ANALYSIS ALGORITHMS USING MEASURES OF RELEVANT INFORMATION, Computers & chemistry, 21(4), 1997, pp. 191-202
Citations number
39
Journal title
Computers & chemistry
ISSN journal
00978485
Volume
21
Issue
4
Year of publication
1997
Pages
191 - 202
Database
ISI
SICI code
0097-8485(1997)21:4<191:ETEOSA>2.0.ZU;2-5
Abstract
Given vast quantities of molecular sequence data, and numerous different algorithms designed to discover, diagnose or model biologically interesting features in sequences, how is it possible to make objective evaluations of the diagnostic effectiveness of these algorithms and robust assessments of their relative strengths and limitations? An approach to this relatively neglected question is developed here, which is based on information measures of the diagnostic efficiency of different methods. From output lists of a procedure such as a database search, "relevance weights" are assigned that encode, for each sequence listed, the level of associated scientific evidence implicating that sequence as an example of a feature of interest. Relevance weights may be derived, following systematic protocols, from expert human judgement or, in principle, by automated information retrieval from electronic resources. Practical applications of this approach to algorithm assessment and development and parameter choice are demonstrated with examples of automated sequence motif modeling for the DNA-binding helix-turn-helix motif and the guanine exchange factor protein domain. The combined use of relevance weights and information measures appears to offer promising advantages over ROC analysis and may be generally applicable to diagnostic evaluation. Published by Elsevier Science Ltd.
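The abstract does not state which information measure the paper uses, so the following is only an illustrative sketch of the general idea, not the author's method. It assumes relevance weights are non-negative numbers assigned to each sequence in an output list, and it uses relative entropy (Kullback-Leibler divergence, in bits) between the normalized relevance weights and the normalized scores of a method as a stand-in measure. The function name `relative_entropy_bits` and the example numbers are hypothetical.

```python
import math


def relative_entropy_bits(weights, scores):
    """Relative entropy D(p || q) in bits.

    p is the distribution obtained by normalizing the relevance weights,
    q the distribution obtained by normalizing a method's scores.
    Both the choice of measure and the normalization are assumptions
    made for illustration; they are not taken from the paper.
    """
    if len(weights) != len(scores):
        raise ValueError("weights and scores must have the same length")
    if any(w < 0 for w in weights) or any(s < 0 for s in scores):
        raise ValueError("weights and scores must be non-negative")
    total_w = sum(weights)
    total_s = sum(scores)
    if total_w <= 0 or total_s <= 0:
        raise ValueError("weights and scores must each have a positive sum")

    divergence = 0.0
    for w, s in zip(weights, scores):
        p = w / total_w
        q = s / total_s
        if p == 0:
            continue  # terms with p = 0 contribute nothing
        if q == 0:
            raise ValueError("a sequence with evidence received a zero score")
        divergence += p * math.log2(p / q)
    return divergence


# Hypothetical output list of four sequences with expert-assigned weights.
relevance = [1.0, 0.5, 0.0, 0.0]
method_a = [8, 4, 2, 2]  # scores concentrate on the well-supported sequences
method_b = [2, 2, 4, 8]  # scores concentrate on the unsupported sequences

d_a = relative_entropy_bits(relevance, method_a)
d_b = relative_entropy_bits(relevance, method_b)
print(f"method A: {d_a:.3f} bits, method B: {d_b:.3f} bits")
```

Under these assumptions a smaller value means the method's scores track the weighted evidence more closely, so method A would be preferred here. Unlike a ROC curve, which needs each sequence labelled as a true or false positive, this kind of comparison can use graded weights directly.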