EVALUATING THE EFFECTIVENESS OF SEQUENCE-ANALYSIS ALGORITHMS USING MEASURES OF RELEVANT INFORMATION

Authors

WOOTTON JC

Citation

J.C. Wootton, EVALUATING THE EFFECTIVENESS OF SEQUENCE-ANALYSIS ALGORITHMS USING MEASURES OF RELEVANT INFORMATION, Computers & chemistry, 21(4), 1997, pp. 191-202

Citations number

Journal title

Computers & chemistry

ISSN journal

00978485

Volume

21

Issue

4

Year of publication

1997

Pages

191 - 202

Database

ISI

SICI code

0097-8485(1997)21:4<191:ETEOSA>2.0.ZU;2-5

Abstract

Given vast quantities of molecular sequence data, and numerous different algorithms designed to discover, diagnose or model biologically interesting features in sequences, how is it possible to make objective evaluations of the diagnostic effectiveness of these algorithms and robust assessments of their relative strengths and limitations? An approach to this relatively neglected question is developed here, which is based on information measures of the diagnostic efficiency of different methods. From output lists of a procedure such as a database search, "relevance weights" are assigned that encode, for each sequence listed, the level of associated scientific evidence implicating that sequence as an example of a feature of interest. Relevance weights may be derived, following systematic protocols, from expert human judgement or, in principle, by automated information retrieval from electronic resources. Practical applications of this approach to algorithm assessment and development and parameter choice are demonstrated with examples of automated sequence motif modeling for the DNA-binding helix-turn-helix motif and the guanine exchange factor protein domain. The combined use of relevance weights and information measures appears to offer promising advantages over ROC analysis and may be generally applicable to diagnostic evaluation. Published by Elsevier Science Ltd.
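The abstract compares the relevance-weight approach with ROC analysis but does not state the paper's information measures. As a point of reference only, the sketch below computes a conventional ROC AUC for a ranked output list (best hit first) whose relevance weights have been thresholded into relevant and not relevant. The function name, the assumption that weights lie on a 0 to 1 scale, and the 0.5 threshold are illustrative assumptions; this is not Wootton's information measure.

```python
from typing import Sequence


def roc_auc_from_ranked_list(weights: Sequence[float], threshold: float = 0.5) -> float:
    """ROC AUC for a ranked output list, best hit first (hypothetical helper).

    Each entry's relevance weight is binarized: weight >= threshold counts as a
    positive example of the feature, anything lower as a negative. The AUC is
    the fraction of (positive, negative) pairs in which the positive is ranked
    above the negative.
    """
    labels = [w >= threshold for w in weights]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative entry")

    # Walk the list from top to bottom: every negative is outranked by all
    # positives already seen above it.
    positives_seen = 0
    correctly_ordered_pairs = 0
    for is_positive in labels:
        if is_positive:
            positives_seen += 1
        else:
            correctly_ordered_pairs += positives_seen
    return correctly_ordered_pairs / (n_pos * n_neg)


if __name__ == "__main__":
    print(roc_auc_from_ranked_list([1.0, 0.8, 0.1, 0.6, 0.0, 0.3]))  # 8/9
    # Graded evidence is lost after thresholding: both orderings score 1.0.
    print(roc_auc_from_ranked_list([0.9, 0.5, 0.0]))
    print(roc_auc_from_ranked_list([0.5, 0.9, 0.0]))
```

Because thresholding discards the graded level of evidence, the lists [0.9, 0.5, 0.0] and [0.5, 0.9, 0.0] receive the same AUC even though the second ranks weaker evidence first. This is one plausible reading of why a measure that uses the relevance weights directly could be more informative.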