J.C. Wootton, Evaluating the effectiveness of sequence-analysis algorithms using measures of relevant information, Computers & Chemistry, 21(4), 1997, pp. 191-202
Given vast quantities of molecular sequence data, and numerous different
algorithms designed to discover, diagnose or model biologically
interesting features in sequences, how is it possible to make objective
evaluations of the diagnostic effectiveness of these algorithms and
robust assessments of their relative strengths and limitations? An
approach to this relatively neglected question is developed here, which
is based on information measures of the diagnostic efficiency of
different methods. From output lists of a procedure such as a database
search, "relevance weights" are assigned that encode, for each sequence
listed, the level of associated scientific evidence implicating that
sequence as an example of a feature of interest. Relevance weights may
be derived, following systematic protocols, from expert human judgement
or, in principle, by automated information retrieval from electronic
resources. Practical applications of this approach to algorithm
assessment and development and parameter choice are demonstrated with
examples of automated sequence motif modeling for the DNA-binding
helix-turn-helix motif and the guanine exchange factor protein domain.
The combined use of relevance weights and information measures appears
to offer promising advantages over ROC analysis and may be generally
applicable to diagnostic evaluation. Published by Elsevier Science Ltd.