Cf. Allex et al., Neural network input representations that produce accurate consensus sequences from DNA fragment assemblies, BIOINFORMAT, 15(9), 1999, pp. 723-728
Motivation: Given inputs extracted from an aligned column of DNA bases and
the underlying Perkin Elmer Applied Biosystems (ABI) fluorescent traces,
our goal is to train a neural network to determine correctly the consensus
base for the column. Choosing an appropriate network input representation
is critical to success in this task. We empirically compare five
representations; one uses only base calls and the others include trace
information.
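
To make the idea of a base-call-only input representation concrete, the following is a minimal sketch, not the encoding used in the paper (the abstract does not specify it). It assumes a five-symbol alphabet (A, C, G, T and a gap) and represents an aligned column as the fraction of reads calling each symbol; the names ALPHABET, encode_column and majority_base are hypothetical.

```python
from collections import Counter

# Assumed alphabet: the four bases plus a gap symbol. The paper's actual
# input encoding is not given in the abstract; this is illustrative only.
ALPHABET = "ACGT-"


def encode_column(base_calls):
    """Return the fraction of reads calling each symbol in ALPHABET."""
    if not base_calls:
        raise ValueError("column must contain at least one base call")
    counts = Counter(call.upper() for call in base_calls)
    coverage = len(base_calls)
    # Symbols outside ALPHABET (e.g. 'N') are ignored, so the fractions
    # may sum to less than 1 for such columns.
    return [counts.get(symbol, 0) / coverage for symbol in ALPHABET]


def majority_base(base_calls):
    """Baseline consensus: most frequent symbol, ties broken by ALPHABET order."""
    features = encode_column(base_calls)
    return ALPHABET[features.index(max(features))]
```

A vector like this could serve as the input to a classifier; a trace-based representation would add features derived from the fluorescent signal, which this sketch does not attempt to model.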
Results: We attained the most accurate results from networks that
incorporate trace information into their input representations. Based on
estimates derived from using 10-fold cross-validation, the best network
topology produces consensus accuracies ranging from 99.26% to >99.98% for
coverages from two to six aligned sequences. With a coverage of six, it
makes only three errors in 20 000 consensus calls. In contrast, the network
that only uses base calls in its input representation has over double that
error rate: eight errors in 20 000 consensus calls.
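
The error counts above can be converted to accuracies with simple arithmetic: three errors in 20 000 calls corresponds to 99.985% accuracy, consistent with the reported ">99.98%", and eight errors corresponds to 99.96%. The short sketch below reproduces that calculation; the function name consensus_accuracy is hypothetical and only the counts come from the abstract.

```python
def consensus_accuracy(errors, calls):
    """Percentage of consensus calls that are correct."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return 100.0 * (1 - errors / calls)


# Counts quoted in the abstract for a coverage of six.
TOTAL_CALLS = 20_000
best_network = consensus_accuracy(3, TOTAL_CALLS)    # trace-based inputs
base_call_only = consensus_accuracy(8, TOTAL_CALLS)  # base calls only

# Ratio of error counts: how many times more errors the base-call-only
# network makes than the best network.
error_ratio = 8 / 3

print(f"best network:   {best_network:.3f}%")
print(f"base-call only: {base_call_only:.3f}%")
print(f"error ratio:    {error_ratio:.2f}x")
```

The ratio of 8/3, roughly 2.7, is what the abstract summarizes as "over double" the error rate.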