C. B. Lawrence and V. V. Solovyev, Assignment of position-specific error probability to primary DNA sequence data, Nucleic Acids Research, 22(7), 1994, pp. 1272-1280
DNA sequence predicted by polyacrylamide gel-based technologies is inaccurate for two reasons: variations in the quality of the primary data caused by limitations of the technology, and sequence-specific variations arising from nucleotide interactions within the DNA molecule and with the gel. The ability to estimate the probability of error in the primary data will be useful in reconstructing the target sequence of a DNA sequencing project, and in estimating the accuracy of the final sequence.
This paper describes the use of linear discriminant analysis to assign position-specific probabilities of incorrect, over- and under-prediction of nucleotides for each predicted nucleotide position in primary sequence data generated by a gel-based DNA sequencing technology. Using this method, most of the error potential in primary sequence data can be assigned to a limited number of discrete positions. The use of probability values in the sequence reconstruction process, and in estimating the accuracy of consensus sequence determination, is described.