SAGE (serial analysis of gene expression) data are obtained by sequencing short
DNA tags. Because DNA sequencing is error-prone, SAGE data contain errors. We
propose a new approach to identify tags whose abundance is biased by sequencing
errors. The approach is based on a concept of neighbourhood: abundant tags can
contaminate tags whose sequences are very close to theirs. Applying our approach
reveals that moderately abundant tags can be generated solely by sequencing
errors. It also makes it possible to detect correct rare tags.
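The neighbourhood idea can be illustrated with a small sketch. It is not the authors' method. It relies on several simplifying assumptions: only single-base substitutions occur, so a tag's neighbourhood is every sequence at Hamming distance 1; there is a uniform per-base error rate `error_rate`; and an erroneous base is equally likely to become any of the other three bases. Under these assumptions, one read of a neighbour turns into a given tag with probability (e/3)(1 - e)^(L - 1), where L is the tag length. The function names and the `ratio` threshold used to flag a tag are hypothetical choices made for illustration.

```python
BASES = "ACGT"


def neighbours(tag):
    """Yield every sequence at Hamming distance 1 from `tag` (substitutions only)."""
    for i, base in enumerate(tag):
        for b in BASES:
            if b != base:
                yield tag[:i] + b + tag[i + 1:]


def expected_contamination(counts, error_rate):
    """Expected number of copies of each observed tag that arise from
    single-substitution errors in reads of its observed neighbours.

    Assumption: uniform per-base error rate, with substitutions spread
    evenly over the three other bases.
    """
    expected = {}
    for tag in counts:
        length = len(tag)
        # One specific position is miscalled as one specific base,
        # and every other position is read correctly.
        p = (error_rate / 3) * (1 - error_rate) ** (length - 1)
        expected[tag] = sum(counts.get(n, 0) * p for n in neighbours(tag))
    return expected


def flag_suspect_tags(counts, error_rate, ratio=0.5):
    """Flag tags whose observed count is largely explained by errors in
    more abundant neighbours. `ratio` is a hypothetical threshold."""
    exp = expected_contamination(counts, error_rate)
    return {tag for tag, c in counts.items() if exp[tag] >= ratio * c}


if __name__ == "__main__":
    # Toy counts: one very abundant tag, one neighbour of it, one isolated tag.
    counts = {"ACGTACGTAC": 10000, "ACGTACGTAA": 30, "TTTTTTTTTT": 25}
    print(flag_suspect_tags(counts, error_rate=0.01))
```

With these toy counts, about 30.45 erroneous copies of `ACGTACGTAA` are expected from its abundant neighbour `ACGTACGTAC`. That is close to its observed count of 30, so the sketch flags it. The isolated tag `TTTTTTTTTT` is equally rare but has no observed neighbours, so it is kept as a plausibly correct rare tag.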