This paper deals with three different problems in the processing of
binary images of handwritten text documents. Firstly, an integrated
algorithm that finds a straight line approximation of a textual stroke is
described. It has the advantage of using the distance transform of
thinned binary images to identify spurious bifurcation points, which are
unavoidable when thinning algorithms are used, remove them and recover
the original ones. The obtained straight line approximations preserve
the structural information of the original pattern. The algorithm
does not resort to distortable geometrical properties. Secondly, a method
is presented to recover loops that become blobs due to blotting. The
method depends on removing the pixels whose distance transform exceeds
a calculated threshold. Unfortunately, it seems that it is not
possible to recover such loops with a high rate of success. The authors
suggest that the inclusion of thickness information, in the line segments
that connect the vertices of the straight line approximations produced
by the previous algorithm, is a step towards a solution of this
problem. Finally, a method is developed to extract lines from pages of
handwritten text, by finding the shortest spanning tree of a graph formed
from the set of main strokes. Then, main strokes of extracted lines are
arranged in the same order as they were written by following the path
in which they are contained. Next, every secondary stroke is assigned
to the closest main stroke. At the end, an ordered list of main
strokes, each with the corresponding number of assigned secondary strokes,
is obtained. Each combination of main-secondary strokes can be the
input to a subsequent recognition stage. The method proved to be powerful
and more suited to variable handwriting. Copyright (C) 1996 Pattern
Recognition Society.
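
As an illustration of the third problem, the following minimal sketch builds a
shortest spanning tree over a set of main strokes and then assigns each
secondary stroke to its closest main stroke. It rests on assumptions that the
abstract does not state: each stroke is reduced to a single centroid point,
the graph is complete, and edge weights are Euclidean distances. The function
names and the sample coordinates are hypothetical, and Prim's algorithm is
used only as one standard way to obtain such a tree. How text lines are
separated from the tree and how strokes are ordered along a path is not
covered here.

```python
import math


def shortest_spanning_tree(points):
    """Prim's algorithm on the complete graph over `points`.

    Edge weights are Euclidean distances (an assumption of this sketch).
    Returns a list of (i, j) index pairs, one per tree edge.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_cost = [math.inf] * n  # cheapest known link from the tree to node i
    best_from = [-1] * n        # tree node that provides that link
    best_cost[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_cost[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Update the cheapest links using the newly added node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_cost[v]:
                    best_cost[v] = d
                    best_from[v] = u
    return edges


def assign_secondary(main_points, secondary_points):
    """Count, for each main stroke, the secondary strokes closest to it."""
    counts = [0] * len(main_points)
    for s in secondary_points:
        nearest = min(range(len(main_points)),
                      key=lambda i: math.dist(main_points[i], s))
        counts[nearest] += 1
    return counts


# Hypothetical centroids: two text lines with three main strokes each.
main = [(0, 0), (10, 0), (20, 0), (0, 30), (10, 30), (20, 30)]
# Hypothetical secondary strokes, e.g. a dot and a crossbar.
secondary = [(10, -4), (19, 27)]

tree = shortest_spanning_tree(main)
counts = assign_secondary(main, secondary)
tree_length = sum(math.dist(main[i], main[j]) for i, j in tree)
```

In this toy layout the tree links neighbouring strokes within each text line
and joins the two lines by a single longer edge, and `counts` holds the number
of secondary strokes attached to each main stroke.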