We discuss some operational issues pertaining to the detection of
duplicates in databases of bitmapped binary document images, and argue
that efficient and effective duplicate document detection probably
requires combining an efficient primary detector with an effective
subordinate detector. An algorithm that executes binary pattern template matching
by cross-correlation is proposed as a duplicate document detection
methodology. The template matching operation is amenable to pixel-parallel
computation on serial architecture computers by bitwise integer operations.
A description of the algorithm is accompanied by a discussion of issues
that arise in its practical implementation. Duplicate detection by template matching
is especially well suited to facsimile (i.e., fax) databases, in particular
for detecting the single feed-multiple transmissions that often dominate
the occurrence of duplicates in fax databases. Detailed experimental results
presented for fax documents demonstrate that template matching is suitable
both as a primary detector, when conducted with small template and search
area sizes, and as a subordinate detector, when conducted with moderate
template and search area sizes. (C) 2000 Elsevier Science B.V. All rights reserved.
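
As a rough illustration of the pixel-parallel idea described above, the following Python sketch packs each row of a binary image into one integer and scores a template against a search area with bitwise operations. It is a minimal sketch under stated assumptions, not the algorithm as published: the function names (`pack_rows`, `match_score`, `best_match`), the use of the fraction of agreeing pixels (XNOR count) as the correlation score, and the square search area defined by `radius` are all hypothetical choices made for this example.

```python
def pack_rows(bitmap):
    """Pack each row of a 0/1 bitmap into one integer (leftmost pixel = most significant bit)."""
    return [int("".join("1" if p else "0" for p in row), 2) for row in bitmap]


def match_score(image_rows, image_width, template_rows, template_width, x, y):
    """Fraction of template pixels that agree with the image at offset (x, y).

    Assumed score: number of agreeing pixels (XNOR) divided by template area.
    """
    mask = (1 << template_width) - 1
    # Right-shift so the window starting at column x lands in the low bits.
    shift = image_width - template_width - x
    agree = 0
    for i, t in enumerate(template_rows):
        window = (image_rows[y + i] >> shift) & mask
        # XNOR compares every pixel of the row in a single integer operation.
        agree += bin(~(window ^ t) & mask).count("1")
    return agree / (template_width * len(template_rows))


def best_match(image, template, x0, y0, radius):
    """Search offsets within `radius` of (x0, y0); return (score, x, y) of the best one."""
    img_w, img_h = len(image[0]), len(image)
    tpl_w, tpl_h = len(template[0]), len(template)
    image_rows, template_rows = pack_rows(image), pack_rows(template)
    best = (-1.0, x0, y0)
    for y in range(max(0, y0 - radius), min(img_h - tpl_h, y0 + radius) + 1):
        for x in range(max(0, x0 - radius), min(img_w - tpl_w, x0 + radius) + 1):
            score = match_score(image_rows, img_w, template_rows, tpl_w, x, y)
            if score > best[0]:
                best = (score, x, y)
    return best


# Toy example: a 2x2 template cut from a 4x6 image.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
template = [[1, 1], [1, 0]]
print(best_match(image, template, 1, 0, 2))  # (1.0, 2, 1)
```

In this sketch a small template and `radius` keep the search cheap, while larger values cost more comparisons per candidate pair; how such a score would be thresholded to declare two documents duplicates is left open here.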