Since the early 1990s a number of papers on "robust" digital watermarking systems have been presented, but none of them uses the same robustness criteria. This is not practical at all for comparison and slows down progress in this area. To address this issue, we present an evaluation procedure for image watermarking systems. First we identify all necessary parameters for proper benchmarking and investigate how to quantitatively describe the image degradation introduced by the watermarking process. For this, we show the weaknesses of usual image quality measures in the context of watermarking and propose a novel measure adapted to the human visual system. Then we show how to efficiently evaluate the watermark performance in such a way that fair comparisons between different methods are possible. The usefulness of three graphs, "attack versus visual quality," "bit error versus visual quality," and "bit error versus attack," is investigated. In addition, receiver operating characteristic graphs are reviewed and proposed to describe the statistical detection behavior of watermarking methods. Finally, we review a number of attacks that any system should survive to be really useful, and propose a benchmark and a set of different suitable images. © 2000 SPIE and IS&T. [S1017-9909(00)00604-8]
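The weakness of usual quality measures mentioned above can be illustrated with a minimal sketch. PSNR compares images pixel by pixel, so a visually imperceptible one-pixel shift is penalized far more than a genuine low-amplitude modification such as an embedded watermark. The images and noise levels below are illustrative assumptions, not the paper's test data or its proposed measure.

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(np.float64)

# A one-pixel circular shift is almost invisible to the eye, yet PSNR
# penalizes it heavily because the metric compares pixels point-wise.
shifted = np.roll(img, 1, axis=1)

# Low-amplitude additive noise, of the kind watermark embedding causes.
noisy = img + rng.normal(0.0, 2.0, img.shape)

print(psnr(img, shifted))  # low score despite near-identical appearance
print(psnr(img, noisy))    # high score despite a real modification
```

This mismatch between pixel-wise error and perceived quality is exactly why a measure adapted to the human visual system is needed for fair benchmarking.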
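A receiver operating characteristic graph of the kind the abstract proposes can be sketched as follows: sweep the detection threshold over simulated detector scores and record the false-positive and true-positive rates at each operating point. The Gaussian score distributions here are assumptions for illustration, not the paper's detector.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated detector responses: scores for unmarked images cluster
# around 0, scores for watermarked images around a higher mean.
# Both distributions are illustrative assumptions.
unmarked = rng.normal(0.0, 1.0, 5000)
marked = rng.normal(2.0, 1.0, 5000)

def roc_points(neg, pos, thresholds):
    """(false-positive rate, true-positive rate) per detection threshold."""
    pts = []
    for t in thresholds:
        fpr = float(np.mean(neg >= t))  # unmarked images flagged as marked
        tpr = float(np.mean(pos >= t))  # watermarked images correctly detected
        pts.append((fpr, tpr))
    return pts

curve = roc_points(unmarked, marked, np.linspace(-4.0, 6.0, 101))
# Plotting tpr against fpr traces the ROC curve; the closer it hugs
# the top-left corner, the better the statistical detection behavior.
```

Such a curve lets two watermarking methods be compared at the same false-alarm rate, which is the kind of fair comparison the evaluation procedure aims at.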