The overall quality of three Text-To-Speech (TTS) synthesis systems for Italian, with common prosodic control but different diphones and synthesizers, was evaluated by means of the combined application of the Mean Opinion Score (MOS) and Pair Comparison methods. Direct comparison between the two methods serves to validate MOS, which is the technique recommended by CCITT for synthesis evaluation. In the MOS experiment, assessment also included three types of natural speech (normal and degraded) as reference. Eighteen subjects expressed 2880 MOS judgements and made 720 comparisons in all. The results obtained from the two methods showed good agreement. The most important MOS voice parameters used by listeners for differentiating the systems were Global Impression, Voice, Articulation and Pronunciation. The diphones appeared to contribute most to the different judgements, whereas the synthesizers were not perceived as different by listeners. This experiment provides positive verification of the interlaboratory reproducibility of MOS, which proved to be an effective technique for overall assessment of TTS quality.