Pronunciation by analogy (PbA) is an emerging, data-driven technique
with potential application in text-to-speech (TTS) systems, as well as
being an influential psychological model of reading aloud. The
underlying idea is that a pronunciation for an unknown word (i.e., one
not in the dictionary, or lexicon, of the human or machine "reader") is
assembled by matching substrings of the input to substrings of known,
lexical words, hypothesizing a partial pronunciation for each matched
substring from the lexical knowledge of the "reader," and concatenating
the partial pronunciations. This paper assesses the capability of PbA
to derive pronunciations for unknown words of English. As a
psychological model, PbA is "under-specified"; that is, the implementor
of a simulation of the process faces detailed choices which can only be
resolved by trial and error. One goal of this paper is to explore the
impact of certain basic implementational choices on the performance of
PbA systems. The variables studied are the specific lexical database
used as the basis of the analogy process, the method of ranking or
scoring candidate pronunciations, and the effect of manual versus
automatic alignment of letters and phonemes. When tested with short
(monosyllabic) pseudowords previously used in experimental psychology
studies, the lowest error rate achieved was 14.3% (for a test set of
size 70). We conclude that current PbA systems are at best poor models
of pseudoword pronunciation by humans. To assess their suitability for
use in a TTS application, in which multisyllabic words will be
encountered, the implementations have also been tested with lexical
words temporarily removed from the dictionary. The best performance
obtained was 93.5% phonemes correct (corresponding to 67.9% words
correct) for a 16,280-word dictionary. This is vastly superior to the
25.7% words correct obtained using a set of popular letter-to-sound
rules, indicating considerable scope for analogy methods to be
exploited in future TTS systems.
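
The substring-matching idea described above can be illustrated with a
minimal Python sketch. It is not one of the implementations evaluated in
this paper, and it rests on loudly stated assumptions: the toy `LEXICON`,
its phoneme symbols, and the function names are hypothetical; letters and
phonemes are assumed to be aligned one-to-one (with "_" for a silent
letter); and the ranking or scoring of competing candidates is reduced to
a greedy longest-match with frequency voting. The last function mimics,
on the toy data only, the test in which each lexical word is temporarily
removed from the dictionary.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: each word maps to a phoneme string aligned
# one-to-one with its letters ("_" marks a silent letter). The phoneme
# symbols are made up for illustration.
LEXICON = {
    "cat": "kat",
    "hat": "hat",
    "mint": "mint",
    "hint": "hint",
    "cave": "kAv_",
    "save": "sAv_",
}


def build_substring_table(lexicon):
    """Map each letter substring to counts of its aligned phoneme substrings."""
    table = defaultdict(Counter)
    for letters, phonemes in lexicon.items():
        assert len(letters) == len(phonemes), "toy alignment must be one-to-one"
        for i in range(len(letters)):
            for j in range(i + 1, len(letters) + 1):
                table[letters[i:j]][phonemes[i:j]] += 1
    return table


def pronounce(word, table):
    """Greedy simplification: repeatedly take the longest matching substring,
    pick its most frequent partial pronunciation, and concatenate the parts."""
    parts = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            candidates = table.get(word[i:j])
            if candidates:
                parts.append(candidates.most_common(1)[0][0])
                i = j
                break
        else:
            return None  # no lexical word supports this letter at all
    return "".join(parts).replace("_", "")


def leave_one_out_word_accuracy(lexicon):
    """Remove each word in turn, pronounce it from the rest, and report the
    fraction of words whose whole pronunciation is reproduced exactly."""
    correct = 0
    for word, phonemes in lexicon.items():
        held_out = {w: p for w, p in lexicon.items() if w != word}
        guess = pronounce(word, build_substring_table(held_out))
        correct += guess == phonemes.replace("_", "")
    return correct / len(lexicon)


if __name__ == "__main__":
    table = build_substring_table(LEXICON)
    print(pronounce("mave", table))  # pseudoword built from "m" + "ave"
    print(leave_one_out_word_accuracy(LEXICON))
```

With so small a lexicon the greedy strategy fails often (for example,
when a held-out word is the only source of one of its letters), which
hints at why the choice of lexical database and scoring method matters.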