The current generation of continuous speech recognition systems claims to
offer high accuracy (greater than 95 percent) speech recognition at natural
speech rates (150 words per minute) on low-cost (under $2000) platforms.
This paper presents a state-of-the-technology summary, along with insights
the authors have gained through testing one such product extensively and
other products superficially.
The authors have identified a number of issues that are important in
managing accuracy and usability. First, for efficient recognition, users
must start with a dictionary containing the phonetic spellings of all words
they anticipate using. The authors dictated 50 discharge summaries using one
inexpensive internal medicine dictionary ($30) and found that they needed to
add 400 terms to reach a recognition rate of 98 percent. However, when they
used either of two more expensive and extensive commercial medical
vocabularies ($349 and $695), they did not need to add terms to reach a 98
percent recognition rate. Second, users must speak clearly and continuously,
distinctly pronouncing all syllables. Users must also correct errors as they
occur, because accuracy improves with error correction by at least 5 percent
over two weeks. Users may find it difficult to train the system to recognize
certain terms, regardless of the amount of training, and appropriate
substitutions must be created. For example, the authors had to substitute
"twice a day" for "bid" when using the less expensive dictionary, but not
when using the other two dictionaries. From trials they conducted in
settings ranging from an emergency room to hospital wards and clinicians'
offices, they learned that ambient noise has minimal effect. Finally, they
found that a minimal "usable" hardware configuration (one that keeps up with
dictation) comprises a 300-MHz Pentium processor with 128 MB of RAM and a
"speech quality" sound card (e.g., SoundBlaster, $99). Anything less
powerful will result in the system lagging behind the speaking rate.
The authors obtained 97 percent accuracy with just 30 minutes of training
when using the latest edition of one of the speech recognition systems
supplemented by a commercial medical dictionary. This technology has
advanced considerably in recent years and is now a serious contender to
replace some or all of the increasingly expensive alternative methods of
dictation with human transcription.
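
The abstract does not say how the accuracy and recognition-rate figures
above were computed. As an illustration only, the sketch below uses one
common convention: word accuracy derived from the word-level edit distance
between a reference transcript and the recognizer's output. The function
names and the sample sentences are hypothetical and are not taken from the
authors' study.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two lists of words."""
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from output
                curr[j - 1] + 1,     # extra word inserted in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference_text, recognized_text):
    """Percent word accuracy (assumed convention: 100 * (1 - errors / N))."""
    reference = reference_text.lower().split()
    hypothesis = recognized_text.lower().split()
    if not reference:
        raise ValueError("reference transcript is empty")
    errors = word_edit_distance(reference, hypothesis)
    return 100.0 * (1 - errors / len(reference))


if __name__ == "__main__":
    # Hypothetical four-word dictation with one misrecognized word.
    print(word_accuracy("aspirin twice a day", "aspirin twice a bay"))  # 75.0
```

Under this convention, insertions can push the result below zero, so a
reported figure such as 98 percent presumes that the recognized text stays
close in length to what was dictated.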