K. Takagi and S. Itahashi, Temporal Characteristics of Utterance Units and Topic Structure of Spoken Dialogs, IEICE Transactions on Information and Systems, E78D(3), 1995, pp. 269-276
There are various difficulties in processing spoken dialogs because of
acoustic, phonetic, and grammatical ill-formedness, and because of
interactions among participants. This paper describes the temporal
characteristics of utterances in human-human task-oriented dialogs and the
interactions between the participants, analyzed in relation to the topic
structure of the dialog. We analyzed 12 simulated task-oriented dialogs from
the ASJ continuous speech corpus, conducted by 13 different participants and
totaling 66 minutes. The speech data was segmented into utterance units, each
of which is a speech interval delimited by pauses. There were 3876 utterance
units, and 38.9% of them were interjections, fillers, false starts, and
chiming utterances. Each dialog consisted of
6 to 15 topic segments, in each of which the participants exchanged specific
task information. Eighty-six of the 119 new topic segments started with
interjectory utterances and filled pauses. The durations of turn-taking
interjections and fillers, including the preceding silent pause, were
significantly longer at topic boundaries than at other positions. These
results indicate that the duration of interjections and filled pauses serves
as a cue to a topic shift in spoken
dialogs. In natural conversations, participants' speaking modes change
dynamically as the conversation develops. The response times of both
client-role and agent-role speakers became shorter as the dialog proceeded,
which indicates that the interaction between the participants becomes more
active as the dialog progresses. Speech rate was also affected by the dialog
structure: it was fast in the initiating and terminating parts, where most
utterances consist of
fixed expressions, and slow in the topic segments of the body of the dialog,
where both client and agent participants stalled while speaking in order to
retrieve task knowledge. These results can be used in man-machine dialog
systems, e.g., to detect topic shifts in a dialog and to make the speech
interface of a dialog system more natural to a human participant.
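As an illustration of how the topic-shift cue described above might be used in a dialog system, here is a minimal sketch, not the authors' method. It assumes time-aligned word input and a hypothetical romanized filler lexicon (`FILLERS`). It first splits the word stream into utterance units wherever the silent pause reaches a threshold. It then flags filler-only units whose duration plus preceding silent pause is long. Both thresholds (`min_pause=0.2` s, `min_total=0.8` s) and all names (`Word`, `Unit`, `segment_units`, `topic_shift_candidates`) are hypothetical choices for illustration; the paper reports a statistical comparison of durations, not these cutoffs.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical filler/interjection lexicon (romanized Japanese); not from the paper.
FILLERS = {"eeto", "ano", "ee", "hai", "un"}


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Unit:
    words: list[Word]
    pause_before: float  # silent pause preceding this unit, in seconds

    @property
    def duration(self) -> float:
        return self.words[-1].end - self.words[0].start


def segment_units(words: list[Word], min_pause: float = 0.2) -> list[Unit]:
    """Split time-aligned words into utterance units at pauses >= min_pause s."""
    units: list[Unit] = []
    current: list[Word] = []
    pause_before = 0.0  # the first unit has no measured preceding pause
    for w in words:
        if current:
            gap = w.start - current[-1].end
            if gap >= min_pause:
                units.append(Unit(current, pause_before))
                pause_before = gap
                current = []
        current.append(w)
    if current:
        units.append(Unit(current, pause_before))
    return units


def is_filler_unit(unit: Unit) -> bool:
    """True if every word in the unit is in the (hypothetical) filler lexicon."""
    return all(w.text in FILLERS for w in unit.words)


def topic_shift_candidates(units: list[Unit], min_total: float = 0.8) -> list[int]:
    """Indices of filler-only units whose duration plus preceding pause is long."""
    return [
        i for i, u in enumerate(units)
        if is_filler_unit(u) and u.pause_before + u.duration >= min_total
    ]


if __name__ == "__main__":
    words = [
        Word("hai", 0.0, 0.3), Word("sou", 0.35, 0.6), Word("desu", 0.6, 0.9),
        Word("eeto", 1.8, 2.3),  # long filler after a 0.9 s pause
        Word("tsugi", 2.6, 2.9), Word("ano", 3.0, 3.2),
    ]
    units = segment_units(words)
    print(topic_shift_candidates(units))  # → [1]
```

Here only the lone "eeto" unit is flagged: it is filler-only, and its 0.5 s duration plus the 0.9 s silence before it exceeds the cutoff. Combining the filler and its preceding pause into one measure follows the abstract's observation that the duration of turn-taking fillers, including the preceding silent pause, was longer at topic boundaries.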