Progress in year 1 of the project: Overview
The CLASSiC Architecture
At the centre of the CLASSiC project is the CLASSiC architecture for spoken dialogue systems (SDS) and its end-to-end statistical treatment of uncertainty. It provides a unified statistical model of both the sources of uncertainty (speech recognition, language understanding, dialogue strategies) and the constraints on uncertainty (task context, dialogue context, user context). This allows multiple possible analyses to be represented, maintained, reasoned with, and later disambiguated in a robust, efficient and natural way. State generalization techniques will also be explored so that previously unseen situations can be handled robustly. At the same time, the CLASSiC architecture maintains the modularity of traditional SDS, allowing the separate development of statistical models of speech recognition, spoken language understanding, dialogue management, natural language generation and speech synthesis.
Work performed since the project start
In the first year of the project, progress has been made in 5 main areas:
Learning approaches to Dialogue Management (DM):
- using Partially Observable Markov Decision Processes (POMDPs)
- integrating computational learning with commercial SDS tools.
Statistical approaches to Spoken Language Understanding (SLU).
Developing simulated users for training Dialogue Management and Natural Language Generation strategies.
A new statistical learning approach to Natural Language Generation (NLG) in SDS.
Design of the CLASSiC architecture, and implementation of the first CLASSiC prototype systems:
- a “Town Information” system
- an internet connection “Self-Help” system.
Main results to date
As planned, we have
developed our initial version of the CLASSiC architecture and we have
integrated the first version of CLASSiC system 1. This is an
end-to-end statistical system, using computational learning methods
and components from the project partners (Geneva’s SLU, Cambridge’s
DM, Edinburgh’s NLG, and France Telecom’s speech synthesis),
working in the Town Information domain in English. The system is
ready for demonstration and testing is planned for the start of
project year 2.
Figure 1 above shows a screenshot of the initial CLASSiC Town Information Spoken Dialogue System. (Note that this image shows the developer interface for the system; the end user will interact with the system by voice alone.) A video of this system is available.
Here the user has said “Are there any bars playing Jazz?” The top left window shows the many speech recognition hypotheses that the system monitors, representing what the user may have said, and the lower left box shows the multiple hypotheses generated by the Speech Understanding component. The system uses the hypotheses generated from this input to update its belief state (see middle window) regarding the probable user goals. Here the top (most probable) goal is that the user is looking for a jazz bar in the north of town (the user’s preference for “north” was detected earlier in the conversation). The red text in the bottom right window shows the Dialogue Move planned by the system in response (unfortunately there is no such bar), and the top right window displays the natural language text generated by the system, which is then sent to the speech synthesizer.
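The belief update described above can be illustrated with a minimal sketch. This is not the project's actual dialogue manager: the function name `update_belief`, the `mismatch` weight, the slot names and all confidence values are assumptions chosen for illustration. Each candidate user goal is reweighted by how well the N-best understanding hypotheses support it, and the result is renormalised.

```python
def update_belief(belief, nbest, mismatch=0.1):
    """Reweight each candidate goal by the support it gets from the
    N-best hypotheses, then renormalise so the beliefs sum to 1.

    belief: dict mapping a goal (tuple of (slot, value) pairs) to its probability
    nbest:  list of (slots, confidence) pairs from the understanding component
    mismatch: hypothetical weight given to a hypothesis that contradicts the goal
    """
    updated = {}
    for goal, prior in belief.items():
        goal_slots = dict(goal)
        likelihood = 0.0
        for hyp_slots, confidence in nbest:
            # A hypothesis supports a goal if every slot it mentions agrees with it.
            consistent = all(goal_slots.get(s) == v for s, v in hyp_slots.items())
            likelihood += confidence * (1.0 if consistent else mismatch)
        updated[goal] = prior * likelihood
    total = sum(updated.values())
    if total == 0:
        return dict(belief)  # no usable evidence: keep the previous belief
    return {goal: p / total for goal, p in updated.items()}


# Illustrative values only: "north" is already known from earlier in the dialogue.
belief = {
    (("area", "north"), ("music", "jazz"), ("type", "bar")): 1 / 3,
    (("area", "north"), ("music", "rock"), ("type", "bar")): 1 / 3,
    (("area", "north"), ("music", "jazz"), ("type", "restaurant")): 1 / 3,
}
nbest = [
    ({"type": "bar", "music": "jazz"}, 0.6),
    ({"type": "bar", "music": "rock"}, 0.3),
    ({"type": "restaurant"}, 0.1),
]
new_belief = update_belief(belief, nbest)
top_goal = max(new_belief, key=new_belief.get)
print(dict(top_goal))
```

Because every hypothesis contributes to every goal, a lower-ranked recognition result is not discarded; it simply carries less weight, which is the property the multiple-hypothesis display in the figure is meant to convey.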
An initial version of the CLASSiC Self-Help internet connection system has also been developed in French. This system is based on an evolution of the France Telecom industrial platform.
The platform and its associated design tools were extended to support the design of dialogue alternatives, which enables the system to take different dialogue decisions at runtime while providing feedback. This has led to CLASSiC system 3.
The service designed for CLASSiC System 3 has been built to guide users through the installation of their DSL modem. Figure 2 below shows a snapshot of the Dialogue Design Studio (DDS) tool being used to construct part of the dialogue application. Reading the figure from left to right, one can first see the initialisation of two global state variables, "score" and "choix": "score" stores the dialogue reward, and "choix" selects among the dialogue alternatives. Next comes a conditional branch on the "choix" variable, which leads the system into one of three branches. In this example, the branches differ only in the voice played to the user: the first message is played with a neutral speaking style, the second is more formal, and the third is more friendly. Finally, all the branches merge back into the same dialogue path.
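The branching pattern above can be sketched in code. The source specifies only that "choix" selects an alternative and "score" stores the dialogue reward; the epsilon-greedy selection rule, the class name `AlternativeSelector` and the reward values below are assumptions made for illustration and are not the method used in the France Telecom platform.

```python
import random

# The three speaking styles from the example dialogue, indexed by "choix".
STYLES = ["neutral", "formal", "friendly"]


class AlternativeSelector:
    """Hypothetical epsilon-greedy selector over dialogue alternatives."""

    def __init__(self, n_alternatives, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = [0] * n_alternatives
        self.mean_score = [0.0] * n_alternatives

    def choose(self):
        """Return the value of "choix" for the next dialogue."""
        if self.rng.random() < self.epsilon:
            # Explore: try a random alternative.
            return self.rng.randrange(len(self.counts))
        # Exploit: take the alternative with the best average reward so far.
        return max(range(len(self.counts)), key=lambda i: self.mean_score[i])

    def feedback(self, choix, score):
        """Fold the dialogue reward ("score") into the running mean for this branch."""
        self.counts[choix] += 1
        self.mean_score[choix] += (score - self.mean_score[choix]) / self.counts[choix]


selector = AlternativeSelector(len(STYLES), epsilon=0.0)
selector.feedback(2, 1.0)   # illustrative reward for the "friendly" branch
selector.feedback(0, 0.2)   # illustrative reward for the "neutral" branch
choix = selector.choose()
print(STYLES[choix])
```

Setting `epsilon` to zero, as in the usage lines, makes the choice purely greedy; a small positive value would let the system keep sampling the other branches so that their reward estimates continue to improve.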
As well as the overall architecture and the initial systems described above, significant research progress has been made in the individual components making up these systems: the project has published 2 journal papers (with 4 more accepted to appear), and 15 conference or workshop papers (with 5 more accepted for publication in 2009). This strong publication record demonstrates the international impact of the project results in the first year.