Tables 2 and 3 present the results of these experiments, based on word recognition and concept recognition performance, respectively. The columns titled “Correct” and “Accuracy” refer to the word correct rate and word accuracy, as well as to their concept-level equivalents. The “Sentence” column lists the percentage of completely correctly decoded sentences. For the two-stage approach, the numbers in Table 2 denote the performance of the speech recognition system alone (step (3) in Fig. 1). For the one-stage approach, the semantic labels were removed after decoding in order to obtain the plain word sequences.
It can be seen that word-based recognition benefits both from word-based additions to the language model and from semantic labels in the language model.

Table 3 summarizes the concept-level results. Here, the semantic labels are also compared against the reference; numbers in sub-labels are ignored, however. The “NLU” row denotes the performance on perfectly recognized data, i.e. on the training transcriptions. One-stage integrated recognition produces competitive recognition rates when compared to the two-stage approach, even though in the two-stage approach each stage’s representation can be fine-tuned separately.

It is interesting to note a subtle difference between the decoding procedures of the two-stage and the one-stage architectures. In a stand-alone stochastic parser, Viterbi decoding is used to find word-to-label correspondences. The probability of a transition from semantic state s_i to s_j is thus defined as the product P(w_j|s_j) P(s_j|s_i), where P(w_j|s_j) is the probability of observing w_j in state s_j. In contrast, if a labelled language model is used, the transition probability is P(w_j|w_i), where w_i and w_j are pairs of the actual words and their associated labels, so the surface form of the last word influences the transition as well, not only its label.

6 Conclusions and Future Work

It can be shown that a flat HMM-based semantic analysis does not require a separate decoding stage. Instead, it seems possible to use the speech recogniser’s language model to represent the semantic state model without compromising recognition in terms of word or slot error rate.
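The contrast between the two transition formulations discussed above can be illustrated by scoring a fixed word/label sequence under each model. The following is a minimal sketch, not the paper’s implementation; all probability tables, label names, and example words are invented for illustration:

```python
import math

# --- Two-stage HMM parser (hypothetical toy tables) ---
# Emission probabilities P(w|s) and label-only transitions P(s'|s).
emit = {("TO", "to"): 1.0, ("CITY", "boston"): 0.6, ("CITY", "denver"): 0.4}
label_trans = {("<s>", "TO"): 0.5, ("<s>", "CITY"): 0.5,
               ("TO", "CITY"): 0.9, ("TO", "TO"): 0.1}

def two_stage_score(words, labels):
    """Log-score = sum of log P(w_j|s_j) + log P(s_j|s_i):
    the transition depends only on the previous label."""
    logp, prev = 0.0, "<s>"
    for w, s in zip(words, labels):
        logp += math.log(emit.get((s, w), 1e-9))
        logp += math.log(label_trans.get((prev, s), 1e-9))
        prev = s
    return logp

# --- One-stage labelled language model (hypothetical toy table) ---
# Bigrams over (word, label) pairs: the surface form of the previous
# word conditions the transition, not only its label.
pair_trans = {(("<s>", "<s>"), ("to", "TO")): 1.0,
              (("to", "TO"), ("boston", "CITY")): 0.7,
              (("to", "TO"), ("denver", "CITY")): 0.3}

def one_stage_score(words, labels):
    """Log-score = sum of log P((w_j, s_j) | (w_i, s_i))."""
    logp, prev = 0.0, ("<s>", "<s>")
    for w, s in zip(words, labels):
        logp += math.log(pair_trans.get((prev, (w, s)), 1e-9))
        prev = (w, s)
    return logp

words, labels = ["to", "boston"], ["TO", "CITY"]
print(two_stage_score(words, labels))   # log(0.5 * 1.0 * 0.9 * 0.6)
print(one_stage_score(words, labels))   # log(1.0 * 0.7)
```

In a full decoder, either scoring function would be maximised over label sequences by Viterbi search; the sketch only shows how the two factorisations differ for a single hypothesis.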
For a stand-alone speech recognition component, it seems advantageous to use a class-based or context-based language model, since it improves the word recognition score. For the stochastic parsing, numbered sub-labels provide the best results. With N-best decoding, the stochastic parser can be used to select the best overall hypothesis.
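The N-best selection scheme can be sketched as follows: each hypothesis carries a recogniser score, and the stochastic parser’s log-probability is added (with a weight) to pick the overall winner. The function names, weight, and toy parser below are hypothetical, not from the paper:

```python
def rescore_nbest(nbest, parser_logprob, parser_weight=1.0):
    """Return the hypothesis maximising the combined score:
    recogniser score + weighted parser log-probability."""
    return max(nbest,
               key=lambda h: h["score"] + parser_weight * parser_logprob(h["words"]))

def toy_parser_logprob(words):
    """Stand-in for a stochastic parser: favours hypotheses
    containing a known city name (purely illustrative)."""
    return 0.0 if "boston" in words else -5.0

nbest = [
    {"words": ["to", "bostun"], "score": -10.0},  # best recogniser score, poor parse
    {"words": ["to", "boston"], "score": -10.5},  # slightly worse score, parses well
]
best = rescore_nbest(nbest, toy_parser_logprob)
print(best["words"])  # the well-parsing hypothesis wins overall
```

Here the parser overturns the recogniser’s first choice, which is precisely the benefit of deferring the final decision to the N-best stage.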
A number of improvements and extensions may be considered for the different processing stages. Firstly, instead of representing compound airport and city names such as “New York City” as word sequences, they could be entered in the dictionary as single words, which should avoid certain recognition errors. In addition, an equivalent of a class-based language model should be defined for semantically annotated language models. Also, contextual observations, i.e. the use of a class of manually defined context words, could help the stochastic parser to address long-term dependencies that have so far proved difficult. Finally, the ATIS task results in relatively simple semantic structures and yields a limited vocabulary size. It would be interesting to apply our proposed techniques to a more complex domain, such as an appointment scheduling task (Minker et al., 1999), implying a more natural speech-based interaction. This would enable us to validate our approach on larger vocabulary sizes.

INTERSPEECH’2005 SCIENCE QUIZ
Fun and imagination at INTERSPEECH’2005 – EUROSPEECH!
Upon registration in Lisbon, all INTERSPEECH’2005 participants received a sheet with 16 intriguing questions from the area of language and speech science and technology. They were selected from proposals from colleagues from all over the world. Participants were challenged to find the right answers during INTERSPEECH’2005 and to compete for the honour and a nice prize, a beautiful vase of Portuguese ceramics.
THE ANSWERS AND THE WINNER
Although much discussion and many frantic Internet searches were witnessed, the quiz was considered quite difficult, and only 40 participants returned the form. There was a tie between two participants who got 12 answers right. The final winner was Ibon Saratxaga from Spain, with Arlo Faria in second place. Other high scores were obtained by Mats Blomberg, Lou Boves, Frederic Bimbot, Athanassios Katsamanis and Bernd Möbius.