1 Shaping in Speech Graffiti: results from the initial user study Stefanie Tomko Dialogs on Dialogs meeting 10 February 2006.

2 Big picture (i.e. thesis statement)
- A system of shaping and adaptivity can be used to induce more efficient user interactions with spoken dialog systems.
- This strategy can increase efficiency by increasing the amount of user input that is actually understood by the system, leading to increased task completion rates and higher user satisfaction.
- This strategy can also reduce upfront training time, thus accelerating the process of reaching optimally efficient interaction.

3 This study
[Flowchart] User input -> in Speech Graffiti (target) grammar? yes: result / no: shapeable (in expanded grammar)? yes: shaping prompt / no: {confsig}
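The routing logic in the flowchart can be sketched as follows; the grammar-membership tests and the toy utterance sets are hypothetical stand-ins for the real grammars:

```python
# Sketch of the slide's shaping decision flow.
# TARGET/EXPANDED are toy sets standing in for real grammar tests.
TARGET = {"theater is showcase north", "genre is drama", "what is title?"}
EXPANDED = TARGET | {"what movies are playing?", "show me dramas"}

def route(user_input):
    """Return the system action for one user utterance."""
    utt = user_input.lower().strip()
    if utt in TARGET:
        return "result"            # in target grammar: answer directly
    if utt in EXPANDED:
        return "shaping prompt"    # understood, but shape toward target form
    return "{confsig}"             # not understood: error beep
```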

4 My approach, graphically
[Flowchart] Same flow as the previous slide, with an "intelligent shaping help" component feeding the shaping prompt.

5 Speech Graffiti
- Standardized framework of syntax, keywords, and principles
- Domain-specific vocabulary
Sample dialog (U = user, S = system):
  U: Theater is Showcase North
  S: Theater Showcase Cinemas Pittsburgh North
  U: Genre is drama
  S: Drama
  U: What movies are playing?
  S: {confsig} [an error beep, since previous utterance is not in grammar]
  U: WHERE WAS I?
  S: Theater is Showcase Cinemas Pittsburgh North, genre is drama
  U: OPTIONS
  S: You can specify or ask about title, show time, rating, {ellsig} [a 3-beep list continuation signal]
  U: What is title?
  S: 2 matches: Dark Water, War of the Worlds
  U: START OVER
  S: Starting over
  U: Theater is Northway Mall Cinemas Eight
  S: Northway Mall Cinemas 8
  U: What is address?
  S: 1 match: 8000 McKnight Road in Pittsburgh
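The "slot is value" phrases in the dialog above could be parsed with something like the following sketch; the function name and regex are illustrative, not the system's actual parser:

```python
import re

# Sketch: parse a Speech Graffiti-style utterance of comma-separated
# "slot is value" phrases into a slot dictionary (illustrative only).
def parse_sg(utterance):
    slots = {}
    for phrase in utterance.split(","):
        m = re.match(r"\s*(\w+)\s+is\s+(.+?)\s*$", phrase)
        if m:
            slots[m.group(1).lower()] = m.group(2)
    return slots
```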

6 Expanded grammar
- Exploit the fact that knowledge of speaking to a limited-language system restricts input
- Create a grammar that will accept more natural language input than SG
- This grammar is opaque to users
- Why have two grammars?
  - Lower-perplexity LMs -> lower error rates
  - Some applications may be SG-only
- Restriction: linear mapping from EXP input to TGT equivalent
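The linear EXP-to-TGT mapping restriction might look like this in miniature; the rewrite rules here are invented examples, not the study's grammar:

```python
# Sketch: hypothetical rewrite rules mapping expanded-grammar (natural)
# input onto its target Speech Graffiti equivalent.
REWRITES = [
    ("what movies are playing", "list movies"),
    ("show me dramas", "genre is drama"),
    ("i want the showcase north theater", "theater is showcase north"),
]

def to_target(expanded_utt):
    """Return the target-grammar form, or None if no rule applies."""
    utt = expanded_utt.lower().rstrip("?!. ")
    for src, tgt in REWRITES:
        if utt == src:
            return tgt
    return None
```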

7 Shaping strategy
- Handle user input accepted by the expanded grammar but not the target
- Balance current task success with future interaction efficiency
- Baseline strategy (this study):
  - Confirm expanded-grammar input with full, explicit slot+value confirmation
  - Give result if appropriate for the query
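The full, explicit slot+value confirmation could be generated along these lines; this is a sketch, and the slot names are illustrative:

```python
# Sketch: build the baseline strategy's full, explicit slot+value
# confirmation for input accepted only by the expanded grammar.
def confirmation_prompt(slots):
    """Render parsed slots back in target-grammar 'slot is value' form."""
    parts = ["{} is {}".format(slot, value) for slot, value in slots.items()]
    return ", ".join(parts)
```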

8 Study participants
- "Normal" adults, i.e. not CMU students
- 15 males, 14 females, aged
- Native speakers of American English
- Little/no computer programming experience
- New to Speech Graffiti

9 Study design
- Between-subjects
- 3 conditions:
  - non-shaping + tutorial (BT)
  - shaping + tutorial (ST)
  - shaping + no tutorial (SN)
- Tutorial: 9-slide .ppt presentation, 5 minutes

10 Study tasks
- 15 tasks
- 4 difficulty levels (# of slots to be specified/queried)
- 40 minutes, or until all tasks completed
  - Only one user did not get to attempt all 15 tasks in 40 minutes
- Afterwards: SASSI questionnaire

11 Results
- In short, the baseline shaping strategy didn't have an effect
- Efficiency: mean results from shaping subjects are only slightly better (non-significant)

12 User satisfaction
- Again, no significant differences
- No differences on individual SASSI factors
- No efficiency/satisfaction differences between tutorial/non-tutorial, either

13 Grammaticality
- How often did users speak within the target SG grammar?
- From Q1 to Q4, both groups showed significant increases in TGT grammaticality

14 Error rates - WER
- For non-shaping: 39.9% overall; 30.3% for grammatical utterances; 38.3% utterance-level concept error
- For shaping: a bit harder to figure, because of 2-pass ASR
  - Each shaping input generated a TGT hypothesis and an EXP hypothesis
  - Selection based on AM/LM score and a few simple heuristics
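The 2-pass selection step can be sketched as a simple score comparison; the margin heuristic here is an assumption, since the slides only say AM/LM scores plus a few simple heuristics were used:

```python
# Sketch: choose between the target-grammar and expanded-grammar decodes
# of one utterance (higher score = better; margin heuristic is invented).
def select_hypothesis(tgt_hyp, tgt_score, exp_hyp, exp_score, margin=0.0):
    """Prefer the target-grammar decode unless the expanded decode
    scores better by more than `margin`."""
    if exp_score > tgt_score + margin:
        return exp_hyp
    return tgt_hyp
```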

15 Error rates - WER
- Shaping:
  - For selected hypothesis: 37.3%
  - All TGT: 40.9%
  - All EXP: 64.2%
- 25.6% utterance-level concept error
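For reference, the WER figures above follow the standard definition: word-level edit distance divided by reference length.

```python
# Word error rate as word-level Levenshtein distance over reference length.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[-1][-1] / len(ref)
```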

16 So - what happened?
- Shaping users had success with NL-ish input, and shaping prompts were not strong enough to change behavior.

17 Biggest problem
- Using NL or slot-only query formats
- My theory: the "slot is value" specification format is clearly structured; the "what is slot" query format sounds structured to me, but to users it sounds like natural language
- In new versions, the query keyword will be "list"
  - Users don't seem to have too much trouble adapting to a structure, but the structure needs to be clear
  - Will also shape more explicitly by confirming with "I think you meant, 'list movies'"
- Also plan more explicit shaping of specifications

18 Other problems
- Not using "start over" to clear context
- Confusion about semantics of location
- Long utterances
- Using "next" instead of "more"
- Pacing
- These will be addressed via targeted help messages

19 Current hang-up
- Can we improve WER?
  - LM improvements?
  - COTS recognizer?
- Dragon: using it, results, issues

20 A little bit about trying DNS
- Dragon NaturallySpeaking 8 (distribution from Jahanzeb)
- Set up for dictation, i.e. mic input
  - So, no telephone models
- To compare with Sphinx:
  - Test set of utterances from this study
  - Re-recorded with head mic (so, read) at 16 kHz
  - Downsampled to 8 kHz for Sphinx
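The 16 kHz-to-8 kHz conversion can be illustrated with a naive 2:1 decimator; the study presumably used a proper resampler, so this is only a sketch of the idea:

```python
# Sketch: naive 2:1 downsampling (16 kHz -> 8 kHz) by averaging sample
# pairs as a crude low-pass filter before decimation.
def downsample_2to1(samples):
    return [(samples[i] + samples[i + 1]) / 2
            for i in range(0, len(samples) - 1, 2)]
```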

21 More Dragon stuff
- Two groups:
  - TGT: Sphinx mean 56.4% (worse than the 8 kHz telephone model?); Dragon mean 35.9%; mean diff: Dragon 18.8 pts lower (n.s.)
  - EXP: Sphinx mean 68.5%; Dragon mean 45.4%; mean diff: Dragon 22.3 pts lower (significant)
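The (n.s.)/(significant) calls above come from comparing per-utterance error rates between recognizers; an independent-samples (Welch's) t statistic can be computed like this (the data here is made up):

```python
import math

# Sketch: Welch's t statistic for two independent samples, the kind of
# test behind the significance calls above (sample data is hypothetical).
def welch_t(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    vx = sum((x - mx) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
    vy = sum((y - my) ** 2 for y in ys) / (len(ys) - 1)
    return (mx - my) / math.sqrt(vx / len(xs) + vy / len(ys))
```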

22 More Dragon stuff
- But: Dragon rates are not that different from original Sphinx WER rates
  - Sphinx WER in this test might be fishy
- Setup seems tricky: can I still do 2-pass decoding?
- Would need to change to a mic setup
- Black-box LM stuff
  - Mysterious adaptation? Not good for user studies!
- So, sticking with Sphinx.