Speech recognition, understanding and conversational interfaces Alexander Rudnicky School of Computer Science


1 Speech recognition, understanding and conversational interfaces Alexander Rudnicky School of Computer Science http://www.cs.cmu.edu/~air

2 Outline Speech Types of speech interfaces Speech systems and their structure Designing speech interfaces Some applications –SpeechWear –Communicator

3 Speech as a signal The difference between speech and sound –“CD” quality vs. intelligible quality high-quality is 44.1 / 48 kHz desirable speech bandwidth: 0-8kHz, 16bits –at 16bits/sample: 256kbps (tethered mic) –telephone: 64kbps (and lower) –Compression: –MPEG: 64kbps/channel and up (but not speech-optimal) –CELP: 16kbps … 2.4kbps (optimized for speech)
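The bit rates on this slide follow from sample rate times bits per sample. A minimal sketch of that arithmetic, assuming uncompressed single-channel PCM. The 16 kHz sampling rate is inferred from the 0-8 kHz bandwidth via the Nyquist criterion, and the 8 kHz, 8-bit telephone figures are an assumption (conventional narrowband telephony), not something the slide states. The function name is illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Uncompressed PCM bit rate in kilobits per second (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# A 0-8 kHz speech band needs at least 16 kHz sampling (Nyquist criterion).
wideband = pcm_bitrate_kbps(16_000, 16)      # tethered microphone, 16 bits/sample
telephone = pcm_bitrate_kbps(8_000, 8)       # assumed 8 kHz, 8 bits/sample
cd_channel = pcm_bitrate_kbps(44_100, 16)    # one channel of "CD" quality audio

# How much a 16 kbps CELP coder shrinks the wideband stream.
celp_ratio = wideband / 16
```

Under these assumptions the wideband and telephone figures match the 256 kbps and 64 kbps quoted on the slide, and CELP at 16 kbps is a 16-fold reduction of the wideband stream.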

4 Speech for communication The difference between speech and language Speech recognition and speech understanding

5 Computers and speech Transcription –dictation, information retrieval Command and control –data entry, device control, navigation Information access –airline schedules, stock quotes Problem solving –travel planning, logistics

6 Speech system architecture SIGNAL PROCESSING DECODING UNDERSTANDING DISCOURSE ACTION
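The five stages on this slide can be read as a chain in which each stage consumes the previous stage's output. The sketch below is only an illustration of that chaining: every stage body is a placeholder stub, and the message keys and the example utterance are assumptions, not part of the original system.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Stage = Callable[[Message], Message]


def signal_processing(msg: Message) -> Message:
    # Stub: a real stage would turn the waveform into feature frames.
    return {**msg, "features": [0.1, 0.2, 0.3]}


def decoding(msg: Message) -> Message:
    # Stub: a real decoder would transcribe the features into words.
    return {**msg, "words": "fly to boston"}


def understanding(msg: Message) -> Message:
    # Stub: treat the last word as the destination.
    return {**msg, "meaning": {"destination": msg["words"].split()[-1]}}


def discourse(msg: Message) -> Message:
    # Stub: attach the current task context.
    return {**msg, "context": {"task": "travel planning"}}


def action(msg: Message) -> Message:
    # Stub: decide what the system should do next.
    return {**msg, "action": "look up flights to " + msg["meaning"]["destination"]}


PIPELINE: List[Stage] = [signal_processing, decoding, understanding, discourse, action]


def run(audio: bytes) -> Message:
    """Pass one utterance through every stage in order."""
    msg: Message = {"audio": audio}
    for stage in PIPELINE:
        msg = stage(msg)
    return msg


result = run(b"\x00\x01")
```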

7 Varieties of speech systems

8 A generic speech system Input: speech. Components: Signal processing, Decoder, Parser, Post parser, Dialog manager, Domain agents (three shown), Language Generator, Speech synthesizer. Outputs: speech, display, effector

9 Decoding speech Signal processing Decoder Reduce dimensionality of signal noise conditioning Transcribe speech to words Acoustic models Language models Corpus-based statistical models
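One common way to combine the two kinds of model named on this slide is to pick the word sequence with the best sum of acoustic and language-model log scores. The sketch below illustrates that idea on two candidate transcriptions. All scores are made-up numbers, the `lm_weight` parameter is an assumption, and nothing here is taken from the Sphinx decoder itself.

```python
# Hypothetical acoustic log-likelihoods: log P(audio | words).
acoustic_logprob = {
    "recognize speech": -12.0,
    "wreck a nice beach": -11.5,
}

# Hypothetical language-model log-probabilities: log P(words).
language_logprob = {
    "recognize speech": -4.0,
    "wreck a nice beach": -9.0,
}


def decode(candidates, lm_weight=1.0):
    """Return the candidate with the highest combined log score."""
    return max(
        candidates,
        key=lambda words: acoustic_logprob[words] + lm_weight * language_logprob[words],
    )


candidates = list(acoustic_logprob)
best = decode(candidates)                         # both models combined
acoustic_only = decode(candidates, lm_weight=0.0) # language model ignored
```

With these toy numbers the acoustic model alone slightly prefers the implausible word string, and the language model tips the decision toward the plausible one.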

10 Creating models for recognition Acoustic models Language models Speech data Text data Train Transcribe*
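As an illustration of training a language model from text data, here is a minimal bigram model with add-one smoothing. This is a toy sketch: the three-sentence corpus, the function name, and the choice of smoothing are all assumptions, and a real system would train on far more text with a more careful estimator.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count word pairs in the text and return a smoothed P(word | prev) function."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = {"</s>"}  # every word that can follow a context, plus end-of-sentence
    for sentence in sentences:
        words = sentence.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(prev, word):
        # Add-one smoothing keeps unseen word pairs from getting probability zero.
        return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))

    return prob, vocab


corpus = ["fly to boston", "fly to denver", "drive to boston"]
prob, vocab = train_bigram_model(corpus)
p_boston = prob("to", "boston")
p_denver = prob("to", "denver")
```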

11 Understanding speech Parser Post parser Extract semantic content from utterance Introduce context and world knowledge into interpretation Grammar Context Domain Agents Grounding, knowledge engineering Ontology design, language acquisition
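The split between parser and post parser can be sketched as two small functions: the first extracts slot values from the word string, the second uses context to interpret them. The patterns, slot names, and the rule that a weekday name means its next occurrence are all assumptions made for this illustration.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def parse(utterance):
    """Parser: extract semantic content with a tiny pattern grammar."""
    text = utterance.lower()
    slots = {}
    match = re.search(r"\b(?:fly|go) to (\w+)", text)
    if match:
        slots["destination"] = match.group(1)
    match = re.search(r"\bon (" + "|".join(WEEKDAYS) + r")\b", text)
    if match:
        slots["day"] = match.group(1)
    return slots


def post_parse(slots, today):
    """Post parser: use context (today's date) to resolve a weekday name."""
    resolved = dict(slots)
    if "day" in slots:
        target = WEEKDAYS.index(slots["day"])
        # Next occurrence of that weekday; the same weekday means a week ahead.
        days_ahead = (target - today.weekday()) % 7 or 7
        resolved["date"] = today + timedelta(days=days_ahead)
    return resolved


slots = parse("I'd like to go to Boston on Friday")
interpreted = post_parse(slots, today=date(2024, 1, 1))  # a Monday
```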

12 Interacting with the user Dialog manager Domain agent Domain agent Domain agent Guide interaction through task Map user inputs and system state into actions Interact with back-end(s) Interpret information using domain knowledge Task schemas Database Live data (e.g. Web) Domain expert Context Task analysis Knowledge engineering

13 Communicating with the user Language Generator Speech synthesizer Display Generator Action Generator Decide what to say to user (and how to phrase it)

14 Speech recognition and understanding Sphinx system –speaker-independent –continuous speech –large vocabulary ATIS system –air travel information retrieval –context management film clip

15 Command and control systems Small vocabularies, fixed syntax –OPEN WINDOW –MOVE OBJECT to –Applications: data entry (e.g., zip codes), process control (e.g., electron microscope, darkroom equipment) Large vocabulary, fixed syntax –Web browsing (?)

16 SpeechWear Vehicle inspection task –USMC mechanics, fixed inspection form –Wearable computer (COTS components) –html-based task representation film clip

17 Information access Moderate to very large vocabulary –IVR and frame-based systems Commercial systems: –Nuance: http://www.nuance.com/demo/index.html –SpeechWorks: http://www.speechworks.com/demos/demos.htm –lots of others...

18 IVR and frame-based systems Interactive voice response (IVR) –interactions specified by a graph (typically a tree) Frame systems –ergodic graphs –states defined by multi-item forms

19 Graph-based systems Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan,... What type of loan are you interested in? Please say one of the following: Mortgage, Car, Personal,.....
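A graph-based interaction like this one can be sketched as a table of states, each with a prompt and the keywords that lead to the next state. The menu below contains only the options named on the slide (the slide's lists are elided), and the state names and re-prompt behavior are assumptions.

```python
MENU = {
    "root": {
        "prompt": "Welcome to Bank ABC! Please say one of the following: Balance, Hours, Loan",
        "next": {"balance": "balance", "hours": "hours", "loan": "loan"},
    },
    "loan": {
        "prompt": "What type of loan are you interested in? "
                  "Please say one of the following: Mortgage, Car, Personal",
        "next": {"mortgage": "mortgage", "car": "car", "personal": "personal"},
    },
}


def step(state, heard):
    """Follow the edge labeled by the recognized word; stay put if there is none."""
    node = MENU.get(state)
    if node is None:
        return state  # leaf state: nothing further to choose
    return node["next"].get(heard.strip().lower(), state)


def walk(inputs, state="root"):
    """Run a whole sequence of recognized words through the menu graph."""
    for heard in inputs:
        state = step(state, heard)
    return state
```

For example, `walk(["Loan", "Car"])` ends in the `car` state, while an out-of-menu word leaves the caller in the current state so the prompt can be repeated.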

20 Frame-based systems I would like to fly to Boston –I’d like to go to Boston on Friday, … When would you like to fly? Destination_City: Boston Departure_Date: ______ Departure_Time: ______ Preferred_Airline: ______...
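The form on this slide can be sketched as a set of named slots, where the system asks about the first slot that is still empty. The slot names and the date prompt come from the slide. The other prompt wordings and the fixed slot order are assumptions for illustration.

```python
from typing import Dict, Optional

SLOTS = ["Destination_City", "Departure_Date", "Departure_Time", "Preferred_Airline"]

PROMPTS = {
    "Destination_City": "Where would you like to fly?",        # hypothetical wording
    "Departure_Date": "When would you like to fly?",           # wording from the slide
    "Departure_Time": "What time would you like to leave?",    # hypothetical wording
    "Preferred_Airline": "Do you have a preferred airline?",   # hypothetical wording
}

Frame = Dict[str, Optional[str]]


def new_frame() -> Frame:
    """Start with every slot empty."""
    return {slot: None for slot in SLOTS}


def fill(frame: Frame, values: Dict[str, str]) -> Frame:
    """Return a copy of the frame with the given slots filled in."""
    updated = dict(frame)
    for slot, value in values.items():
        if slot not in updated:
            raise KeyError(f"unknown slot: {slot}")
        updated[slot] = value
    return updated


def next_prompt(frame: Frame) -> Optional[str]:
    """Ask about the first empty slot, or return None when the form is complete."""
    for slot in SLOTS:
        if frame[slot] is None:
            return PROMPTS[slot]
    return None


# "I would like to fly to Boston" fills one slot; the system then asks for the date.
frame = fill(new_frame(), {"Destination_City": "Boston"})
question = next_prompt(frame)
```

Because one utterance may fill several slots at once ("I'd like to go to Boston on Friday"), `fill` accepts any subset of slots and `next_prompt` simply skips whatever is already known.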

21 Frame-based systems [Diagram: ten identical forms with placeholder slot names, each reading "Zxfgdh_dxab: _____ askjs: _____ dhe: _____ aa_hgjs_aa: _____"] Transition on keyword or phrase

22 Some problems IVR systems work great, but only for well-structured (& “shallow”) tasks Frame systems are good for “tasks” that correspond to a single form leading to an action Neither approach does well with more complex problem-solving activities

23 Dialog Systems Problem solving activity; complex task –Order of progression through task depends on user goals (which can change) and system state (a back-end retrieval) and is not predictable. Track progress and help task along –mixed-initiative dialog Discourse phenomena –Users expect to “converse” with the system

24 Carnegie Mellon Communicator A dialog system that supports complex problem solving in a travel planning domain –create an itinerary using air schedule, hotel and car information –186 U.S. airports (>140k enplanements/yr) currently: >500 world airports Web-based data resources –Live and cached flight information –Airport, airline, etc. information

25 Value schema/handlers value transform receptors Domain Agent

26 Compound schema value transform Value_3 Value_1 Value_2 Domain Agent e.g. SQL query +

27 Schema ordering Value i Value j Value k Schema i Schema j Schema k Destination airport Date Time Flight Leg Value transform Available flights Database lookup
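The chain on this slide (destination airport, date and time feed a flight leg, whose transform drives a database lookup for available flights) can be sketched as a function that turns filled values into a parameterized SQL query. The table layout, column names, flight numbers, and the in-memory SQLite database are all hypothetical stand-ins for the real back-end.

```python
import sqlite3


def flight_leg_query(values):
    """Compound-schema transform: combine filled values into one SQL query."""
    required = ("destination", "date", "time")
    missing = [name for name in required if values.get(name) is None]
    if missing:
        raise ValueError(f"unfilled values: {missing}")
    sql = (
        "SELECT flight_no FROM flights "
        "WHERE destination = ? AND date = ? AND depart_time >= ? "
        "ORDER BY depart_time"
    )
    return sql, (values["destination"], values["date"], values["time"])


# Hypothetical stand-in for the flight database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE flights (flight_no TEXT, destination TEXT, date TEXT, depart_time TEXT)"
)
db.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?)",
    [
        ("XX100", "BOS", "2024-01-05", "08:00"),
        ("XX200", "BOS", "2024-01-05", "17:30"),
        ("XX300", "DEN", "2024-01-05", "09:15"),
    ],
)

sql, params = flight_leg_query({"destination": "BOS", "date": "2024-01-05", "time": "12:00"})
available = [row[0] for row in db.execute(sql, params)]
```

The query is only built once every value it depends on is filled, which is one way to read the ordering between value schemas and the flight-leg schema.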

28 Carnegie Mellon Communicator CMU Communicator –Call: 268-5144 –the information is accurate; you can use it for your own travel planning...

29 User-aware speech interfaces Predictable behavior on the system’s part Users communicate at different levels http://www.speech.cs.cmu.edu/air/papers/InterfaceChars.html

30 User-aware speech interfaces Content: task-centric utterances Possibility: What can I do? Orientation: Where are we? Navigation: moving through the task space Control: verbose/terse, listen! Customization: define this word

31 Speech interface guidelines Speech recognition is errorful System state is often opaque to the user http://www.speech.cs.cmu.edu/air/papers/SpInGuidelines/SpInGuidelines.html

32 Interface guidelines State transparency Input control Error recovery Error detection Error correction Log performance Application integration
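One possible way to act on the error detection, error correction and logging guidelines is to let a recognition confidence score decide whether to accept, confirm, or reject an input. This is a sketch of that general idea, not the method from the guidelines: the thresholds, the wording of the responses, and the existence of a per-utterance confidence score are all assumptions.

```python
ACCEPT_THRESHOLD = 0.8    # hypothetical value
CONFIRM_THRESHOLD = 0.4   # hypothetical value

performance_log = []      # (hypothesis, confidence, decision) tuples


def next_move(hypothesis: str, confidence: float) -> str:
    """Choose a response based on how much the recognizer trusts its output."""
    if confidence >= ACCEPT_THRESHOLD:
        decision = "accept"
        response = f"OK: {hypothesis}"
    elif confidence >= CONFIRM_THRESHOLD:
        decision = "confirm"   # error detection: ask before acting
        response = f"Did you say '{hypothesis}'?"
    else:
        decision = "reject"    # error correction: get a fresh input
        response = "Sorry, I did not catch that. Please repeat."
    performance_log.append((hypothesis, confidence, decision))
    return response
```

Echoing the recognized words back in every response also supports state transparency, since the user can see what the system believes it heard.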

33 Summary Speech and language communication Dialog structure Interface design

