Dialog Design 4 Speech & Natural Language

Slides:

Advertisements

Similar presentations

Presented by Erin Palmer. Speech processing is widely used today Can you think of some examples? Phone dialog systems (bank, Amtrak) Computers dictation.

Advertisements

Natural Language Systems

User Interfaces. Good interface design  A good interface design can help to ensure that users carry out their tasks: – Safely - in the case of a jumbo.

1 Frequency Domain Analysis/Synthesis Concerned with the reproduction of the frequency spectrum within the speech waveform Less concern with amplitude.

Class 6 LBSC 690 Information Technology Human Computer Interaction and Usability.

Dialogue Design Speech, pen, and gestures Speech Output  Tradeoffs in speed, naturalness and understandability  Male or female voice? Technical issues.

Interaction – Speech and Pen Natural input Universal design Take advantage of familiarity, existing knowledge Alternative input & output Multi-modal.

03/04/2005ENEE408G Spring 2005 Multimedia Signal Processing 1 ENEE408G: Capstone Design Project: Multimedia Signal Processing Design Project 3: Digital.

Dialog Design Speech/Natural Language Pen & Gesture.

Dialog design How do we communicate with computers?

Text-To-Speech Synthesis An Overview. What is a TTS System  Goal A system that can read any text Automatic production of new sentences Not just audio.

Dialog Design Command languages, direct manipulation, and WIMP.

Auditory User Interfaces

How do we communicate with computers?

09/09/2005ENEE408G Fall 2005 Multimedia Signal Processing 1 ENEE408G: Capstone Design Project: Multimedia Signal Processing Design Project 1: Digital Speech.

Chapter 14 Recording and Editing Sound. Getting Started FAQs: − How does audio capability enhance my PC? − How does your PC record, store, and play digital.

HUMANOID ANIMATION DRIVEN BY HUMAN VOICE Thesis Advisor : Dr. Donald P. Brutzman Second Reader : Dr. Xiaoping Yun A Thesis By Ozan APAYDIN, Turkish Navy.

Text-To-Speech System for Marathi Miss. Deepa V. Kadam Indian Institute of Technology, Bombay.

Natural Language Processing and Speech Enabled Applications by Pavlovic Nenad.

Speech Communications (Chapter 7) Prepared by: Ahmed M. El-Sherbeeny, PhD 1.

Should Intelligent Agents Listen and Speak to Us? James A. Larson Larson Technical Services

Speech Recognition. My computer doesn’t understand me……….. Software is now mainstream Many people use it within office/home setting for inputting text.

Knowledge Base approach for spoken digit recognition Vijetha Periyavaram.

Supervisor: Dr. Eddie Jones Electronic Engineering Department Final Year Project 2008/09 Development of a Speaker Recognition/Verification System for Security.

Input Devices.  Identify audio and video input devices  List the function of the respective devices.

11.10 Human Computer Interface www. ICT-Teacher.com.

Unit 1_9 Human Computer Interface. Why have an Interface? The user needs to issue instructions Problem diagnosis The Computer needs to tell the user what.

©RavichandranUser interface Slide 1 User interface design.

1 Speech Perception 3/30/00. 2 Speech Perception How do we perceive speech? –Multifaceted process –Not fully understood –Models & theories attempt to.

Interaction Design Session 12 LBSC 790 / INFM 718B Building the Human-Computer Interface.

1 Computational Linguistics Ling 200 Spring 2006.

CMU Shpinx Speech Recognition Engine Reporter : Chun-Feng Liao NCCU Dept. of Computer Sceince Intelligent Media Lab.

Spoken Dialog Systems and Voice XML Lecturer: Prof. Esther Levin.

MULTIMEDIA INPUT / OUTPUT TECHNOLOGIES INTRODUCTION 6/1/ A.Aruna, Assistant Professor, Faculty of Information Technology.

1 Speech Synthesis User friendly machine must have complete voice communication abilities Voice communication involves Speech synthesis Speech recognition.

Creating User Interfaces Directed Speech. XML. VoiceXML Classwork/Homework: Sign up to be Voxeo developer. Do tutorials.

KAMI KITT ASSISTIVE TECHNOLOGY Chapter 7 Human/ Assistive Technology Interface.

1 Human Computer Interaction Week 5 Interaction Devices and Input-Output.

Robert Crawford, MBA West Middle School.  Explain how input devices are suited to certain kinds of data.  Distinguish between RAM and ROM.  Identify.

© 2013 by Larson Technical Services

Intermediate 2 Computing Unit 2 - Software Development.

Reducing uncertainty in speech recognition Controlling mobile devices through voice activated commands Neil Gow, GWXNEI001 Stephen Breyer-Menke, BRYSTE003.

Understanding Naturally Conveyed Explanations of Device Behavior Michael Oltmans and Randall Davis MIT Artificial Intelligence Lab.

Phone-Level Pronunciation Scoring and Assessment for Interactive Language Learning Speech Communication, 2000 Authors: S. M. Witt, S. J. Young Presenter:

Basics of Natural Language Processing Introduction to Computational Linguistics.

Speech Recognition Created By : Kanjariya Hardik G.

PREPARED BY MANOJ TALUKDAR MSC 4 TH SEM ROLL-NO 05 GUKC-2012 IN THE GUIDENCE OF DR. SANJIB KR KALITA.

XP Practical PC, 3e Chapter 14 1 Recording and Editing Sound.

Presented By Sharmin Sirajudeen S7 CS Reg No :

Dialog Design - Speech & Natural Language This material has been developed by Georgia Tech HCI faculty, and continues to evolve. Contributors include Gregory.

How can speech technology be used to help people with disabilities?

Chapter 15 Recording and Editing Sound

G. Anushiya Rachel Project Officer

Speech Recognition

Human Computer Interaction (HCI)

Natural Language Processing and Speech Enabled Applications

11.10 Human Computer Interface

Automatic Speech Recognition

Text-To-Speech System for English

FACE RECOGNITION TECHNOLOGY

Unit 2 User Interface Design.

TYPES AND COMPONENTS OF COMPUTER SYSTEM

Ch.1: Introduction to audio signal processing

SILENT SOUND TECHNOLOGY

GRAPHICAL USER INTERFACE

Proper functionality Good human computer interface Easy to maintain

Human and Computer Interaction (H.C.I.) &Communication Skills

Artificial Intelligence 2004 Speech & Natural Language Processing

Presentation transcript:

Dialog Design 4 Speech & Natural Language Agenda What is speech? When to use speech SHW discussion Speech output Speech input Designing the speech interaction Fall 2002 CS/PSY 6750

A Voice Interface Fall 2002 CS/PSY 6750

When to Use Speech Hands busy Mobility required Eyes occupied Conditions preclude use of keyboard Visual impairment Physical limitation Fall 2002 CS/PSY 6750

Topic Discussion Is speech appropriate to this task? Was it well done? Airline info system, telephone based Was it well done? Acoustics Technical implementation (recognition, etc.) Interface flow What could have been better? Fall 2002 CS/PSY 6750

Speech What is speech? English speech Sounds transmit “language” Vibrations of vocal cords creates sound “ahh” Mouth, throat, tongue, lips shape sound English speech 40 phonemes; 24 consonants, 16 vowels Sounds transmit “language” Fall 2002 CS/PSY 6750

Waveform & Spectrogram Speech does not equal written language Fall 2002 CS/PSY 6750

Parsing Sentences "I told him to go back where he came from, but he wouldn't listen." Fall 2002 CS/PSY 6750

Speech Input Speaker recognition Speech recognition Natural language understanding Fall 2002 CS/PSY 6750

Speaker Recognition Tell which person it is (voice print) Could also be important for monitoring meetings, determining speaker Fall 2002 CS/PSY 6750

Speech Recognition Primarily identifying words Improving all the time Commercial systems: IBM ViaVoice, Dragon Dictate, ... Fall 2002 CS/PSY 6750

Recognition Dimensions Speaker dependent/independent Parametric patterns are sensitive to speaker With training (dependent) can get better Vocabulary Some have 50,000+ words Isolated word vs. continuous speech Continuous: where words stop & begin Typically a pattern match, no context used Did you vs. Didja Fall 2002 CS/PSY 6750

Recognition Systems Typical system has 5 components: Speech capture device - Has analog -> digital converter Digital Signal Processor - Gets word boundaries, scales, filters, cuts out extra stuff Preprocessed signal storage - Processed speech buffered for recognition algorithm Reference speech patterns - Stored templates or generative speech models for comparisons Pattern matching algorithm - Goodness of fit from templates/model to user’s speech Fall 2002 CS/PSY 6750

Errors Systems make four types of errors: Substitution - one for another Rejection - detected, but not recognized Insertion - added Deletion - not detected Which is more common, dangerous? Fall 2002 CS/PSY 6750

Natural Language Understanding Putting meaning to the words Input might be speech or could be typed in Holy grail of Artificial Intelligence problems Fall 2002 CS/PSY 6750

NL Factors/Terms Syntactic Prosodic Pragmatic Semantic Grammar or structure Prosodic Inflection, stress, pitch, timing Pragmatic Situated context of utterance, location, time Semantic Meaning of words Fall 2002 CS/PSY 6750

SR/NLU Advantages Easy to learn and remember Powerful Fast, efficient (not always) Little screen real estate Fall 2002 CS/PSY 6750

SR/NLU Disadvantages Doesn’t work good enough yet Assumes knowledge of problem domain Not prompted, like menus Requires typing skill (if keyboard) Enhancements are invisible Expensive to implement Fall 2002 CS/PSY 6750

Recall A natural language interface need not be speech A speech interface need not use natural language (might be more command language-like) Wizard of Oz evaluations are particularly useful in this area Fall 2002 CS/PSY 6750

Speech Output Male or female voice? Rate of speech Technical issues (freq. response of phone) User preference (depends on the application) Rate of speech Technically up to 550 wpm! Depends on listener (blind: 150-300 wpm) Synthesized or Pre-recorded? Synthesized: Better coverage, flexibility Recorded: Better quality, acceptance Fall 2002 CS/PSY 6750

Speech Output Synthesis Recorded segments Numbers What was the airline system’s output like ?? Synthesis Quality depends on software ($$) Influence of vocabulary and phrase choices Recorded segments Store tones, then put them together The transitions are difficult (e.g., numbers) Numbers Record three versions (rise, flat, fall) Logic to determine which version to play Fall 2002 CS/PSY 6750

Designing the Interaction Constrain vocabulary Limit valid commands Structure questions wisely (Yes/No) Manage the interaction Examples from the airline systems? Slow speech rate, but concise phrases Design for failsafe error recovery Process preview & progress indicator Fall 2002 CS/PSY 6750

Speech Tools/Toolkits Java Speech SDK FreeTTS 1.1.1 http://freetts.sourceforge.net/docs/index.php "For 3/4 or 75% of his time, Dr. Walker practices for $90 a visit on Dr. Dr., next to King Philip X of St. Lameer St. in Nashua NH." IBM JavaBeans for speech Visual/Real Basic speech SDK OS capabilities (speech recognition and synthesis built in to OS) (TextEdit) VoiceXML Talking Clock Fall 2002 CS/PSY 6750