
Presentation transcript:

Outline
– Grammar-based speech recognition
– Statistical language model-based recognition
– Speech Synthesis
– Dialog Management
– Natural Language Processing
© 2013 by Larson Technical Services

Statistical Language Model-based Recognition Technologies
– Call routing
– Speaker identification
– Dictation
– Speaker emotion
– Voice pitch
– Age
– Gender
– Intoxication
– Stress
– Medical conditions (e.g., sleep apnea)
Also used for
– Optical Character Recognition (OCR)
– Machine vision
– Big data analysis

Example Verbal Phrases with Annotations
Verbal phrase                        Annotation
“I have a problem with my bill”      accounting
“Where is my order?”                 shipping
“My gadget arrived broken”           customer service
“I need to return my gadget”         shipping
“My statement is wrong”              accounting
“I want a refund”                    accounting
Annotate thousands of verbal phrases.

Statistical Language Model-based Speech Recognition
[Diagram. Recognition path: Audio Input, Feature Extraction, Phoneme Identification, Classifier, Category. Model-building path: Verbal Phrases Annotated with Categories, Statistical Routines, Statistical Language Model (SLM), which serves as the Language Model for the Classifier.]
Does not use grammars.
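The idea of building a classifier from annotated phrases can be sketched in a few lines. This is a minimal illustration only, assuming the audio has already been converted to text and using a naive Bayes classifier with add-one smoothing as a stand-in for the "statistical routines" (the slides do not say which statistical method is used). The names `TRAINING`, `tokenize`, `train`, and `classify` are hypothetical, and the six training phrases are the ones from the example slide.

```python
import math
import re
from collections import Counter, defaultdict

# Annotated phrases from the example slide; a real system would use thousands.
TRAINING = [
    ("I have a problem with my bill", "accounting"),
    ("Where is my order?", "shipping"),
    ("My gadget arrived broken", "customer service"),
    ("I need to return my gadget", "shipping"),
    ("My statement is wrong", "accounting"),
    ("I want a refund", "accounting"),
]


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per category (assumption: naive Bayes as the statistical routine)."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    vocabulary = set()
    for phrase, category in examples:
        tokens = tokenize(phrase)
        word_counts[category].update(tokens)
        category_counts[category] += 1
        vocabulary.update(tokens)
    return word_counts, category_counts, vocabulary


def classify(phrase, model):
    """Return the category with the highest smoothed log-probability."""
    word_counts, category_counts, vocabulary = model
    total = sum(category_counts.values())
    best_category, best_score = None, float("-inf")
    for category, count in category_counts.items():
        score = math.log(count / total)  # prior for the category
        denom = sum(word_counts[category].values()) + len(vocabulary)
        for token in tokenize(phrase):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[category][token] + 1) / denom)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


model = train(TRAINING)
print(classify("Where is my order?", model))
```

With only six phrases the model is far too small to generalize; it mainly shows why the slide calls for annotating thousands of phrases.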

Grammars vs. Statistical Language Models

Context-Free Grammars (CFGs)                   Statistical Language Models (SLMs)
Hand-crafted rules                             Data-driven
Very high accuracy                             High accuracy
Easy to assemble                               Complex to assemble
Finite phrases                                 Natural language
Used for Interactive Voice Response (IVR)      Used for dictation
  and command and control
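To make "hand-crafted rules" and "finite phrases" concrete, here is a small sketch of a grammar-based recognizer working on text. The grammar, its rule names (`COMMAND`, `ACTION`, `DEVICE`), and the helper functions are invented for illustration and are not taken from the slides. The grammar is non-recursive, so it can be expanded into its full, finite set of accepted phrases.

```python
from itertools import product

# Hypothetical command-and-control grammar: each rule maps a nonterminal
# to a list of alternatives; each alternative is a sequence of symbols.
GRAMMAR = {
    "COMMAND": [["ACTION", "the", "DEVICE"]],
    "ACTION": [["turn", "on"], ["turn", "off"]],
    "DEVICE": [["light"], ["fan"]],
}


def expand(symbol):
    """Return every word sequence the symbol can produce (non-recursive grammars only)."""
    if symbol not in GRAMMAR:  # a terminal word
        return [[symbol]]
    phrases = []
    for alternative in GRAMMAR[symbol]:
        # Expand each symbol of the alternative, then combine the pieces in order.
        for parts in product(*(expand(s) for s in alternative)):
            phrases.append([word for part in parts for word in part])
    return phrases


# The complete, finite set of phrases this grammar accepts.
ACCEPTED = {" ".join(words) for words in expand("COMMAND")}


def recognize(utterance):
    """Accept the utterance only if it is one of the grammar's phrases."""
    return utterance.lower().strip() in ACCEPTED


print(sorted(ACCEPTED))
print(recognize("Turn on the light"))
print(recognize("Please switch the light on"))
```

Anything outside the enumerated phrases is rejected, which is the trade-off the comparison above points to: the rules are easy to write, but they cover only the phrasings the author anticipated.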

Call Routing
System: How may I help you?
Caller: Where is my order?
[Diagram: a Classifier routes the caller's request to one of Accounting, Customer Support, Sales, …]

Speaker Identification Technologies
General techniques for identifying people
– Something you know
– Something you have
– Something about you (your speech features)
Three basic functions for speaker identification
– Speaker registration
– Speaker authentication
– Speaker identification

Speaker Registration
[Diagram: Joe, Wanda, and Fred each say “Good Morning”; each speaker's speech features are stored in the Speech Profiles.]

Speaker Authentication
[Diagram: a caller says “Good morning”; the speech features extracted from the utterance are compared with Wanda's speech features stored in the Speech Profiles.]
Used to supplement or replace passwords.

Speaker Identification
[Diagram: a speaker says “Good morning”; the extracted speech features are compared with the Speech Profiles (Wanda's, Joe's, and Fred's speech features) to select the matching speaker.]
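The three functions (registration, authentication, identification) differ mainly in what the new utterance is compared against. The sketch below is illustrative only. It assumes each utterance has already been reduced to a short numeric feature vector, and it uses cosine similarity with an arbitrary threshold of 0.95; the feature values, the threshold, and all function names are made up for the example and are not from the slides.

```python
import math

# Speech profiles: one stored feature vector per registered speaker.
profiles = {}


def cosine_similarity(a, b):
    """Similarity of two feature vectors, 1.0 meaning identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def register(name, samples):
    """Registration: store the mean of several feature vectors as the profile."""
    count = len(samples)
    profiles[name] = [sum(values) / count for values in zip(*samples)]


def authenticate(claimed_name, features, threshold=0.95):
    """Authentication: compare against ONE claimed profile (threshold is an assumption)."""
    return cosine_similarity(profiles[claimed_name], features) >= threshold


def identify(features):
    """Identification: select the closest profile among ALL registered speakers."""
    return max(profiles, key=lambda name: cosine_similarity(profiles[name], features))


# Made-up feature vectors standing in for features of "Good morning".
register("Joe", [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]])
register("Wanda", [[0.1, 0.9, 0.3], [0.2, 0.8, 0.4]])
register("Fred", [[0.2, 0.3, 0.9], [0.1, 0.2, 0.8]])

utterance = [0.15, 0.85, 0.3]
print(authenticate("Wanda", utterance))
print(authenticate("Joe", utterance))
print(identify(utterance))
```

Authentication is a one-to-one comparison that answers yes or no, while identification is a one-to-many selection that always returns some registered speaker, so a practical system would likely add a rejection threshold there as well.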

Speaker Identification Technologies
Advantages
– Are unobtrusive
– Are location independent
– Require no special equipment
– Replace passwords
Disadvantages
– Sometimes fail
    Siblings with similar voice profiles
    Teenage male voice “break”
    Colds, sore throats, sore lips, etc.
    Variety of microphones
    Tape recordings

Statistical Language Model-based Recognition Technologies
– Call routing
– Speaker authentication
– Dictation
– Speaker emotion
– Voice pitch
– Age
– Gender
– Intoxication
– Stress
– Medical conditions (e.g., sleep apnea)
Legend: Widely available; Actively being researched