Can Advances in Speech Recognition make Spoken Language as Convenient and as Accessible as Online Text?
Joseph Picone (Mississippi State University) and Patti Price (BravoBrava LLC)

Presentation transcript:

BravoBrava Mississippi State University
Can Advances in Speech Recognition make Spoken Language as Convenient and as Accessible as Online Text?
Joseph Picone, PhD, Professor, Electrical Engineering, Mississippi State University
Patti Price, PhD, VP Business Development, BravoBrava LLC

Outline
Introduction and state of the art (Price)
Research issues (Picone)
– Evaluation metrics
– Acoustic modeling
– Language modeling
– Practical issues
– Technology demands
Conclusion and future directions (Price)

Introduction: What is Speech Recognition?
Speech Signal → Speech Recognition → Words (“How are you?”)
Goal: Automatically extract the string of words spoken from the speech signal
Speech recognition does NOT determine
– Who the talker is (speaker recognition, Heck and Reynolds)
– Speech output (speech synthesis, Fruchterman and Ostendorf)
– What the words mean (speech understanding)

Introduction: Speech in the Information Age
Speech & text were revolutionary because of information access
New media and connectivity yield information overload
Can speech technology help?
[Figure: sources of and access to information over time. Source of Information: speech; text; film, video, multimedia, voice mail, radio, television, conferences, web, on-line resources. Access to Information: listen, remember; read books; computer typing; careful spoken, written input; conversational language.]

State of the Art: Initial and Current Applications
Database query
– Resource management
– Air travel information
– Stock quote 1997
Command and control
– Manufacturing
– Consumer products
Dictation
Nuance, American Airlines: , touch 1

State of the Art: How Do You Measure?
USC, October 15, 1999: “the world's first machine system that can recognize spoken words better than humans can.”
“In benchmark testing using just a few spoken words, USC's Berger-Liaw … System not only bested all existing computer speech recognition systems but outperformed the keenest human ears.”
– What benchmarks? What was the training data? What was the test data? Were they independent? How large were the vocabulary and the sample size? Did they really test all existing systems?
“… functions at 60 percent recognition with a hubbub level 560 times the strength of the target stimulus.”
– Is that different from chance? Was the noise added or coincident with speech? What kind of noise? Was it independent of the speech?
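The questions about chance, vocabulary size, and sample size can be made concrete with a short calculation. The following is a minimal sketch, not anything taken from the USC report. The function names are hypothetical. The vocabulary sizes and trial counts are invented for illustration, because the slide's point is that the real ones were not stated. It is also an assumption whether "560 times the strength" means a power ratio or an amplitude ratio, so both readings are computed.

```python
import math


def snr_db(noise_to_signal, ratio_is_power=True):
    """Signal-to-noise ratio in dB when the noise is `noise_to_signal` times the target.

    ASSUMPTION: the quoted claim does not say whether "strength" means
    power or amplitude, so the caller picks the reading.
    """
    factor = 10.0 if ratio_is_power else 20.0
    return -factor * math.log10(noise_to_signal)


def p_value_vs_chance(correct, trials, vocabulary_size):
    """Exact one-sided binomial tail P(X >= correct) for a uniform random guesser."""
    p = 1.0 / vocabulary_size
    return sum(
        math.comb(trials, k) * p**k * (1.0 - p) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# SNR under the two readings of "560 times the strength".
print(round(snr_db(560), 1), round(snr_db(560, ratio_is_power=False), 1))

# HYPOTHETICAL test sizes: 60 percent correct under different conditions.
print(p_value_vs_chance(6, 10, 2))    # 10 trials, 2-word vocabulary
print(p_value_vs_chance(60, 100, 2))  # 100 trials, 2-word vocabulary
print(p_value_vs_chance(6, 10, 10))   # 10 trials, 10-word vocabulary
```

With these made-up sizes, 6 correct out of 10 on a two-word vocabulary is well within what guessing produces. The same 60 percent on a ten-word vocabulary is very unlikely under guessing. That is why the claim cannot be evaluated without the vocabulary and sample size.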

State of the Art: Factors that Affect Performance
1985
– Noise environment: quiet room, fixed high-quality mic
– Speech style: careful reading
– User population: speaker-dependent
– Complexity: application-specific speech and language
1995
– Noise environment: normal office, various microphones, telephone
– Speech style: planned speech
– User population: speaker independent and adaptive
– Complexity: expert years to create app-specific language model
2000
– Noise environment: vehicle noise, radio, cell phones
– Speech style: natural human-machine dialog (user can adapt)
– User population: regional accents, native speakers, competent foreign speakers
– Complexity: some application-specific data and one engineer year
2005
– Noise environment: wherever speech occurs
– Speech style: all styles including human-human (unaware)
– User population: all speakers of the language including foreign
– Complexity: application independent or adaptive

Research: Theory and Trends
Initial and Current Applications
Insert Joe’s slides here

Conclusion and Future Directions: Trends
We need new technology to help with information overload
Speech information sources are everywhere
– Voice mail messages
– Professional talk
– Lectures, broadcasts
Speech sources of information will increase
– As devices shrink
– As mobility increases
– New uses: annotation, documentation
Speech as Access: What are the words?
Speech as Source: What does it mean?
Information as Partner: Here’s what you need.

Conclusion and Future Directions: Limitations on Applications
Recognition performance, especially in error recovery UI
Natural language understanding (speech differs from text)
– Speech unfolds linearly in time
– Speech is more indeterminate than text
– Speech has different syntax and semantics
– Prosody differs from punctuation
Cost to develop applications (too few experts)
Cost to integrate/interoperate with other technologies
New capabilities
– “When did he say Y and was he angry?”
– Scanning, refocusing quickly (browsing)
– Match past pattern, find novel aspects
– Proactive information
– Gist, summarize, translate for different purposes

Conclusion and Future Directions: Applications on the Horizon
Beginnings of speech as source of information
– ISLIP
– Virage
Speech technology in education and training
Cliff Stoll, High Tech Heretic: Why doesn’t belong in the classroom
– Good schools need no computers
– Bad schools won’t be improved by them
Beulah Arnott: also true of indoor plumbing
BravoBrava: Co-evolving technology and people can
– Dramatically reduce the cost of delivery of content
– Increase its timeliness, quality and appropriateness
– Target needs of individual and/or group
– Reading Pal demo

Summary
Goal: Speech Better Than Text
Healthy loop between research and applications
Research leads to applications, which lead to new research opportunities
We need collaboration
Too much for one person, one site, one country
Humans will probably continue to be better than machines at many things
Can we learn to use technology and training to augment human-human and human-machine collaboration?
It’s not a solved problem
Further technology development needed to enable the vision