BravoBrava Mississippi State University

Can Advances in Speech Recognition make Spoken Language as Convenient and as Accessible as Online Text?

Joseph Picone, PhD
Professor, Electrical Engineering
Mississippi State University

Patti Price, PhD
VP Business Development
BravoBrava LLC

Outline

Introduction and state of the art (Price)
Research issues (Picone)
– Evaluation metrics
– Acoustic modeling
– Language modeling
– Practical issues
– Technology demands
Conclusion and future directions (Price)

Introduction: What is Speech Recognition?

[Diagram: Speech Signal → Speech Recognition → Words (“How are you?”)]

Goal: Automatically extract the string of words spoken from the speech signal

Speech recognition does NOT determine
– Who is talking (speaker recognition, Heck and Reynolds)
– Speech output (speech synthesis, Fruchterman and Ostendorf)
– What the words mean (speech understanding)
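The goal stated above (recover the spoken word string) is commonly scored with word error rate, the word-level edit distance between a reference transcript and the recognizer's output, divided by the reference length. The slides do not name this metric, so treat the following as a minimal sketch under that assumption; the function name and the lowercase, whitespace-based tokenization are illustrative choices.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; punctuation is not stripped.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("how are you", "how or you"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the output is much longer than the reference.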

Introduction: Speech in the Information Age

Speech and text were revolutionary because of information access
New media and connectivity yield information overload
Can speech technology help?

[Timeline figure, progressing over time]
Source of information: speech; text; film, video, multimedia, voice mail, radio, television, conferences, web, on-line resources
Access to information: listen, remember; read books; computer typing; careful spoken, written input; conversational language

State of the Art: Initial and Current Applications

Database query
– Resource management
– Air travel information
– Stock quote
1997
Command and control
– Manufacturing
– Consumer products http://www.speech.be.philips.com/
Dictation
– http://www.dragonsys.com
– http://www-4.ibm.com/software/speech
Nuance, American Airlines: 1-800-433-7300, touch 1

State of the Art: How Do You Measure?

USC, October 15, 1999: “the world's first machine system that can recognize spoken words better than humans can.”

“In benchmark testing using just a few spoken words, USC's Berger-Liaw … System not only bested all existing computer speech recognition systems but outperformed the keenest human ears.”
– What benchmarks?
– What was training? What was test? Were they independent?
– How large was the vocabulary and the sample size?
– Did they really test all existing systems?

“… functions at 60 percent recognition with a hubbub level 560 times the strength of the target stimulus.”
– Is that different from chance?
– Was the noise added or coincident with speech?
– What kind of noise? Was it independent of the speech?
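Two of the questions above can be made concrete with a little arithmetic. The sketch below converts "560 times the strength" into a signal-to-noise ratio in decibels, assuming "strength" means power (the quote does not say), and computes how likely a given score would be under pure guessing. The vocabulary size and number of test trials are not given in the quote, so the values in the usage lines are purely hypothetical.

```python
import math


def snr_db(noise_to_signal_power_ratio: float) -> float:
    """Signal-to-noise ratio in dB when noise power is the given multiple
    of signal power. Assumes the ratio is a power ratio; an amplitude
    ratio would use 20 * log10 instead of 10 * log10."""
    return -10.0 * math.log10(noise_to_signal_power_ratio)


def binomial_tail(successes: int, trials: int, p_chance: float) -> float:
    """Probability of at least `successes` correct answers in `trials`
    independent guesses, each correct with probability `p_chance`."""
    return sum(
        math.comb(trials, k) * p_chance**k * (1.0 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    # Noise 560 times stronger than the target, read as a power ratio.
    print(f"SNR: {snr_db(560):.2f} dB")

    # Hypothetical test: 2-word vocabulary (chance = 0.5), 10 trials,
    # 6 correct (60 percent). None of these counts come from the quote.
    print(f"P(>= 6 of 10 by chance): {binomial_tail(6, 10, 0.5):.3f}")
```

With a very small vocabulary and few trials, 60 percent correct can be hard to distinguish from guessing, which is why the vocabulary size and sample size matter when reading the claim.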

State of the Art: Factors that Affect Performance

1985
– Noise environment: quiet room, fixed high-quality mic
– Speech style: careful reading
– User population: speaker-dependent
– Complexity: application-specific speech and language

1995
– Noise environment: normal office, various microphones, telephone
– Speech style: planned speech
– User population: speaker independent and adaptive
– Complexity: expert years to create app-specific language model

2000
– Noise environment: vehicle noise, radio, cell phones
– Speech style: natural human-machine dialog (user can adapt)
– User population: regional accents, native speakers, competent foreign speakers
– Complexity: some application-specific data and one engineer year

2005
– Noise environment: wherever speech occurs
– Speech style: all styles including human-human (unaware)
– User population: all speakers of the language including foreign
– Complexity: application independent or adaptive

Research Theory and Trends

Initial and Current Applications

[Insert Joe’s slides here]

Conclusion and Future Directions: Trends

We need new technology to help with information overload
Speech information sources are everywhere
– Voice mail messages
– Professional talk
– Lectures, broadcasts
Speech sources of information will increase
– As devices shrink
– As mobility increases
– New uses: annotation, documentation

Speech as Access: What are the words?
Speech as Source: What does it mean?
Information as Partner: Here’s what you need.

Conclusion and Future Directions: Limitations on Applications

Recognition performance, especially in error recovery UI
Natural language understanding (speech differs from text)
– Speech unfolds linearly in time
– Speech is more indeterminate than text
– Speech has different syntax and semantics
– Prosody differs from punctuation
Cost to develop applications (too few experts)
Cost to integrate/interoperate with other technologies
New capabilities
– “When did he say Y and was he angry?”
– Scanning, refocusing quickly (browsing)
– Match past pattern, find novel aspects
– Proactive information
– Gist, summarize, translate for different purposes

Conclusion and Future Directions: Applications on the Horizon

Beginnings of speech as source of information
– ISLIP http://www.mediasite.net/info/frames.htm
– Virage http://www.virage.com

Speech technology in education and training
Cliff Stoll, High Tech Heretic
– Good schools need no computers
– Bad schools won’t be improved by them
Why doesn’t belong in the classroom
Beulah Arnott: also true of indoor plumbing
BravoBrava: Co-evolving technology and people can
– Dramatically reduce the cost of delivery of content
– Increase its timeliness, quality and appropriateness
– Target needs of individual and/or group
– Reading Pal demo

Summary

Goal: Speech Better Than Text

Healthy loop between research and applications
– Research leads to applications, which lead to new research opportunities
We need collaboration
– Too much for one person, one site, one country
Humans will probably continue to be better than machines at many things
– Can we learn to use technology and training to augment human-human and human-machine collaboration?
It’s not a solved problem
– Further technology development needed to enable the vision
