On Organic Interfaces Victor Zue MIT Computer Science and Artificial Intelligence Laboratory.


2 On Organic Interfaces Victor Zue (zue@csail.mit.edu) MIT Computer Science and Artificial Intelligence Laboratory

3 Acknowledgements
Research Staff: Eric Brill, Scott Cyphers, Jim Glass, Dave Goddeau, T. J. Hazen, Lee Hetherington, Lynette Hirschman, Raymond Lau, Hong Leung, Helen Meng, Mike Phillips, Joe Polifroni, Shinsuke Sakai, Stephanie Seneff, Dave Shipman, Michelle Spina, Nikko Ström, Chao Wang
Graduate Students: Anderson, M.; Aull, A.; Brown, R.; Chan, W.; Chang, J.; Chang, S.; Chen, C.; Cyphers, S.; Daly, N.; Doiron, R.; Flammia, G.; Glass, J.; Goddeau, D.; Hazen, T.J.; Hetherington, L.; Huttenlocher, D.; Jaffe, O.; Kassel, R.; Kasten, P.; Kuo, J.; Kuo, S.; Lauritzen, N.; Lamel, L.; Lau, R.; Leung, H.; Lim, A.; Manos, A.; Marcus, J.; Neben, N.; Niyogi, P.; Mou, X.; Ng, K.; Pan, K.; Pitrelli, J.; Randolph, M.; Rtischev, D.; Sainath, T.; Sarma, S.; Seward, D.; Soclof, M.; Spina, M.; Tang, M.; Wichiencharoen, A.; Zeiger, K.

4 Introduction

5 Virtues of Spoken Language
Natural: Requires no special training
Flexible: Leaves hands and eyes free
Efficient: Has high data rate
Economical: Can be transmitted/received inexpensively
Speech interfaces are ideal for information access and management when:
–The information space is broad and complex,
–The users are technically naive,
–The information device is small, or
–Only telephones are available.

6 Communication via Spoken Language
[Diagram: human and computer exchange meaning. Input: Speech, Recognition, Text, Understanding, Meaning. Output: Meaning, Generation, Text, Synthesis, Speech.]

7 Components of a Spoken Dialogue System
[Diagram components: SPEECH RECOGNITION, LANGUAGE UNDERSTANDING, DISCOURSE CONTEXT, DIALOGUE MANAGEMENT, DATABASE, LANGUAGE GENERATION, SPEECH SYNTHESIS. Labels on the connections: Speech, Words, Meaning Representation, Meaning, Sentence, Graphs & Tables.]
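The components named on this slide can be pictured as a chain of functions. The following is only a toy sketch, not the system described in the talk: typed text stands in for audio, keyword spotting stands in for language understanding, and every name in it (DialogueState, DATABASE, turn, and so on) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Discourse context carried across turns (hypothetical structure)."""
    history: list = field(default_factory=list)


# Hypothetical back-end database keyed by (topic, city).
DATABASE = {("weather", "boston"): "sunny", ("weather", "seattle"): "rainy"}


def recognize(speech: str) -> list:
    # Stand-in for speech recognition: the "speech" is already text.
    return speech.lower().replace("?", "").split()


def understand(words: list, state: DialogueState) -> dict:
    # Toy language understanding: keyword spotting into a meaning frame.
    frame = {}
    if "weather" in words:
        frame["topic"] = "weather"
    for city in ("boston", "seattle"):
        if city in words:
            frame["city"] = city
    # Discourse context: inherit slots missing in this turn from the last one.
    if state.history:
        for key, value in state.history[-1].items():
            frame.setdefault(key, value)
    state.history.append(frame)
    return frame


def manage(frame: dict) -> dict:
    # Dialogue management: answer from the database or ask for clarification.
    key = (frame.get("topic"), frame.get("city"))
    if key in DATABASE:
        return {"act": "inform", "topic": key[0], "city": key[1],
                "value": DATABASE[key]}
    return {"act": "clarify"}


def generate(reply: dict) -> str:
    # Language generation from the reply frame.
    if reply["act"] == "inform":
        return f"The {reply['topic']} in {reply['city']} is {reply['value']}."
    return "Could you rephrase that?"


def synthesize(sentence: str) -> str:
    # Stand-in for speech synthesis: return the sentence unchanged.
    return sentence


def turn(speech: str, state: DialogueState) -> str:
    words = recognize(speech)
    frame = understand(words, state)
    return synthesize(generate(manage(frame)))
```

With a shared DialogueState, a follow-up such as "How about Seattle" inherits the weather topic from the previous turn, which is the role the discourse context component plays in the diagram.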

8 Tremendous Progress to Date
Technological Advances
Inexpensive Computing
Increased Task Complexity
Data Intensive Training

9 Some Example Systems
BBN, 2007
MIT, 2007
KTH, 2007

10 Speech Synthesis
Recent trend moves toward corpus-based approaches
–Increased storage and compute capacity
–Availability of large text and speech corpora
–Modeled after successful utilization for speech recognition
Many successful implementations, e.g.,
–AT&T
–Cepstral
–Microsoft
[Example on slide: "compassion disputed cedar city since giant since" / "computer science"]

11 But we are far from done …
Machine performance typically lags far behind human performance
How can interfaces be truly anthropomorphic?
Lippmann, 1997

12 Premise of the Talk
Propose a different perspective on development of speech-based interfaces
Draw from insights in evolution of computer science
–Computer systems are increasingly complex
–There is a move towards treating these complex systems like organisms that can observe, grow, and learn
Will focus on spoken dialogue systems

13 Organic Interfaces

14 Computer: Yesterday and Today
Yesterday:
–Computation of static functions in a static environment, with well-understood specification
–Computation is its main goal
–Single agent
–Batch processing of text and homogeneous data
–Stand-alone applications
–Binary notion of correctness
Today:
–Adaptive systems operating in environments that are dynamic and uncertain
–Communication, sensing, and control just as important
–Multiple agents that may be cooperative, neutral, adversarial
–Stream processing of massive, heterogeneous data
–Interaction with humans is key
–Trade off multiple criteria
Increasingly, we rely on probabilistic representation, machine learning techniques, and optimization principles to build complex systems

15 Properties of Organic Systems
Robust to changes in environment and operating conditions
Learning through experiences
Observe their own behavior
Context aware
Self healing
…

16 Research Challenges

17 Some Research Challenges
Robustness
–Signal Representation
–Acoustic Modeling
–Lexical Modeling
–Multimodal Interactions
Establishing Context
Adaptation
Learning
–Statistical Dialogue Management
–Interactive Learning
–Learning by Imitation
* Please refer to written paper for topics not covered in talk

18 Robustness: Acoustic Modeling
Statistical n-grams have masked the inadequacies in acoustic modeling, but at a cost
–Size of training corpus
–Application-dependent performance
To promote acoustic modeling research, we may want to develop a sub-word based recognition kernel
–Application independent
–Stronger constraints than phonemes
–Closed vocabulary for a given language
Some success has been demonstrated (e.g., Chung & Seneff, 1998)
[Diagram labels: sentence, phonetics, syntax, semantics, word, (syllable), morphology, phonotactics, phonemics, acoustics; Acoustic Models, LM Units, Sub-word Units, Speech Recognition Kernel]

19 Robustness: Lexical Access
Current approaches represent words as phoneme strings
Phonological rules are sometimes used to derive alternate pronunciations
[Example word on slide: temperature]
Lexical representation based on features offers much appeal (Stevens, 1995)
–Fewer models, less training data, greater parsimony
–Alternative lexical access models (e.g., Zue, 1983)
Lexical access based on islands of reliability might be better able to deal with variability

20 Robustness: Multimodal Interactions
Other modalities can augment/complement speech
[Diagram: SPEECH RECOGNITION, GESTURE RECOGNITION, HANDWRITING RECOGNITION, and MOUTH & EYES TRACKING feed LANGUAGE UNDERSTANDING, which produces meaning]

21 Challenges for Multimodal Interfaces
Input needs to be understood in the proper context
–What about that one
Timing information is a useful way to relate inputs
–Speech: Move this one over here
–Pointing: (object) (location)
[Figure: speech and pointing shown along a shared time axis]
Handling uncertainties and errors (Cohen, 2003)
Need to develop a unifying linguistic framework
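One simple way to use timing to relate the two input streams is to pair each deictic word with the pointing event nearest to it in time. The sketch below assumes that both streams arrive with timestamps; the greedy nearest-in-time rule, the DEICTIC word list, and the function name resolve are illustrative assumptions rather than the method used in the cited work.

```python
# Hypothetical set of words that need a gesture to be interpreted.
DEICTIC = {"this", "that", "here", "there"}


def resolve(words, gestures):
    """Pair deictic words with pointing events by temporal proximity.

    words:    list of (word, time_in_seconds)
    gestures: list of (target, time_in_seconds)
    Returns a list of (word, target) pairs in spoken order.
    """
    remaining = list(gestures)
    resolved = []
    for word, t in words:
        if word in DEICTIC and remaining:
            # Pick the unused gesture closest in time to the spoken word.
            nearest = min(remaining, key=lambda g: abs(g[1] - t))
            remaining.remove(nearest)
            resolved.append((word, nearest[0]))
    return resolved
```

For the slide's utterance, timestamps such as this at 0.4 s and here at 1.2 s, with pointing events at 0.5 s (an object) and 1.3 s (a location), would bind "this" to the object and "here" to the location. A real system would also have to handle the uncertainty in both recognizers, which this sketch ignores.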

22 Audio Visual Symbiosis
The audio and visual signals both contain information about:
–Identity/location of the person
–Linguistic message
–Emotion, mood, stress, etc.
Integration of these sources of information has been known to help humans
Benoit, 2000

23 Audio Visual Symbiosis
The audio and visual signals both contain information about:
–Identity/location of the person
–Linguistic message
–Emotion, mood, stress, etc.
Integration of these sources of information has been known to help humans
Exploiting this symbiosis can lead to robustness, e.g.,
–Locating and identifying the speaker
Hazen et al., 2003

24 Audio Visual Symbiosis
The audio and visual signals both contain information about:
–Identity/location of the person
–Linguistic message
–Emotion, mood, stress, etc.
Integration of these sources of information has been known to help humans
Exploiting this symbiosis can lead to robustness, e.g.,
–Locating and identifying the speaker
–Speech recognition/understanding augmented with facial features
Huang et al., 2004

25 Audio Visual Symbiosis
The audio and visual signals both contain information about:
–Identity/location of the person
–Linguistic message
–Emotion, mood, stress, etc.
Integration of these sources of information has been known to help humans
Exploiting this symbiosis can lead to robustness, e.g.,
–Locating and identifying the speaker
–Speech recognition/understanding augmented with facial features
–Speech and gesture integration
Gruenstein et al., 2006; Cohen, 2005

26 Audio Visual Symbiosis
The audio and visual signals both contain information about:
–Identity/location of the person
–Linguistic message
–Emotion, mood, stress, etc.
Integration of these sources of information has been known to help humans
Exploiting this symbiosis can lead to robustness, e.g.,
–Locating and identifying the speaker
–Speech recognition/understanding augmented with facial features
–Speech and gesture integration
–Audio/visual information delivery
Ezzat, 2003

27 Establishing Context
Context setting is important for dialogue interaction
–Environment
–Linguistic constructs
–Discourse
Much work has been done, e.g.,
–Context-dependent acoustic and language models
–Sound segmentation
–Discourse modeling
Some interesting new directions
–Tapestry of applications
–Acoustic scene analysis (Ellis, 2006)
[Figure labels: calendar, photos, weather, address, stocks, phonebook, music]

28 Acoustic Scene Analysis
Acoustic signals contain a wealth of information (linguistic message, environment, speaker, emotion, …)
We need to find ways to adequately describe the signals
[Figure: annotated segments along a time axis]
–signal type: speech; transcript: although both of the, both sides of the Central Artery …; topic: traffic report; speaker: female; ...
–signal type: speech; transcript: Forecast calls for at least partly sunny weather …; topic: weather, sponsor acknowledgement, time; speaker: male; ...
–signal type: speech; transcript: This is Morning Edition, I'm Bob Edwards …; topic: NPR news; speaker: male, Bob Edwards; ...
–signal type: music; genre: instrumental; artist: unknown; ...
Some time in the future …
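A description of this kind can be held in a small record per segment. The sketch below is one possible representation, not a format proposed in the talk: the Segment class and the helper functions are hypothetical, and the start and end times are invented for illustration because the slide gives none.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    start: float            # seconds (invented for illustration)
    end: float              # seconds (invented for illustration)
    signal_type: str        # e.g. "speech" or "music"
    attributes: dict = field(default_factory=dict)


# The four segments from the slide, with made-up time boundaries.
SCENE = [
    Segment(0.0, 12.0, "speech",
            {"topic": "traffic report", "speaker": "female"}),
    Segment(12.0, 25.0, "speech",
            {"topic": "weather, sponsor acknowledgement, time",
             "speaker": "male"}),
    Segment(25.0, 40.0, "speech",
            {"topic": "NPR news", "speaker": "male, Bob Edwards"}),
    Segment(40.0, 55.0, "music",
            {"genre": "instrumental", "artist": "unknown"}),
]


def segment_at(scene, t):
    """Return the segment covering time t, or None if there is none."""
    for seg in scene:
        if seg.start <= t < seg.end:
            return seg
    return None


def by_type(scene, signal_type):
    """Return all segments of the given signal type."""
    return [seg for seg in scene if seg.signal_type == signal_type]
```

Keeping the per-type fields in a free-form attributes dictionary lets speech and music segments carry different descriptors, as they do on the slide.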

29 Learning
Perhaps the most important aspect of organic interfaces
–Use of stochastic modeling techniques for speech recognition, language understanding, machine translation, and dialogue modeling
Many different ways to learn
–Passive learning
–Interactive learning
–Learning by imitation

30 Interactive Learning: An Example
New words are inevitable, and they cannot be ignored
Acoustic and linguistic knowledge is needed to
–Detect
–Learn, and
–Utilize new words
Fundamental changes in problem formulation and search strategy may be necessary
Hetherington, 1991

31 Interactive Learning: An Example
New words are inevitable, and they cannot be ignored
Acoustic and linguistic knowledge is needed to
–Detect
–Learn, and
–Utilize new words
Fundamental changes in problem formulation and search strategy may be necessary
New words can be detected and incorporated through
–Dynamic update of vocabulary
Chung & Seneff, 2004

32 Interactive Learning: An Example
New words are inevitable, and they cannot be ignored
Acoustic and linguistic knowledge is needed to
–Detect
–Learn, and
–Utilize new words
Fundamental changes in problem formulation and search strategy may be necessary
New words can be detected and incorporated through
–Dynamic update of vocabulary
–Speak and Spell
Filisko & Seneff, 2006

33 Learning by Imitation
Many tasks can be learned through interaction
–This is how you enable Bluetooth. Enable Bluetooth.
–These are my glasses. Where are my glasses?
Promising research by James Allen (Allen et al., 2007)
–Learning phase:
*User shows the system how to perform tasks (perhaps through some spoken commentary)
*System learns the task through learning algorithms and updates its knowledge base
–Application phase:
*Looks up tasks in its knowledge base and executes the procedure
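The two phases can be sketched as a knowledge base that is written during the demonstration and read when the task is requested. This is a deliberately simplified illustration: plain memorization of the demonstrated steps stands in for the learning algorithms mentioned on the slide, and the TaskLearner class and its method names are hypothetical.

```python
class TaskLearner:
    """Toy learner: stores demonstrated procedures and replays them."""

    def __init__(self):
        # Knowledge base mapping a task name to its list of steps.
        self.knowledge_base = {}

    def learn(self, task_name, demonstrated_steps):
        # Learning phase: record what the user showed (memorization only).
        self.knowledge_base[task_name.lower()] = list(demonstrated_steps)

    def execute(self, task_name, perform=print):
        # Application phase: look up the task and run each step in order.
        steps = self.knowledge_base.get(task_name.lower())
        if steps is None:
            return False  # Task was never demonstrated.
        for step in steps:
            perform(step)
        return True
```

After a demonstration such as learn("Enable Bluetooth", [...]), the later request "Enable Bluetooth" is served by execute, mirroring the first example pair on the slide.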

34 In Summary
Great strides have been made in speech technologies
Truly anthropomorphic spoken dialogue interfaces can only be realized if they can behave like organisms
–Observe, learn, grow, and heal
Many challenges remain …

35 Thank You

36 Dynamic Vocabulary Understanding
Dynamically alter vocabulary within a single utterance
User: What's the phone number for Flora in Arlington.
Meaning frame with the name unresolved:
–Clause: wh_question; Property: phone; Topic: restaurant; Name: ????; City: Arlington
Restaurant names: Arlington Diner, Blue Plate Express, Tea Tray in the Sky, Asiana Grille, Bagels etc, Flora, ….
Meaning frame with the name resolved:
–Clause: wh_question; Property: phone; Topic: restaurant; Name: Flora; City: Arlington
System: The telephone number for Flora is …
[Architecture diagram: Hub connected to NLG, ASR, Context, TTS, Dialog, NLU, Audio, DB]
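The idea on this slide can be approximated as a two-pass procedure: a first pass fills the frame except for the unknown name, the city found in that pass selects a list of restaurant names, and a second pass resolves the name against that list. The sketch below works on text rather than audio, and its function names, the BASE_VOCABULARY set, and the exact-match rule are assumptions made for illustration, not the mechanism of the system shown.

```python
# Restaurant names per city, taken from the slide's example list.
RESTAURANTS_BY_CITY = {
    "arlington": ["Arlington Diner", "Blue Plate Express",
                  "Tea Tray in the Sky", "Asiana Grille", "Flora"],
}

# Hypothetical fixed vocabulary known before any dynamic update.
BASE_VOCABULARY = {"what's", "the", "phone", "number", "of", "for", "in"}


def first_pass(words):
    """Fill the frame with what the base vocabulary allows."""
    frame = {"Clause": "wh_question", "Property": "phone",
             "Topic": "restaurant", "Name": None, "City": None}
    unknown = []
    for word in words:
        lowered = word.lower()
        if lowered in RESTAURANTS_BY_CITY:
            frame["City"] = word
        elif lowered not in BASE_VOCABULARY:
            unknown.append(word)  # Candidate new-word region.
    return frame, unknown


def second_pass(frame, unknown):
    """Load the names for the recognized city and resolve the unknown span."""
    names = RESTAURANTS_BY_CITY.get((frame["City"] or "").lower(), [])
    candidate = " ".join(unknown).lower()
    for name in names:
        if name.lower() == candidate:
            frame["Name"] = name
    return frame
```

For the slide's utterance, the first pass leaves Name empty with City set to Arlington, and the second pass fills in Flora once the Arlington names are available.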

