Turn-taking and Disfluencies


Turn-taking and Disfluencies Julia Hirschberg CS 4706 11/23/2018

Today
  Turn-taking behaviors
    Conversational Analysis
    Importance in real systems
  Disfluencies
    How to model? Detect?
    Role in human-human interaction
    Importance in real systems?

Turn-taking
  Expected patterns of behavior
  Deviation is significant
  How do we find the patterns?
    Ordinary conversation
    Telephone talk
    Meetings
    Email?
  Who looks for these?

Terminology
  Examples:
    Adjacency pairs
    Preference
    Pre-sequence
    Repair
    Telephone openings, closings
    Broadcasts

Could this be useful when we build spoken dialogue systems (SDS)?
  What do we expect to hear?
  What should we produce?

Auditory Cues to Turn-Taking
  E. Schegloff, “Reflections on studying prosody in talk-in-interaction,” Language and Speech 41, 1998. (Michael Mu.)
  H. Koiso et al., “An analysis of turn-taking and backchannels based on prosodic and syntactic features…,” Language and Speech 41, 1998. (Sarah)

Disfluencies and Self-Repairs
  Are these just ‘noise’?
    For people
      S. Brennan & M. Williams, “The Feeling of Another’s Knowing,” J Memory and Language 34, 1995. (Judd)
      S. Brennan & M. Schober, “How listeners compensate for disfluencies in spontaneous speech,” J Memory and Language 44, 2001. (Aron)
    For parsers
    For speech recognizers

Hindle ’83: Finding the Edit Signal
  If we have it, can we ‘repair’ the self-repair automatically?
  Builds a correcting parser, Fidditch, for spontaneous speech
  Given a string with an edit signal (*) marked, produces a ‘repaired’ version
    I was * I am really annoyed
  If X1 * X2 are similar linguistic elements separated by an edit signal, replace X1 with X2
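
The replacement rule can be sketched in Python for the simplest case, where "similar" just means the repair restarts with a word already spoken before the edit signal. This is a word-matching illustration only, not Hindle's Fidditch parser, which compares syntactic categories and constituents. The function name `repair` and the use of `*` as a standalone token are assumptions made for the example.

```python
EDIT_SIGNAL = "*"  # assumed: the edit signal appears as its own token


def repair(utterance: str) -> str:
    """Replace X1 with X2 around the first edit signal.

    Simplification: X1 is taken to start at the most recent word before
    the signal that equals the first word after it (the retrace point).
    Only the first edit signal in the utterance is handled.
    """
    tokens = utterance.split()
    if EDIT_SIGNAL not in tokens:
        return utterance
    i = tokens.index(EDIT_SIGNAL)
    left, right = tokens[:i], tokens[i + 1:]
    if right:
        # Scan backwards from the signal for the retrace point.
        for start in range(len(left) - 1, -1, -1):
            if left[start].lower() == right[0].lower():
                # Drop X1 (left[start:]) and keep X2 (right).
                return " ".join(left[:start] + right)
    # No word-level match: only the marker is removed.
    return " ".join(left + right)


print(repair("I was * I am really annoyed"))  # I am really annoyed
```

A case such as "I was just that * the kind of guy" comes back unrepaired, because "that" and "the" match only by category, not by surface form.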

What does it mean to be the same?
  Same surface string
    Well if they’d * if they’d…
  Same category
    I was just that * the kind of guy…
  Same constituent
    I think that you get * it’s more strict in Catholic schools
  Restarts are completely different…
    I just think * Do you want something to eat?

Bear et al ’92: Detecting and Correcting Self-Repairs
  Use multiple knowledge sources, but not an edit signal
    Lexical pattern matching
    Parsing failure + pattern matching + re-parsing
    Acoustic information: pause, peak F0
    Cue words: well, no
    Fragments
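
To illustrate lexical pattern matching without an edit signal, the Python sketch below flags candidate interruption sites from the word string alone. It is not Bear et al.'s actual pattern set. The function name `candidate_repairs`, the `max_len` limit, and the convention that a word fragment is transcribed with a trailing hyphen are all assumptions made for the example.

```python
def candidate_repairs(tokens, max_len=3):
    """Return (position, evidence) pairs for possible interruption sites.

    position is the index of the token where the repair would begin.
    Two kinds of evidence are checked:
      - "fragment": the preceding token ends in "-" (assumed convention)
      - "repeat:k": the k tokens before the position are repeated
        verbatim by the k tokens starting at the position
    """
    candidates = []
    n = len(tokens)
    for i in range(1, n):
        if tokens[i - 1].endswith("-"):
            candidates.append((i, "fragment"))
        # Prefer the longest repeated span at this position.
        for k in range(max_len, 0, -1):
            if i - k >= 0 and i + k <= n and tokens[i - k:i] == tokens[i:i + k]:
                candidates.append((i, f"repeat:{k}"))
                break
    return candidates


print(candidate_repairs("well if they'd if they'd go".split()))
# [(3, 'repeat:2')]
```

Patterns like these also fire on fluent repetitions, which is one reason to combine them with parsing and acoustic evidence.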

But…is there an edit signal?


RIM Model of Self-Repairs (Nakatani & Hirschberg ’94)
  ATIS corpus: 6414 turns with 346 (5.4%) repairs, 122 speakers
  Hand-labeled for repairs and prosodic features
  Findings:
    Reparanda:
      73% end in fragments
      30% end in glottalization (short, no decrease in energy or f0)
      Co-articulatory gestures
    DI (disfluency interval):
      Pause duration differs significantly from fluent boundaries, but cannot be used alone to predict disfluencies: too many false positives
      Small increase in f0 and amplitude across the DI, from reparandum to repair
    Repairs:
      Offsets occur at phrase boundaries
      Differences in phrasing compared to fluent speech
  CART prediction of repairs: 86% recall, 91% precision (192 of 223 interruption sites found, 19 false positives)
  Important features: duration of interval, fragment, pause filler, part of speech, lexical matching across DI
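
The CART figures can be checked from the reported counts. The Python sketch below assumes that "192 of 223 interruption sites" means 192 correctly detected sites out of 223 actual ones, and that the 19 false positives are detections at fluent locations. Under that reading, recall comes out near 86% and precision near 91%. The function name `precision_recall` is illustrative.

```python
def precision_recall(true_positives, false_positives, total_actual):
    """Precision = TP / (TP + FP); recall = TP / all actual positives."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / total_actual
    return precision, recall


# Counts from the slide: 192 sites found, 19 false positives, 223 sites in total.
precision, recall = precision_recall(192, 19, 223)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
# precision = 91.0%, recall = 86.1%
```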

Does it identify self-repairs reliably?
  CART prediction: 86% recall, 91% precision
  Features: duration of interval, presence of fragment, pause filler, part of speech, lexical matching across DI
  Are there edit signals?

Next Week
  Spoken Dialogue Systems
  Andy, David and Vera reporting