Recognizing Metadata: Segmentation and Disfluencies
Julia Hirschberg, CS 4706

DARPA EARS Program: Effective, Affordable, Reusable Speech-to-Text
Goals: produce transcriptions that are more readable and usable by humans and downstream processes
– Segment speech into text-like units: sentences, speaker turns, topics
– Add appropriate punctuation
– Eliminate spontaneous speech phenomena: filled pauses, self-repairs, discourse markers
Tasks: ASR, Metadata Extraction (MDE)

Motivation: ASR Transcription
aides tonight in boston in depth the truth squad for special series until election day tonight the truth about the budget surplus of the candidates are promising the two international flash points getting worse while the middle east and a new power play by milosevic and a lifelong a family tries to say one child life by having another amazing breakthrough the u s was was told local own boss good evening uh from the university of massachusetts in boston the site of the widely anticipated first of eight between vice president al gore and governor george w bush with the election now just five weeks away this is the beginning of a sprint to the finish and a strong start here tonight is important this is the stage for the two candidates will appear before a national television audience taking questions from jim lehrer of p b s n b c's david gregory is here with governor bush claire shipman is covering the vice president claire you begin tonight please

Motivation: Speaker Segmentation (Diarization)
Speaker 0: aides tonight in boston in depth the truth squad for special series until election day tonight the truth about the budget surplus of the candidates are promising the two international flash points getting worse while the middle east and a new power play by milosevic and a lifelong a family tries to say one child life by having another amazing breakthrough the u s was was told local own boss good evening uh from the university of massachusetts in boston
Speaker 1: the site of the widely anticipated first of eight between vice president al gore and governor george w bush with the election now just five weeks away this is the beginning of a sprint to the finish and a strong start here tonight is important this is the stage for the two candidates will appear before a national television audience taking questions from jim lehrer of p b s n b c's david gregory is here with governor bush claire shipman is covering the vice president claire you begin tonight please

Motivation: Sentence Detection, Punctuation, Truecasing
Speaker (Anchor): Aides tonight in Boston. In depth the truth squad for special series until election day. Tonight the truth about the budget surplus of the candidates are promising. The two international flash points getting worse. While the Middle East. And a new power play by Milosevic and a lifelong a family tries to say one child life by having another amazing breakthrough the U. S. was was told local own boss. Good evening uh from the university of Massachusetts in Boston.
Speaker (Reporter): The site of the widely anticipated first of eight between Vice President Al Gore and Governor George W. Bush. With the election now just five weeks away. This is the beginning of a sprint to the finish. And a strong start here tonight is important. This is the stage for the two candidates will appear before a national television audience taking questions from Jim Lehrer of PBS. NBC's David Gregory is here with Governor Bush. Claire Shipman is covering the vice president. Claire, you begin tonight please.

Story Boundary/Topic Detection
(Same example transcript as the previous slide, there divided into story/topic units.)

Today
Segmentation
– Speakers
– Sentences
– Stories
Disfluency detection/correction
– Self-repairs

Speaker Diarization
Assign consistent speaker labels across a meeting or news broadcast: Speaker1… Speaker2… Speaker1… Speaker4…
– Segment the spoken document into acoustically distinct units
– Cluster the units and assign an identifier to each instance in the document
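To make the segment-then-cluster pipeline concrete, here is a minimal sketch, not the system from any paper above: it cuts audio into fixed-length segments, embeds each as a mean MFCC vector, and clusters the embeddings. The librosa/scikit-learn calls, fixed segment length, and known speaker count are illustrative assumptions; real diarization systems use change detection and speaker embeddings such as i-vectors or x-vectors.

```python
import numpy as np
import librosa  # assumed available for MFCC extraction
from sklearn.cluster import AgglomerativeClustering

def diarize(wav_path, seg_len=2.0, n_speakers=4, sr=16000):
    """Toy diarization: fixed-length segmentation, then agglomerative
    clustering of mean-MFCC segment embeddings into speaker labels."""
    y, _ = librosa.load(wav_path, sr=sr)
    hop = int(seg_len * sr)
    segments = [y[i:i + hop] for i in range(0, len(y) - hop + 1, hop)]
    # One embedding per segment: the mean of 20 MFCCs over its frames.
    X = np.array([librosa.feature.mfcc(y=s, sr=sr, n_mfcc=20).mean(axis=1)
                  for s in segments])
    labels = AgglomerativeClustering(n_clusters=n_speakers).fit_predict(X)
    return [(i * seg_len, (i + 1) * seg_len, f"Speaker{lab}")
            for i, lab in enumerate(labels)]
```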

Sentence Segmentation
Classification task: sentence boundary vs. no sentence boundary
Features:
– Lexical and POS information (but ASR output is noisy)
– Distance from the previous hypothesized boundary
– Speech information: durations (sentence-final words are longer), pause, F0 (f0 modeling, pitch reset, pitch range)
Hand-annotated training corpus, annotated for SLUs (Sentence-like Units): 51-page LDC manual
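A minimal sketch of the boundary classifier, with a scikit-learn decision tree standing in for CART; the feature values below are invented for illustration, and a real system would extract them from forced alignments of an SLU-annotated corpus:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stands in for CART

# One row per inter-word boundary (hypothetical values):
# [pause_ms, final_rhyme_duration_z, f0_reset_semitones, words_since_last_boundary]
X_train = np.array([
    [620,  1.8, 4.2, 12],   # long pause, final lengthening, pitch reset
    [ 30,  0.1, 0.2,  3],   # fluent connection
    [450,  1.2, 3.1,  9],
    [ 10, -0.3, 0.0,  5],
])
y_train = np.array([1, 0, 1, 0])  # 1 = sentence boundary

clf = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print(clf.predict([[500, 1.5, 3.5, 10]]))  # -> [1]
```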

Punctuation Detection and Truecasing
Punctuation:
– Assign each SLU an appropriate final punctuation mark
– SLU-internal punctuation?
Capitalization:
– Capitalize words that begin sentences
– Named entities – how to train?
Features: prosodic and lexical
Training data?

Topic/Story Boundary Detection
Rich text-based literature:
– Halliday & Hasan 1976: lexical cohesion
– Hearst 1997: TextTiling segments by comparing the words before and after each hypothesized topic boundary with a word-similarity metric
– Reynar 1999; Beeferman et al. 1999: cue phrases
– Choi 2000: divisive clustering using cosine similarity on stems
Features used: stem repetition, entity repetition, word frequency, context vectors, semantic similarity, word distance, lexical chains, anaphoric chains
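The core of Hearst's TextTiling idea fits in a few lines: compare the word distributions on either side of each candidate gap and look for similarity valleys. This sketch omits Hearst's smoothing and depth scoring; the window size k and the transcript.txt filename are assumptions.

```python
import re
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def gap_similarities(words, k=20):
    """Similarity of the k-word windows before and after each gap;
    low values mark likely topic boundaries."""
    return [(gap, cosine(Counter(words[gap - k:gap]), Counter(words[gap:gap + k])))
            for gap in range(k, len(words) - k)]

words = re.findall(r"[a-z']+", open("transcript.txt").read().lower())
print(min(gap_similarities(words), key=lambda t: t[1]))  # deepest valley
```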

Spoken Cues to Discourse/Topic Structure
Pitch range: Lehiste '75, Brown et al. '83, Silverman '86, Avesani & Vayra '88, Ayers '92, Swerts et al. '92, Grosz & Hirschberg '92, Swerts & Ostendorf '95, Hirschberg & Nakatani '96
Preceding pause: Lehiste '79, Chafe '80, Brown et al. '83, Silverman '86, Woodbury '87, Avesani & Vayra '88, Grosz & Hirschberg '92, Passonneau & Litman '93, Hirschberg & Nakatani '96

Rate: Butterworth '75, Lehiste '80, Grosz & Hirschberg '92, Hirschberg & Nakatani '96
Amplitude: Brown et al. '83, Grosz & Hirschberg '92, Hirschberg & Nakatani '96
Contour: Brown et al. '83, Woodbury '87, Swerts et al. '92

Finding Sentence and Topic/Story Boundaries in ASR Transcripts (Shriberg et al. 2000)
Text-based segmentation is fine… if you have reliable text.
Could prosodic cues perform as well or better at sentence and topic segmentation in ASR transcripts?
– More robust?
– More general?
Goal: identify sentence and topic boundaries at ASR-defined word boundaries
– CART decision trees and LM
– HMM combining prosodic and LM results

Features
Trained/tested on Switchboard and Broadcast News. For each potential boundary location:
– Pause at boundary (raw and normalized by speaker; see the normalization sketch after this list)
– Pause at the word before the boundary (is this a new 'turn' or part of a continuous speech segment?)
– Phone and rhyme duration, normalized by inherent duration (phrase-final lengthening?)
– F0 (smoothed and stylized): reset, range (topline, baseline), slope and continuity
– Voice quality (halving/doubling estimates as correlates of creak or glottalization)
– Speaker change, time from start of turn, number of turns in conversation, and gender
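Several of these features are "normalized by speaker"; a minimal sketch of that step, assuming z-score normalization (a common choice, though the paper's exact normalizations vary by feature):

```python
import numpy as np

def speaker_normalized(values, speaker_ids):
    """z-score each raw value (e.g., pause duration in ms) within its
    own speaker, so 'long' is relative to that speaker's habits."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for spk in set(speaker_ids):
        idx = [i for i, s in enumerate(speaker_ids) if s == spk]
        v = values[idx]
        out[idx] = (v - v.mean()) / (v.std() if v.std() > 0 else 1.0)
    return out

pauses = [620, 30, 450, 10, 700, 90]
print(speaker_normalized(pauses, ["A", "A", "A", "B", "B", "B"]))
```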

Sentence Segmentation Results
Prosody-only model:
– Better than the LM for BN
– On SB: worse than the LM on hand transcriptions, the same on ASR transcripts
– Slightly improves the LM on SB
Useful features for BN: pause at boundary, turn change/no turn change, f0 difference across boundary, rhyme duration
Useful features for SB: phone/rhyme duration before boundary, pause at boundary, turn/no turn, pause at preceding word boundary, time in turn

Topic Segmentation Results (BN only)
Useful features: pause at boundary, f0 range, turn/no turn, gender, time in turn
Prosody alone better than the LM; the combined model improves significantly.

Story Segmentation on BN (Rosenberg et al. '07)
Goal: divide each show into homogeneous regions, each about a single topic
– Task: focused Q/A
– Issue: what unit of analysis should we use in identifying potential boundaries?

TDT-4 Corpus
– English: hours, 250 broadcasts, 6 shows
– Arabic: 88.5 hours, 109 broadcasts, 2 shows
– Mandarin: 109 hours, 134 broadcasts, 3 shows
Manually annotated story boundaries; ASR hypotheses; speaker diarization hypotheses

Approach
Identify a set of segments which define:
– The unit of analysis
– Candidate boundaries
Classify each candidate boundary based on features extracted from the segments
– C4.5 decision tree
– Model each show type separately: e.g., CNN "Headline News" and ABC "World News Tonight" have distinct models
– Evaluate using WindowDiff with k=100 (see the sketch after this list)
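WindowDiff (Pevzner & Hearst 2002) is straightforward to compute; a sketch, with k in the same units as the boundary sequence (here one position per word, so k=100 matches the setting above):

```python
def window_diff(ref, hyp, k):
    """Slide a k-wide window over reference and hypothesis boundary
    indicators (0/1 per position) and count windows where the two
    disagree on the number of boundaries inside; normalize by the
    number of windows. Lower is better; near-misses are penalized
    less than in strict boundary matching."""
    assert len(ref) == len(hyp)
    n = len(ref)
    errors = sum(sum(ref[i:i + k]) != sum(hyp[i:i + k]) for i in range(n - k))
    return errors / (n - k)

ref = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
hyp = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]   # first boundary off by one
print(window_diff(ref, hyp, k=3))       # ~0.14, not a full miss
```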

Segment Boundary Modeling Features
Acoustic:
– Pitch & intensity: speaker-normalized min, mean, max, stdev, slope
– Speaking rate: vowels/sec, voiced frames/sec
– Final vowel and rhyme length
– Pause length
Lexical:
– TextTiling scores
– LCSeg scores
– Story beginning and ending keywords
Structural:
– Position in show
– Speaker participation
– First or last speaker turn?

Input Segmentations
– ASR word boundaries (no-segmentation baseline)
– Hypothesized sentence units: boundaries at 0.5, 0.3 and 0.1 confidence thresholds
– Pause-based segmentation: boundaries at pauses over 500ms and 250ms (see the sketch after this list)
– Hypothesized intonational phrases
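The pause-based input segmentation is easy to sketch from ASR word timings; the (word, start, end) format and the example values here are assumptions:

```python
def pause_boundaries(word_times, min_pause=0.25):
    """Return indices of words that start a new segment because the
    silence before them exceeds min_pause seconds (250ms or 500ms in
    the experiments above). word_times: list of (word, start_s, end_s)."""
    return [i + 1
            for i, ((_, _, end), (_, start, _)) in
            enumerate(zip(word_times, word_times[1:]))
            if start - end >= min_pause]

words = [("good", 0.00, 0.40), ("evening", 0.45, 0.90),
         ("uh", 1.60, 1.80), ("from", 1.85, 2.10)]
print(pause_boundaries(words, 0.5))  # [2]: a candidate boundary before 'uh'
```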

Hypothesizing Intonational Phrases
~30 minutes of ASR BN manually annotated, from a reserved TDT-4 CNN show
– A phrase was marked if a phrase boundary occurred since the previous word boundary
C4.5 decision tree over pitch, energy and duration features
– Normalized by hypothesized speaker id and surrounding context
66.5% F-measure (p=.683, r=.647)

Story Segmentation Results

Input Segmentation Statistics
Reported per input segmentation: mean distance to nearest true boundary (words), exact story boundary coverage (pct.), segment-to-true-boundary ratio

Results
Best performance:
– Low-threshold (0.1) sentences
– Short-pause (250ms) segmentation
– Hypothesized IPs perform better than sentences
– Would increased SU and IP accuracy improve story segmentation?
External evaluation: impact on IR and MT performance

Disfluencies and Self-Repairs
Spontaneous speech is 'ungrammatical': a disfluency every 4.6s in radio call-in speech (Blackmer & Mitton '91)
– Hesitation: Ch- change strategy.
– Filled pause: Um Baltimore.
– Self-repair: Ba- uh Chicago.
A big problem for speech recognition:
– Ch- change strategy. --> to D C D C today ten fifteen.
– Um Baltimore. --> From Baltimore ten.
– Ba- uh Chicago. --> For Boston Chicago.

Disfluencies as 'Noise'
For people:
– Repairs as replanning events
– Repairs as attention-getting devices (taking the turn)
For parsers
For speech recognizers

What's the Alternative?
Modeling disfluencies:
– Filled pauses
– Self-repairs
– Hesitations
Detecting disfluencies explicitly. Why is this hard?
– Distinguishing them from 'real' words (uh vs. a)
– Distinguishing them from 'real' noise

Self-Repairs
Hindle '83: when people produce disfluent speech and correct themselves…
– They leave a trail behind
– Hearers can compare the fluent finish with the disfluent start to determine what to 'replace' with what:
  This is a bad – a disastrous move ('a/DET bad/ADJ' vs. 'a/DET disastrous/ADJ')
– Corpus: interview transcripts with correct p.o.s. assigned

The 'Edit Signal'
How do hearers know what to keep and what to discard?
Hypothesis: speakers signal an upcoming repair by some acoustic/prosodic edit signal that tells hearers where the disfluent portion of speech ends and the correction begins:
  reparandum – edit signal – repair
  What I {uh, I mean, I-, …} what I said is
If there is an edit signal, what might it be?
– Filled pauses
– Explicit words
– Or some 'non-lexical' acoustic phenomena

Categories of Self-Repairs
– Same surface string: Well if they'd * if they'd…
– Same part-of-speech: I was just that * the kind of guy…
– Same syntactic constituent: I think that you get * it's more strict in Catholic schools
– Restarts are completely different: I just think * Do you want something to eat?

Hindle Category Distribution for One Interview
1512 sentences, 544 repairs
Category                     N     %
Edit signal only             128   24%
Exact match                  161   29%
Same POS                     47    9%
Same syntactic constituent   148   27%
Restart                      32    6%
Other                        28    5%

But is there an Edit Signal?
Definition: a reliable indicator that divides the reparandum from the repair
In search of the edit signal: the RIM model of self-repairs (Nakatani & Hirschberg '94)
– Reparandum, Disfluency Interval (interruption site), Repair
ATIS corpus: 6414 turns with 346 (5.4%) repairs, 122 speakers, hand-labeled for repairs and prosodic features

Lexical Class of Word Fragments Ending the Reparandum
Lexical class    N     %
Content words    128   43%
Function words   14    5%
?                156   52%

Length of Fragments at End of Reparandum
Syllables   N     %
0           ?     ?
1           ?     ?
2           25    8%
3           1     0.3%

Length in Words of Reparandum
Length      Fragment repairs (N=280)   Non-fragment repairs (N=102)
1           183 (65%)                  53 (52%)
2           64 (23%)                   33 (32%)
3           18 (6%)                    9 (9%)
4           6 (2%)                     2 (2%)
5 or more   9 (3%)                     5 (5%)

Type of Initial Phoneme in Fragment
Class of first phoneme   % of all words   % of all fragments   % of 1-syl fragments   % of 1-C fragments
Stop                     23%              23%                  29%                    12%
Vowel                    25%              13%                  20%                    0%
Fricative                33%              44%                  27%                    72%
Nasal/glide/liquid       18%              17%                  20%                    15%
H                        1%               2%                   4%                     1%
Total N: 64,…

Presence of Filled Pauses/Cue Phrases at the DI
               FP/cue phrases   Unfilled pauses
Fragment       16               264
Non-fragment   20               82

Duration of Pause at the DI
               Mean     SDev     N
Fluent pause   513ms    676ms    1186
DI             334ms    421ms    346
Fragment       289ms    377ms    264
Non-fragment   481ms    517ms    82

Is There an Edit Signal?
Findings:
– Reparanda: 73% end in fragments, 30% in glottalization and co-articulatory gestures
– DI: pausal duration differs significantly from fluent boundaries; small increase in f0 and amplitude
Speculation: articulatory disruption
Are there edit signals?

With or Without an Edit Signal, How Might Hearers/Machines Process Disfluent Speech?
Parsing-based approaches (Weischedel & Black '80; Carbonell & Hayes '83; Hindle '83; Fink & Biermann '86):
– If two constituents of identical semantic/syntactic type are found where the grammar allows only one, delete the first
– Use an 'edit signal' or explicit words as cues
– Select the minimal constituent: Pick up the blue- green ball.

Results (Hindle): detection and correction
– Trivial (edit signal only): 128 (24%)
– Non-trivial: 388 (71%)

Pattern-Matching Approaches (Bear et al. '92)
– Find candidate self-repairs using lexical matching rules (see the sketch after this list):
  Exact repetitions within a window: I'd like a a tall latte.
  A pair of specified adjacent items: The a great place to visit.
  'Correction phrases': That's the well uh the Raritan Line.
– Filter using syntactic/semantic information: That's what I mean when I say it's too bad.
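The simplest of these matching rules, exact repetition within a window, can be sketched in a few lines; the window size and single-word restriction are simplifications of the multi-word patterns Bear et al. actually used:

```python
def repetition_candidates(words, max_gap=3):
    """Flag exact word repetitions within max_gap positions as candidate
    (reparandum_start, repair_start) index pairs; a real system then
    filters these candidates with syntactic/semantic checks."""
    return [(i, j)
            for i, w in enumerate(words)
            for j in range(i + 1, min(i + 1 + max_gap, len(words)))
            if words[j] == w]

print(repetition_candidates("i'd like a a tall latte".split()))
# -> [(2, 3)]: the repeated 'a'
```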

Distribution of Reparanda
– 10,718 utterances
– Of 646 repairs, most nontrivial repairs (339/436) involve matched strings of identical words
– Longer matched string: more likely a repair
– More words between matches: less likely a repair
– Distribution of reparanda by length in words:
  Len   N     %
  1     ?     ?
  2     ?     ?
  3     52    8%
  4     25    4%
  5     23    4%
  6+    16    3%

Detection results:
– 201 'trivial' repairs (fragments or filled pauses)
– Of the 406 remaining: found 309 correctly (76% recall); hypothesized 191 incorrectly (61% precision)
– Adding 'trivial' repairs: 84% recall, 82% precision
Correction is harder:
– Corrects all 'trivial' repairs but only 57% of the correctly identified non-trivial ones

Machine Learning Approaches (Nakatani & Hirschberg '94)
CART prediction: 86% precision, 91% recall
– Features: duration of the interval, presence of a fragment, pause filler, p.o.s., lexical matching across the DI
– Produces rules usable on unseen data
– But… requires hand-labeled data

State of the Art (Liu et al. 2002)
Detecting the Interruption Point (IP) using acoustic/prosodic and lexical features
Features:
– Normalized duration and pitch features
– Voice quality features (see the jitter sketch after this list):
  Jitter: perturbation in the pitch period
  Spectral tilt: overall slope of the spectrum
  Open quotient: ratio of the time the vocal folds are open to the total length of the glottal cycle
– Language models: words, POS, repetition patterns
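Of these voice-quality measures, jitter has a particularly simple definition; a sketch of local jitter from a sequence of measured pitch periods (the period values below are invented; a real system measures them with a pitch tracker):

```python
import numpy as np

def local_jitter(periods_ms):
    """Local jitter: mean absolute difference between consecutive
    pitch periods, divided by the mean period."""
    p = np.asarray(periods_ms, dtype=float)
    return np.abs(np.diff(p)).mean() / p.mean()

# Hypothetical pitch periods (ms) near a word fragment:
print(local_jitter([8.0, 8.1, 7.9, 9.2, 7.5]))  # higher = more irregular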

Example alignment: I hope to have * to have … (POS: NP VB PREP VB * PREP VB; reparandum 'to have', IP at *, repair 'to have')
Corpus:
– 1593 Switchboard conversations, hand-labeled
– Downsampled to 50:50 IP/non-IP, since otherwise the baseline is 96.2% (predict no IP)
Results:
– Prosody alone produces the best results on downsampled data (Prec. 77%, Recall 76%)

IP detection, precision/recall:
– Prosody + word LM + POS LM does best on non-downsampled data (Prec. 57%, Recall 81%)
IP detection, overall accuracy:
– Prosody alone on reference transcripts (77%) vs. ASR transcripts (73%) (downsampled)
– Word LM alone on reference transcripts (98%) vs. ASR transcripts (97%) (non-downsampled)
Finding the reparandum start:
– Rule-based system: Prec. 69%, Recall 61%
– LM: Prec. 76%, Recall 46%
Have we made progress?

IP Detection Results
Downsampled:
– Chance (Acc)
– Prosody (P, R, A)
Non-downsampled:
– Chance (A)
– Prosody (A)
– Word LM (P, R, A)
– POS LM
– Word LM + Prosody
– Word LM + Prosody + POS LM

Next Class
Spoken Dialogue Systems