Recognizing Discourse Structure: Speech Discourse & Dialogue CMSC 35900-1 October 11, 2006.


1 Recognizing Discourse Structure: Speech Discourse & Dialogue CMSC 35900-1 October 11, 2006

2 Roadmap
Recognizing discourse structure in speech
Analyzing spoken monologue
Automatic topic segmentation
  – Acoustic cues, text cues, and integration
Conclusions & Plans

3 Recognizing Discourse Structure
Hypothesis:
  – Discourse can be decomposed into subunits
Formal written text
  – Clues to structure: paragraphs, chapters, sections
Spoken discourse
  – Lacks orthographic cues
  – Are compensating features available?

4 Prosody & Discourse Structure
Discourse structure model
  – Grosz & Sidner (1986)
  – Global structure: discourse segments, embedding
  – Local structure: prominence, salience
Linguistic structure includes intonation
  – Signals global or local structure
  – Phrasing signals global structure
  – Signals parentheticals

5 Intonational Features
Theoretical framework
  – Tones and Break Indices (ToBI), building on Pierrehumbert's model
  – Tones: pitch contours; breaks: phrase units
  – “Intermediate” phrases are the basic units
Features:
  – Pitch range within and between phrases
  – Amplitude (loudness)
  – Pitch contour type
  – Speaking rate (syllables/sec)
  – Inter-phrase pause duration
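As a rough illustration of the feature list above, the sketch below computes several of these features per phrase. It assumes phrases are already segmented and that pitch (f0) and amplitude tracks have already been extracted; the `Phrase` record and its field names are hypothetical. Pitch contour type is a categorical ToBI label and is not computed here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Phrase:
    # Hypothetical input record for one intermediate phrase.
    start: float        # onset time, seconds
    end: float          # offset time, seconds
    syllables: int      # syllable count
    f0: List[float]     # voiced pitch samples, Hz (assumed already extracted)
    amp: List[float]    # amplitude samples, e.g. RMS energy (assumed)


def phrase_features(phrases: List[Phrase]) -> List[dict]:
    """Per-phrase prosodic features; pauses and pitch deltas are None at the edges."""
    feats = []
    for i, p in enumerate(phrases):
        prev: Optional[Phrase] = phrases[i - 1] if i > 0 else None
        nxt: Optional[Phrase] = phrases[i + 1] if i + 1 < len(phrases) else None
        feats.append({
            "max_f0": max(p.f0),
            "mean_f0": mean(p.f0),
            "f0_range": max(p.f0) - min(p.f0),                           # range within the phrase
            "max_f0_delta": max(p.f0) - max(prev.f0) if prev else None,  # change between phrases
            "max_amp": max(p.amp),
            "rate": p.syllables / (p.end - p.start),                     # syllables per second
            "pause_before": p.start - prev.end if prev else None,
            "pause_after": nxt.start - p.end if nxt else None,
        })
    return feats
```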

6 Speech Corpora
Vary in:
  – Speaker type: professional or not
  – Speaking style: read or spontaneous
  – Speech content: news, directions, etc.
Prosody varies along these dimensions too

7 Pilot Study I: Newswire
3 AP newswire stories, read by a professional
Manual segmentation: from text only vs. from speech
  – Consensus labels: segment-beginning (SB), segment-final (SF)
Pitch range, amplitude, and rate correlate with structure
  – Structure identified via hand labelings
Issues:
  – Labeling is difficult; broadcast-news (BN) speech style is idiosyncratic
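One way to derive consensus labels from several labelers is sketched below. It assumes unanimity as the consensus criterion, which may differ from the criterion the study actually used; the "-" label for "neither SB nor SF" is hypothetical.

```python
from typing import List, Optional


def consensus_labels(annotations: List[List[str]]) -> List[Optional[str]]:
    """Unanimity consensus over labelers (assumed criterion).

    annotations[k][i] is labeler k's label for phrase i, e.g. "SB", "SF",
    or "-" (hypothetical label for neither). Returns the shared label where
    all labelers agree, and None where they disagree.
    """
    n = len(annotations[0])
    if any(len(a) != n for a in annotations):
        raise ValueError("all labelers must label the same phrases")
    result: List[Optional[str]] = []
    for i in range(n):
        labels = {a[i] for a in annotations}
        result.append(labels.pop() if len(labels) == 1 else None)
    return result
```

Counting the non-None entries separately for the text-only and speech labeling conditions gives a simple agreement comparison between the two.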

8 Pilot Study II: Prominence and Discourse
Prominence: accent/stress on a word
  – Typically associated with NEW information
  – Contrast: locally NEW (within the segment) vs. globally NEW
Analyzed all NPs in 20 minutes of spontaneous speech
Differences in position and form influence accent
  – Full forms accented; pronouns etc. not
  – Mismatches imply a role for global vs. local status
Issues:
  – Labeling is difficult; use of full names vs. pronouns
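The locally NEW vs. globally NEW contrast can be made concrete with a small labeling routine. This is a sketch under stated assumptions: referents are already resolved to IDs, and segmentation is flat (the embedding in the Grosz & Sidner model is not represented). The label strings are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple


def information_status(mentions: List[Tuple[Hashable, Hashable]]) -> List[str]:
    """Label each mention "global-new", "local-new", or "given".

    mentions: (segment_id, referent_id) pairs in discourse order.
    Assumes a flat segmentation: embedded segments are not modeled.
    """
    seen_anywhere: Set[Hashable] = set()
    seen_in: Dict[Hashable, Set[Hashable]] = {}
    out = []
    for seg, ref in mentions:
        local = seen_in.setdefault(seg, set())
        if ref not in seen_anywhere:
            out.append("global-new")   # first mention in the whole discourse
        elif ref not in local:
            out.append("local-new")    # mentioned before, but not in this segment
        else:
            out.append("given")
        seen_anywhere.add(ref)
        local.add(ref)
    return out
```

Under this scheme an NP that is "local-new" but realized as a pronoun, or "given" but accented, is exactly the kind of mismatch the slide points to.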

9 Direction-giving Corpus
Spontaneous and read speech; non-professional speakers
  – Task-oriented: give directions of varying complexity
  – Speakers returned later to read transcriptions of their original directions
Discourse segment labeling: text vs. speech
  – More consensus labels for speech than for text
Speech allows more reliable segmentation
Spontaneous speech segmented more reliably than read speech (medial)

10 Acoustic Analysis
Features: max/mean f0 (pitch), amplitude, rate, pause (preceding/following)
Findings:
Segment beginnings: higher max/mean f0, amplitude
  – Shorter following pause (longer preceding pause in read speech)
Segment endings: lower max/mean f0, amplitude
Similar for text-based (T) and speech-based (S) annotations
Issues: single speaker
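Findings like "higher max f0 at segment beginnings" come from comparing a feature's distribution across label groups. A minimal sketch of that comparison follows; using Welch's t-statistic is an assumption for illustration, not necessarily the test the study used.

```python
import math
from statistics import mean, variance
from typing import Optional, Sequence, Tuple


def compare_by_label(values: Sequence[float], labels: Sequence[Optional[str]],
                     target: str = "SB") -> Tuple[float, float, float]:
    """Mean feature value for phrases labeled `target` vs. all others, plus Welch's t.

    Each group needs at least two phrases for the variance estimate.
    """
    inside = [v for v, l in zip(values, labels) if l == target]
    outside = [v for v, l in zip(values, labels) if l != target]
    m_in, m_out = mean(inside), mean(outside)
    # Welch's standard error: no equal-variance assumption between groups.
    se = math.sqrt(variance(inside) / len(inside) + variance(outside) / len(outside))
    return m_in, m_out, (m_in - m_out) / se
```

A positive t for max f0 with `target="SB"` matches the direction reported above; the same call with `target="SF"` should give a negative value if segment endings are lower.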

11 Prominence and Discourse
NPs annotated for:
  – Lexical form (full NP/pronoun), grammatical role, surface position (sentence/phrase), accent
  – 23% have reduced stress
Effects of form and role
Repeated mentions not necessarily reduced
  – Reduced forms also found in contrasts

12 Summary
Clear prosodic cues to discourse structure
  – Across speakers, speaking styles, content
  – Initiation: high max/average pitch and amplitude; preceding pause
  – Finality is the converse
Information status
  – Few clear correlates with accentuation
  – Mediated by form, grammatical role

13 Prosodic and Lexical Cues to Topic Segmentation
Broadcast news story-level segmentation
  – Television and radio
Contrast with GHN
  – Fully automatic: transcription, prosodic labeling
  – Large data set, multiple speakers
  – All teleprompted news

14 Possible Signals
Lexical topic similarity in vector space
  – Hearst (1994)
Lexical discourse cues (Beeferman et al.)
  – E.g., “CNN” in reporter sign-offs
  – HMM topic model
Prosodic cues
  – Pitch, loudness, duration, speaker change, …
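Lexical similarity in vector space, in the spirit of Hearst (1994), can be sketched as cosine similarity between word-count vectors on either side of each sentence gap. This is a simplified variant for illustration: no stemming, stoplist, smoothing, or depth scoring, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_similarities(sentences: List[str], window: int = 2) -> List[float]:
    """Cosine similarity of word counts in `window` sentences on each side of every gap.

    Entry g-1 scores the gap between sentence g-1 and sentence g; low values
    suggest a topic shift.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    sims = []
    for g in range(1, len(bags)):
        left = sum(bags[max(0, g - window):g], Counter())
        right = sum(bags[g:g + window], Counter())
        sims.append(cosine(left, right))
    return sims
```

The gap with the lowest similarity is the strongest lexical candidate for a topic boundary.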

15 Basic Approach
Chop audio stream into “sentences”
Group “sentences” into topics
Classify each sentence boundary as topic boundary or not
Probabilistic framework
  – $\hat{B} = \arg\max_B \Pr(B \mid W, F)$
  – B: sequence of boundaries; W: words; F: features
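To make the objective concrete: if one assumes that boundary decisions are conditionally independent given W and F (a simplification; the HMM integration on slide 18 does not make it), with per-gap posteriors $p_i = \Pr(b_i = 1 \mid W, F)$, the argmax factorizes and reduces to thresholding each posterior at 0.5. The sketch checks this against brute-force search; `posteriors` is hypothetical input.

```python
import itertools
import math
from typing import Sequence, Tuple


def sequence_prob(b: Sequence[int], posteriors: Sequence[float]) -> float:
    """Pr(B | W, F) under the independence assumption."""
    return math.prod(p if x else 1.0 - p for x, p in zip(b, posteriors))


def argmax_bruteforce(posteriors: Sequence[float]) -> Tuple[int, ...]:
    """Exhaustive search over all 2^n boundary sequences (small n only)."""
    return max(itertools.product((0, 1), repeat=len(posteriors)),
               key=lambda b: sequence_prob(b, posteriors))


def argmax_threshold(posteriors: Sequence[float]) -> Tuple[int, ...]:
    """Factorized solution: mark a boundary wherever its posterior exceeds 0.5."""
    return tuple(int(p > 0.5) for p in posteriors)
```

Once neighboring decisions interact (e.g. topic continuity), this shortcut no longer holds and a dynamic-programming search such as Viterbi is needed.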

16 Prosodic Classification
Features:
  – Pitch (f0): before and after the candidate boundary
  – Duration: final phoneme, final rhyme, pause
No amplitude: viewed as redundant with pitch
Classifier: decision trees
  – Features selected by a wrapper loop on the training data
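A wrapper loop selects features by repeatedly retraining the classifier and keeping whichever feature most improves held-out accuracy. The sketch below shows greedy forward selection; to stay short and dependency-free it swaps in a nearest-centroid classifier for the decision trees named above, and the feature names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Dict[str, float], int]   # (feature values, label: 1 = topic boundary)


def centroid_accuracy(train: Sequence[Example], test: Sequence[Example],
                      feats: List[str]) -> float:
    """Held-out accuracy of a nearest-centroid classifier restricted to `feats`.

    Stand-in learner for the slide's decision trees; assumes both classes
    (0 and 1) occur in `train`.
    """
    cents = {c: {f: mean(x[f] for x, y in train if y == c) for f in feats} for c in (0, 1)}

    def predict(x: Dict[str, float]) -> int:
        return min((0, 1), key=lambda c: sum((x[f] - cents[c][f]) ** 2 for f in feats))

    return sum(predict(x) == y for x, y in test) / len(test)


def forward_select(train: Sequence[Example], test: Sequence[Example],
                   candidates: List[str]) -> Tuple[List[str], float]:
    """Greedy wrapper: repeatedly add the feature that most improves held-out
    accuracy; stop when no remaining feature gives a strict improvement."""
    selected: List[str] = []
    best = 0.0
    while True:
        scores = {f: centroid_accuracy(train, test, selected + [f])
                  for f in candidates if f not in selected}
        if not scores:
            break
        f, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:
            break
        selected.append(f)
        best = score
    return selected, best
```

Because the learner itself scores each candidate subset, an uninformative feature (here one whose class means coincide) is never added, which is the point of a wrapper over simple per-feature filtering.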

17 Lexical Classification
HMM topic language models
  – Train one model per topic
  – Begin/End states
Train on previous topics
Later augment with topic boundary states
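The "one model per topic" idea can be illustrated with the simplest possible topic LM: one add-one-smoothed unigram distribution per topic, scoring a sentence under each. This is a simplification (no HMM states, no begin/end or boundary states), and the class and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


class TopicUnigramLM:
    """Add-one-smoothed unigram model for one topic (simplified stand-in for an HMM topic LM)."""

    def __init__(self, docs: List[str], vocab_size: int):
        self.counts = Counter(w for d in docs for w in d.lower().split())
        self.total = sum(self.counts.values())
        self.vocab_size = vocab_size

    def logprob(self, sentence: str) -> float:
        # Laplace smoothing over the shared vocabulary; unseen words get count 0 + 1.
        denom = self.total + self.vocab_size
        return sum(math.log((self.counts[w] + 1) / denom) for w in sentence.lower().split())


def train_topic_lms(topic_docs: Dict[str, List[str]]) -> Dict[str, TopicUnigramLM]:
    """One model per topic, sharing a vocabulary built from all training text."""
    vocab = {w for docs in topic_docs.values() for d in docs for w in d.lower().split()}
    return {t: TopicUnigramLM(docs, len(vocab)) for t, docs in topic_docs.items()}


def best_topic(lms: Dict[str, TopicUnigramLM], sentence: str) -> str:
    return max(lms, key=lambda t: lms[t].logprob(sentence))
```

The per-topic log-probabilities are exactly the kind of emission scores a topic HMM would combine with transition probabilities during decoding.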

18 Integrating Models
With decision trees:
  – Incorporate HMM topic boundary probability as an additional feature
  – Boundary labeled if the probability exceeds some threshold
With HMMs:
  – Use prosodic trees to estimate likelihoods
  – Use standard Viterbi decoding to find the best boundary sequence
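A minimal sketch of the HMM-style combination, under simplifying assumptions that differ from the slide: hidden states are topics, emissions are per-sentence topic LM log-probabilities, and the prosodic boundary probability for each gap is used directly as the probability of switching topic (rather than being converted from tree posteriors to likelihoods). A new story on the same topic cannot be detected in this simplified topology. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def viterbi_segment(emit: Sequence[Sequence[float]],
                    gap_post: Sequence[float]) -> Tuple[List[int], List[bool]]:
    """Best topic sequence for n sentences over K >= 2 topics.

    emit[i][k]: log Pr(sentence i | topic k), e.g. from per-topic LMs.
    gap_post[i]: probability of a topic boundary between sentences i and i+1
    (assumed to come from the prosodic classifier); len(gap_post) == n - 1.
    """
    n, K = len(emit), len(emit[0])
    score = [list(emit[0])]
    back: List[List[int]] = [[0] * K]
    for i in range(1, n):
        b = min(max(gap_post[i - 1], 1e-9), 1 - 1e-9)      # clip to avoid log(0)
        stay, switch = math.log(1 - b), math.log(b / (K - 1))
        row, ptr = [], []
        for k in range(K):
            cands = [score[-1][j] + (stay if j == k else switch) for j in range(K)]
            j_best = max(range(K), key=cands.__getitem__)
            row.append(cands[j_best] + emit[i][k])
            ptr.append(j_best)
        score.append(row)
        back.append(ptr)
    k = max(range(K), key=score[-1].__getitem__)
    path = [k]
    for i in range(n - 1, 0, -1):          # follow back-pointers to sentence 0
        k = back[i][k]
        path.append(k)
    path.reverse()
    return path, [path[i] != path[i - 1] for i in range(1, n)]
```

With the same lexical evidence, a high prosodic boundary probability can flip the decoded path from "same topic" to "new topic", which is how the two knowledge sources interact.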

19 Testing & Evaluation
Tested on 6 shows
  – 104 shows used for training
Used ASR output for words and their positions
  – Contrasted with forced alignment of the correct transcripts
Used manual speaker segmentation
Bizarre cost metric
Basic units: chop at 0.572-sec pauses
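Chopping the word stream into basic units at long pauses can be sketched as follows. It assumes word start/end times from the recognizer, and treating a pause of exactly the threshold as a split point is an assumption; the slide gives only the 0.572-second value.

```python
from typing import List, Tuple


def chop_at_pauses(words: List[Tuple[str, float, float]],
                   min_pause: float = 0.572) -> List[List[str]]:
    """Split a time-stamped word stream into pause-delimited units.

    words: (token, start_sec, end_sec) in time order. A new unit starts
    whenever the silence before a word is at least `min_pause` seconds.
    """
    units: List[List[str]] = []
    current: List[str] = []
    prev_end = None
    for token, start, end in words:
        if current and prev_end is not None and start - prev_end >= min_pause:
            units.append(current)
            current = []
        current.append(token)
        prev_end = end
    if current:
        units.append(current)
    return units
```

Each gap between consecutive units is then a candidate topic boundary for the classifiers.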

20 Decision Tree Classification
Prosody-only features:
  – Pause duration, F0 difference, speaker change, gender
  – Consistent with GHN
  – Why gender? Males and females differ in speaking style
Combined:
  – HMM LM likelihoods, pause, F0 difference

21 Best Results
Integrate prosodic and lexical cues
HMM-based model combination works better
  – Decision tree thresholding is inconsistent
Improves over the lexical HMM classifier alone
