Predicting Phrasing and Accent


1 Predicting Phrasing and Accent
Julia Hirschberg CS 4706 12/9/2018

2 Why worry about accent and phrasing?
A car bomb attack on a police station in the northern Iraqi city of Kirkuk early Monday killed four civilians and wounded 10 others U.S. military officials said. A leading Shiite member of Iraq's Governing Council on Sunday demanded no more "stalling" on arranging for elections to rule this country once the U.S.-led occupation ends June 30. Abdel Aziz al-Hakim a Shiite cleric and Governing Council member said the U.S.-run coalition should have begun planning for elections months ago.
-- Loquendo

3 Why predict phrasing and accent?
TTS and CTS
  Naturalness
  Intelligibility
Recognition
  Decrease perplexity
  Modify durational predictions for words at phrase boundaries
  Identify most ‘salient’ words
Summarization, information extraction

4 How do we predict phrasing and accent?
Default prosodic assignment from simple text analysis
  The president went to Brussels to make up with Europe.
  Doesn’t work all that well, e.g. particles
Hand-built rule-based systems
  Hard to modify or adapt to new domains
Corpus-based approaches (Sproat et al ’92)
  Train prosodic variation on large hand-labeled corpora using machine learning techniques

TTS default prosodic assignment is developed to be independent of domains and tasks. It uses simple text analysis to vary phrasing, accent, and possibly pitch range. While hand-built rule sets are still used for particular application domains, most systems have moved toward automatically trained prosodic assignment.
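A default assignment of this kind can be sketched in a few lines. This is only an illustration under stated assumptions: the function-word list is a hypothetical, deliberately tiny one, and the two rules (accent every content word, place a boundary at punctuation) are generic defaults, not the rules of any particular TTS system.

```python
import re

# Hypothetical, deliberately tiny closed-class word list (illustration only).
FUNCTION_WORDS = {"the", "a", "an", "to", "of", "in", "on", "with", "and", "but", "up"}
PUNCT = set(".,;:?!")

def default_prosody(sentence):
    """Return (word, accented, boundary_after) triples from naive text-only defaults."""
    tokens = re.findall(r"\w+(?:'\w+)?|[.,;:?!]", sentence)
    result = []
    for i, tok in enumerate(tokens):
        if tok in PUNCT:
            continue  # punctuation only informs the word before it
        accented = tok.lower() not in FUNCTION_WORDS      # accent content words
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
        boundary_after = next_tok in PUNCT                 # break at punctuation
        result.append((tok, accented, boundary_after))
    return result
```

On the example sentence, this sketch leaves "up" in "make up" unaccented because the word list treats it as a function word, which is the kind of particle case the slide points to.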

5 Accent and phrasing decisions trained separately
E.g. Feat1, Feat2, … Acc
Associate prosodic labels with simple features of transcripts
  distance from beginning or end of phrase
  orthography: punctuation, paragraphing
  part of speech, constituent information
Apply automatically learned rules when processing text
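One way to picture the "Feat1, Feat2, … label" rows is as one feature dictionary per word. The sketch below assumes the input is already tokenized and tagged as hypothetical (word, part-of-speech, punctuation-after) triples, and it measures distance within the sentence rather than within a phrase; the feature names are illustrative.

```python
def word_features(tagged, i):
    """Feature dictionary for token i of a list of (word, pos, punct_after) triples."""
    word, pos, punct = tagged[i]
    return {
        "dist_from_start": i,                    # words since sentence start
        "dist_to_end": len(tagged) - 1 - i,      # words until sentence end
        "punct_after": punct,                    # orthographic cue, "" if none
        "pos": pos,                              # part of speech of this word
        "next_pos": tagged[i + 1][1] if i + 1 < len(tagged) else "END",
    }
```

Each dictionary would then be paired with a hand-labeled prosodic tag (accented or not, boundary or not) to form one training example.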

6 Reminder: Prosodic Phrasing
2 ‘levels’ of phrasing in ToBI
  intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)
  intonational phrase: one or more intermediate phrases plus a boundary tone (H% or L%)
ToBI break-index tier
  0 no word boundary
  1 word boundary
  2 strong juncture with no tonal markings
  3 intermediate phrase boundary
  4 intonational phrase boundary

7 What are the indicators of phrasing in speech?
Timing
  Pause
  Lengthening
F0 changes
Vocal fry/glottalization

8 What linguistic and contextual features are linked to phrasing?
Syntactic information
  Abney ’91 chunking
  Steedman ’90, Oehrle ’91 CCGs …
  Which ‘chunks’ tend to stick together?
  Which ‘chunks’ tend to be separated intonationally?
  Largest constituent dominating w(i) but not w(j)
    [The man in the moon] |? [looks down on you]
  Smallest constituent dominating w(i),w(j)
    The man [in |? the moon]
Part-of-speech of words around potential boundary site
Sentence-level information
  Length of sentence
    This is a very very very long sentence which thus might have a lot of phrase boundaries in it don’t you think?

9 Orthographic information They live in Butte, Montana.
This isn’t.
Orthographic information
  They live in Butte, Montana.
Word co-occurrence information
  Vampire bat, …powerful but…
Are words on each side accented or not?
  The cat in |? the
Where is the last phrase boundary?
  He asked for pills | but |?
What else?

10 Statistical learning methods
Classification and regression trees (CART)
Rule induction (Ripper)
Support Vector Machines
HMMs, Neural Nets
All take a vector of independent variables and one dependent (predicted) variable, e.g. ‘there’s a phrase boundary here’ or ‘there’s not’
Input from hand-labeled dependent variable and automatically extracted independent variables
Result can be integrated into TTS text processor
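To make the "vector of independent variables, one dependent variable" setup concrete, here is a sketch of a one-rule (single-feature) learner. It is a much simpler stand-in for CART or Ripper, not an implementation of either, and the feature names in the usage example are hypothetical.

```python
from collections import Counter, defaultdict

def train_one_rule(examples, labels):
    """Learn a rule on the single feature that makes the fewest training errors.

    examples: list of dicts mapping feature name -> categorical value
    labels:   list of hand-assigned labels, one per example
    Returns a function mapping a feature dict to a predicted label.
    """
    best = None
    for feat in examples[0]:
        # For each value of this feature, count how often each label occurs.
        table = defaultdict(Counter)
        for x, y in zip(examples, labels):
            table[x[feat]][y] += 1
        rule = {value: counts.most_common(1)[0][0] for value, counts in table.items()}
        errors = sum(rule[x[feat]] != y for x, y in zip(examples, labels))
        if best is None or errors < best[0]:
            best = (errors, feat, rule)
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen values
    _, feat, rule = best
    return lambda x: rule.get(x[feat], default)

# Toy, made-up training rows: (features, hand label) for potential boundary sites.
X = [{"punct_after": ",", "pos": "NNP"}, {"punct_after": "", "pos": "NNP"},
     {"punct_after": ".", "pos": "NN"}, {"punct_after": "", "pos": "IN"},
     {"punct_after": "", "pos": "NN"}]
y = ["boundary", "none", "boundary", "none", "none"]
predict = train_one_rule(X, y)
```

On these toy rows the learner picks the punctuation feature. A real system would use a richer learner and far more labeled data.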

11 How do we evaluate the result?
How to define a Gold Standard?
  Natural speech corpus
  Multi-speaker/same text
  Subjective judgments
No simple mapping from text to prosody
  Many variants can be acceptable
  The car was driven to the border last spring while its owner an elderly man was taking an extended vacation in the south of France.
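One common way to score predicted boundaries against a labeled corpus is precision, recall, and F1 over boundary positions. The sketch below also shows one possible way to handle "many variants can be acceptable": score against several speakers' renditions of the same text and keep the best match. That multi-reference policy is an assumption for illustration, not something the slide prescribes.

```python
def boundary_prf(gold, predicted):
    """Precision, recall, F1 for sets of word indices followed by a boundary."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def best_f1(references, predicted):
    """Score against each acceptable reference phrasing and keep the best F1."""
    return max(boundary_prf(gold, predicted)[2] for gold in references)
```

Subjective judgments, the third option on the slide, would need listeners rather than a script.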

12 More Recent Results Incremental improvements continue:
Adding higher-accuracy parsing (Koehn et al ’00)
  Collins ’99 parser
Different learning algorithms (Schapire & Singer ’00)
Different syntactic representations: relational? Tree-based?
Ranking vs. classification?
Rules always impoverished
Where to next?

13 Predicting Pitch Accent
Accent: Which items are made intonationally prominent and how?
Accent type:
  H*     simple high (declarative)
  L*     simple low (yes-no question)
  L*+H   scooped, late rise (uncertainty/incredulity)
  L+H*   early rise to stress (contrastive focus)
  H+!H*  fall onto stress (implied familiarity)

14 What are the indicators of accent?
F0 excursion
Durational lengthening
Voice quality
Vowel quality
Loudness

15 What phenomena are associated with accent?
Word class: content vs. function words
Information status:
  Given/new
    He likes dogs and dogs like him.
  Topic/Focus
    Dogs he likes.
  Contrast
    He likes dogs but not cats.
Grammatical function
  The dog ate his kibble.
Surface position in sentence:
  Today George is hungry.

16 Association with focus: John only introduced Mary to Sue.
Semantic parallelism
  John likes beer but Mary prefers wine.

17 How can we capture such information simply?
POS window
Position of candidate word in sentence
Location of prior phrase boundary
Pseudo-given/new
Location of word in complex nominal and stress prediction for that nominal
  City hall, parking lot, city hall parking lot
Word co-occurrence
  Blood vessel, blood orange
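Two of these features are easy to approximate from text alone. In the sketch below, "pseudo-given/new" is approximated by exact repetition of the lowercased word form (an assumption; a fuller version would likely match stems and limit the look-back window), and the POS window is padded with hypothetical `<s>` and `</s>` symbols at the sentence edges.

```python
def pseudo_given_new(words):
    """Label each word 'given' if the same lowercased form occurred earlier, else 'new'."""
    seen = set()
    labels = []
    for word in words:
        key = word.lower()
        labels.append("given" if key in seen else "new")
        seen.add(key)
    return labels

def pos_window(tags, i, width=1):
    """POS tags from i-width to i+width, padded at the sentence edges."""
    padded = ["<s>"] * width + list(tags) + ["</s>"] * width
    return padded[i : i + 2 * width + 1]
```

For "He likes dogs and dogs like him", the second "dogs" comes out as given, while "like" stays new because no stemming is applied.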

18 Current Research Concept-to-Speech (CTS) – Pan&McKeown99
systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how
New features
Newer machine learning methods:
  Boosting and bagging (Sun02)
  Combine text and acoustic cues for ASR
  Co-training

In principle, the information TTS systems lack to support natural prosodic assignment is readily available to CTS systems. So the initial hope in the NLG community was that prosodic assignment would be a simple problem. It has, however, proven fairly hard. Why?

19 MAGIC MM system for presenting cardiac patient data
Developed at Columbia by McKeown and colleagues in conjunction with Columbia Presbyterian Medical Center to automate post-operative status reporting for bypass patients
Uses mostly traditional NLG hand-developed components
Generate text, then annotate prosodically
Corpus-trained prosodic assignment component

To give an example of how more traditional NLG systems are coping with these problems, I want to briefly describe some work Shimei Pan has done for her doctoral thesis at Columbia University with Kathy McKeown and me on training a prosody module for a medical domain.

20 Corpus: written and oral patient reports
50 min multi-speaker, spontaneous + 11 min single-speaker, read
1.24M-word text corpus of discharge summaries
Transcribed, ToBI labeled
Generator features labeled/extracted:
  syntactic function

21 semantic ‘informativeness’ (rarity in corpus)
  p.o.s.
  semantic category
  semantic ‘informativeness’ (rarity in corpus)
  semantic constituent boundary location and length
  salience
  given/new
  focus
  theme/rheme
  ‘importance’
  ‘unexpectedness’

Lessons I personally learned from this: First, just because CTS systems have more information doesn’t mean this information will be useful for prosodic assignment. Second, there is something unsatisfactory about viewing prosodic decisions as being made after the text has been constructed.

22 Very hard to label features
Results: new features to specify TTS prosody
Of CTS-specific features, only semantic informativeness (likeliness of occurring in a corpus) useful so far
Looking at context, word collocation for accent placement helps predict accent
  RED CELL (less predictable) vs. BLOOD cell (more)
Most predictable words are accented less frequently (40-46%) and least predictable more (73-80%)
Unigram+bigram model predicts accent status w/77% (+/-.51) accuracy
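The two corpus statistics behind these results can be sketched as follows. The exact definitions are assumptions for illustration: informativeness is taken as the negative log of an add-one-smoothed unigram probability, and predictability as a relative-frequency bigram estimate. The toy corpus is made up and is not the MAGIC data.

```python
import math
from collections import Counter

def train_counts(tokens):
    """Unigram and bigram counts from a tokenized corpus."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))

def informativeness(word, unigrams):
    """Negative log2 of the add-one-smoothed unigram probability; rarer words score higher."""
    total = sum(unigrams.values())
    vocab = len(unigrams) + 1  # one extra slot for unseen words
    return -math.log2((unigrams[word] + 1) / (total + vocab))

def predictability(prev, word, unigrams, bigrams):
    """Relative-frequency estimate of P(word | prev); 0.0 if prev was never seen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# Made-up toy corpus for illustration only.
toy = "blood cell blood cell blood cell red cell red flag red tape".split()
uni, bi = train_counts(toy)
```

In this toy corpus "cell" is fully predictable after "blood" but not after "red", mirroring the BLOOD cell versus RED CELL contrast. An accent predictor would feed such scores to a classifier rather than apply a threshold directly.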

23 Future Intonation Prediction: Beyond Phrasing and Accent
Assigning affect (emotion)
Personalizing TTS
Conveying personality, charisma?

24 Next Class
Look at another phenomenon we’d like to capture in TTS systems: Information status
Homework 3a is due on March 1!

