6/28/20151 Predicting Phrasing and Accent Julia Hirschberg.

Slides:

Advertisements

Similar presentations

Machine Learning Approaches to the Analysis of Large Corpora : A Survey Xunlei Rose Hu and Eric Atwell University of Leeds.

Advertisements

Punctuation Generation Inspired Linguistic Features For Mandarin Prosodic Boundary Prediction CHEN-YU CHIANG, YIH-RU WANG AND SIN-HORNG CHEN 2012 ICASSP.

Speech Synthesis Markup Language SSML. Introduced in September 2004 XML based Assists the generation of synthetic speech Specifies the way speech is outputted.

4/29/20151 Predicting Phrasing and Accent Julia Hirschberg CS 4706.

Prosody Modeling (in Speech) by Julia Hirschberg Presented by Elaine Chew QMUL: ELE021/ELED021/ELEM March 2012.

Prosodic facilitation and interference in the resolution of temporary syntactic closure ambiguity Kjelgaard & Speer 1999 Kent Lee Ψ 526b 16 March 2006.

Using prosody to avoid ambiguity: Effects of speaker awareness and referential context Snedeker and Trueswell (2003) Psych 526 Eun-Kyung Lee.

IBM Labs in Haifa © 2007 IBM Corporation SSW-6, Bonn, August 23th, 2007 Maximum-Likelihood Dynamic Intonation Model for Concatenative Text to Speech System.

Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence Sankaranarayanan Ananthakrishnan, Shrikanth S. Narayanan IEEE 2007 Min-Hsuan.

Making & marking text for synthesis Caroline Henton 10 August 2006.

Chunk Parsing CS1573: AI Application Development, Spring 2003 (modified from Steven Bird’s notes)

Parts of Speech ITSW 1410 Presentation Media Software Instructor: Glenda H. Easter.

6/10/20151 Predicting Phrasing and Accent Julia Hirschberg CS 4706.

CS 4705 Lecture 1 CS4705 Introduction to Natural Language Processing.

Comparing American and Palestinian Perceptions of Charisma Using Acoustic-Prosodic and Lexical Analysis Fadi Biadsy, Julia Hirschberg, Andrew Rosenberg,

6/14/20151 Speech Synthesis: Then and Now Julia Hirschberg CS 4706.

CS 4705 Lecture 22 Intonation and Discourse What does prosody convey? In general, information about: –What the speaker is trying to convey Is this a.

Spoken Language Processing Lab Who we are: Julia Hirschberg, Stefan Benus, Fadi Biadsy, Frank Enos, Agus Gravano, Jackson Liscombe, Sameer Maskey, Andrew.

Automatic Prosody Labeling Final Presentation Andrew Rosenberg ELEN Speech and Audio Processing and Recognition 4/27/05.

CS 4705 Lecture 21 Algorithms for Reference Resolution.

CS 4705 Natural Language Processing What is Natural Language Processing? The study of human languages and how they can be represented computationally.

Machine Learning in Natural Language Processing Noriko Tomuro November 16, 2006.

Seven Lectures on Statistical Parsing Christopher Manning LSA Linguistic Institute 2007 LSA 354 Lecture 7.

SI485i : NLP Set 9 Advanced PCFGs Some slides from Chris Manning.

Creating and Identifying

Toshiba Update 04/09/2006 Data-Driven Prosody and Voice Quality Generation for Emotional Speech Zeynep Inanoglu & Steve Young Machine Intelligence Lab.

Intonation and Information Discourse and Dialogue CS359 October 16, 2001.

CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

1. Information Conveyed by Speech 2. How Speech Fits in with the Overall Structure of Language TWO TOPICS.

Copyright 2007, Toshiba Corporation. How (not) to Select Your Voice Corpus: Random Selection vs. Phonologically Balanced Tanya Lambert, Norbert Braunschweiler,

Intonation in Communication Skill: Recent Research Discourse, both in theoretical linguistics and in foreign language pedagogy,has focused on describing.

CS 6961: Structured Prediction Fall 2014 Course Information.

CS 4705 Lecture 19 Word Sense Disambiguation. Overview Selectional restriction based approaches Robust techniques –Machine Learning Supervised Unsupervised.

Bernd Möbius CoE MMCI Saarland University Lecture 7 8 Dec 2010 Unit Selection Synthesis B Möbius Unit selection synthesis Text-to-Speech Synthesis.

Evaluating prosody prediction in synthesis with respect to Modern Greek prenuclear accents Elisabeth Chorianopoulou MSc in Speech and Language Processing.

Indirect Supervision Protocols for Learning in Natural Language Processing II. Learning by Inventing Binary Labels This work is supported by DARPA funding.

Thinking & Language Ms. Kamburov. Automatic vs. Effortful Processing Automatic Effortful O Barely noticing what you are doing as you do it, taking little.

1 Computation Approaches to Emotional Speech Julia Hirschberg

Recognizing Discourse Structure: Speech Discourse & Dialogue CMSC October 11, 2006.

1/21 Automatic Discovery of Intentions in Text and its Application to Question Answering (ACL 2005 Student Research Workshop )

LING 001 Introduction to Linguistics Spring 2010 Syntactic parsing Part-Of-Speech tagging Apr. 5 Computational linguistics.

11 Project, Part 3. Outline Basics of supervised learning using Naïve Bayes (using a simpler example) Features for the project 2.

CS 4705 Lecture 17 Semantic Analysis: Robust Semantics.

CS 4705 Natural Language Processing Who am I? Julia Hirschberg –Computational Linguist in CS –Focus: Spoken Language Processing –Lab: The Speech Lab,

Suprasegmental Properties of Speech Robert A. Prosek, Ph.D. CSD 301 Robert A. Prosek, Ph.D. CSD 301.

Acoustic Cues to Emotional Speech Julia Hirschberg (joint work with Jennifer Venditti and Jackson Liscombe) Columbia University 26 June 2003.

Dependency Parsing Niranjan Balasubramanian March 24 th 2016 Credits: Many slides from: Michael Collins, Mausam, Chris Manning, COLNG 2014 Dependency Parsing.

Suprasegmental features and Prosody Lect 6A&B LING1005/6105.

INTONATION And IT’S FUNCTIONS

Improving a Pipeline Architecture for Shallow Discourse Parsing

Studying Intonation Julia Hirschberg CS /21/2018.

Meanings of Intonational Contours

Representing Intonational Variation

Studying Intonation Julia Hirschberg CS /21/2018.

Intonational and Its Meanings

Intonational and Its Meanings

The American School and ToBI

Meaningful Intonational Variation

Speech Generation: From Concept and from Text

Comparing American and Palestinian Perceptions of Charisma Using Acoustic-Prosodic and Lexical Analysis Fadi Biadsy, Julia Hirschberg, Andrew Rosenberg,

Meanings of Intonational Contours

Representing Intonational Variation

Advanced NLP: Speech Research and Technologies

Recognizing Structure: Sentence, Speaker, andTopic Segmentation

Predicting Phrasing and Accent

Advanced NLP: Speech Research and Technologies

Predicting Phrasing and Accent

Intonational and Its Meanings

Chunk Parsing CS1573: AI Application Development, Spring 2003

CS249: Neural Language Model

Presentation transcript:

6/28/20151 Predicting Phrasing and Accent Julia Hirschberg

Demos Current schedule for individual demos –8/25 2pmwill 2:30pm karl –8/28 3:30 Ros –Lambbook? Everyone else: Please make sure your system can be run by me. Have a non-team leader run in the team leader’s $HOMEDIR to make sure permissions are set correctly and paths are specified.

Today Motivation and issues Approaches: hand-built vs. corpus-based rules Predicting phrasing Predicting accent Future research: emotion, personalization, personality 6/28/20153

4 Synthesizing the News A car bomb attack on a police station in the northern Iraqi city of Kirkuk early Monday killed four civilians and wounded 10 others U.S. military officials said. A leading Shiite member of Iraq's Governing Council on Sunday demanded no more "stalling" on arranging for elections to rule this country once the U.S.-led occupation ends June 30. Abdel Aziz al-Hakim a Shiite cleric and Governing Council member said the U.S.-run coalition should have begun planning for elections months ago. -- LoquendoLoquendo

How do we assign intonation in TTS? Take a sentence like: Today the world lost a major talent – Amy Winehouse. Or You are clearly just totally insane, dude! Or Do you want to leave from Dallas or Denver? Creating/finding appropriate prosody in TTS

6/28/20156 Why is Intonation Assignment Important? TTS and CTS –Naturalness –Intelligibility –Acceptability

6/28/20157 What are the indicators of phrasing in speech? Timing –Pause –Lengthening F0 changes Vocal fry/glottalization

6/28/20158 Problems Assigning Phrasing Traditional hand-built rules –Use punctuation: , New York, NY –Context/function word: no breaks after function word: He went to dinner. He came to and went to dinner. –Syntax: She favors the nuts and bolts approach. She went home and Dave stayed.

6/28/20159 What are the indicators of accent in speech? F0 excursion Durational lengthening Voice quality Vowel quality Loudness

6/28/ Problems Assigning Accent Traditional Hand-built rules –Function/content distinction He went out the back door He threw out the trash –Complex nominals: Main Street/Park Avenue city hall parking lot (stress shift) German teachers

What are the indicators of contour in speech? Sequence of pitch accents, phrase accents and boundary tones Predicing phrasing and accent – which comes first?

6/28/ Problems Assigning Contours Simple rules –‘.’ = declarative contour –‘?’ = yes-no-question contour unless wh-word present at/near front of sentence Well then, how did he do it? And what do you know? –What should we do with ‘!’ –How can we assign other contours from text input? –Why should we try?

6/28/ How can we improve intonation assignment from text analysis? Simplest rules –Accent content words, deaccent function words –Use punctuation to determine phrasing and sentence-final prosody –Limitations Doesn’t work well, especially for longer sentences Hard to add new rules – consider consequences for entire system: Bell Labs ‘Frend’ Current solution: Corpus-based approaches (Sproat et al ’92) –Train prosodic variation on large hand-labeled corpora using machine learning techniques

6/28/ –Accent and phrasing decisions trained separately Binary prediction Feat1, Feat2,…Accent Feat1, Feat2,…Boundary –Associate prosodic labels with simple features of transcripts that can be automatically computed distance from beginning or end of phrase orthography: punctuation, paragraphing part of speech, constituent information –Apply automatically learned rules when processing text

6/28/ Statistical learning methods Classification and regression trees (CART) Rule induction (Ripper), Support Vector Machines, HMMs, Neural Nets All take vector of independent variables and one dependent (predicted) variable, e.g. ‘there’s a phrase boundary here’ or ‘there’s not’ Feat1 Feat2 …FeatN DepVar Input from hand labeled dependent variable and automatically extracted independent variables Result can be integrated into TTS text processor

6/28/ Reminder: Prosodic Phrasing 2 `levels’ of phrasing in ToBI –intermediate phrase: one or more pitch accents plus a phrase accent (H- or L- ) –intonational phrase: one or more intermediate phrases + boundary tone (H% or L% ) ToBI break-index tier –0 no word boundary –1 word boundary –2 strong juncture with no tonal markings –3 intermediate phrase boundary –4 intonational phrase boundary

6/28/ What linguistic and contextual features are linked to phrasing? Syntactic information –Abney ’91 chunking major constituents –Steedman ’90, Oehrle ’91 CCGs … –Which ‘chunks’ tend to stick together? –Which ‘chunks’ tend to be separated intonationally? Largest constituent dominating w(i) but not w(j) NP[The man in the moon] |? VP[looks down on you] Smallest constituent dominating w(i),w(j) NP[The man PP[in |? moon]] –Part-of-speech of words around potential boundary site The/DET man/NN |? in/Prep moon/NN Sentence-level information –Length of sentence

6/28/ This is a very |? very very long sentence ?| which thus might have a lot of phrase boundaries in?| it ?| don’t you think? This |? isn’t. Orthographic information –They live in Butte, ?| Montana, ?| don’t they? Word co-occurrence information Vampire ?| bat …powerful ?| but benign… Are words on each side accented or not? The cat in |? the Where is the most recent previous phrase boundary? He asked for pills | but |? What else?

6/28/ Integrating More Syntactic Information Incremental improvements continue: –Adding higher-accuracy parsing (Koehn et al ‘00) Collins ‘99 parser Different learning algorithms (Schapire & Singer ‘00) Different syntactic representations: relational? Tree-based? Ranking vs. classification? Rules always impoverished Where to next?

6/28/ Predicting Pitch Accent Accent: Which items are made intonationally prominent and how? Accent type: –H*simple high(declarative) –L*simple low(ynq) –L*+Hscooped, late rise (uncertainty/ incredulity) –L+H*early rise to stress(contrastive focus) –H+!H*fall onto stress (implied familiarity)

6/28/ What phenomena are associated with accent? Word class: content vs. function words Information status: –Given/new He likes dogs and dogs like him. –Topic/Focus Dogs he likes. –Contrast He likes dogs but not cats. Grammatical function –The dog ate his kibble. Surface position in sentence: Today George is hungry.

6/28/ Association with focus: –John only introduced Mary to Sue. Semantic parallelism –John likes beer but Mary prefers wine. Other semantic phenomena? How many of these are easy to compute automatically?

6/28/ How can we approximate such information? POS window Position of candidate word in sentence Location of prior phrase boundary Pseudo-given/new Location of word in complex nominal and stress prediction for that nominal City hall, parking lot, city hall parking lot Word co-occurrence Blood vessel, blood orange

6/28/ How do we evaluate the result? How to define a Gold Standard? –Natural speech corpus –Multi-speaker/same text –Subjective judgments No simple mapping from text to prosody –Many variants can be acceptable The car was driven to the border last spring while its owner an elderly man was taking an extended vacation in the south of France.

6/28/ Current Research Concept-to-Speech (CTS) –Systems should be able to specify “better” prosody: the system knows what it wants to say and can specify how Information status –Given/new –Topic/focus –Preposition/particle distinction –PP attachment and phrasing

6/28/ Future Intonation Prediction: Beyond Phrasing and Accent Assigning contour – how? Assigning affect (emotion) from text – how? Personalizing TTS: modeling individual style in intonation – how? Conveying personality, charisma – how?

6/28/ Next Class Information status: focus and given/new information