AUTOMATIC DETECTION OF REGISTER CHANGES FOR THE ANALYSIS OF DISCOURSE STRUCTURE Laboratoire Parole et Langage, CNRS et Université de Provence Aix-en-Provence,

Céline De Looze, Laboratoire Parole et Langage, CNRS et Université de Provence, Aix-en-Provence, France

Local vs. global pitch characteristics
→ Bolinger (1951)
Local: changes in the phonological representation of intonation
Global: variations in register key (level) and span (range)
[Figure: narrow vs. expanded span; higher vs. lower key]
→ Trager (1957)

Local vs. global pitch characteristics
→ Functional aspect of local and global pitch variations
→ Register variations in intonation systems
ToBI (Pierrehumbert, 1980): binary phonological distinction (H and L tones)
INTSINT (Hirst & Di Cristo, 1998): 8 possible tonal values, where H and L tones are interpreted with respect to the previous tone or with respect to the speaker's register
Both systems make the crucial assumption that the speaker's key and range remain constant.

Overview
→ ADoReVA
→ Predicting topic changes through automatic detection of register variations
→ Topic changes as reflected by register variations

ADoReVA
Automatic Detection of Register Variations Algorithm
→ A clustering algorithm: it represents, through a binary tree structure, the way units are grouped together according to their differences in register key and range
→ Correlation with functional annotation
→ A Praat plugin

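ADoReVA: Calculate Register differences…
→ Calculates the difference in key between two consecutive units (the square root of a square, i.e. the absolute difference):
$$\Delta_{\mathrm{key}} = \sqrt{\left(\log_2 \mathrm{median}_{\mathrm{unit}} - \log_2 \mathrm{median}_{\mathrm{prevUnit}}\right)^2}$$
→ Calculates the difference in range between two consecutive units:
$$\Delta_{\mathrm{range}} = \sqrt{\left(\log_2\frac{\max_{\mathrm{unit}}}{\min_{\mathrm{unit}}} - \log_2\frac{\max_{\mathrm{prevUnit}}}{\min_{\mathrm{prevUnit}}}\right)^2}$$
→ Recursively reduces the Euclidean distance between two consecutive units in the space defined by the key and span parameters:
$$d = \sqrt{\Delta_{\mathrm{key}}^2 + \Delta_{\mathrm{range}}^2}$$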
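As a rough illustration of the formulas above, here is a minimal Python sketch. It assumes each unit is given as a list of cleaned F0 values in Hz, takes the key as log2 of the median and the range as log2(max/min) (both in octaves), and returns, for each pair of consecutive units, the key difference, the range difference and their Euclidean distance. The function names are hypothetical; this is not the code of the ADoReVA Praat plugin.

```python
import math
import statistics

# Hypothetical sketch: each unit is a list of F0 values in Hz
# (micro-prosodic effects assumed to be removed already).

def register_params(f0_values):
    """Return (key, range) of one unit, both in octaves."""
    key = math.log2(statistics.median(f0_values))       # key: log2 of the median F0
    span = math.log2(max(f0_values) / min(f0_values))    # range: log2(max/min)
    return key, span

def register_differences(units):
    """For each pair of consecutive units, return (diff_key, diff_range, distance)."""
    params = [register_params(unit) for unit in units]
    results = []
    for (prev_key, prev_span), (key, span) in zip(params, params[1:]):
        diff_key = math.sqrt((key - prev_key) ** 2)          # equals abs(key - prev_key)
        diff_range = math.sqrt((span - prev_span) ** 2)      # equals abs(span - prev_span)
        distance = math.sqrt(diff_key ** 2 + diff_range ** 2)  # Euclidean distance
        results.append((diff_key, diff_range, distance))
    return results

if __name__ == "__main__":
    units = [[100, 100, 200], [200, 200, 800], [200, 200, 400]]
    for diffs in register_differences(units):
        print(diffs)
```

Because the square root of a square is just the absolute value, `abs()` would give the same per-parameter differences; the square-root form is kept only to mirror the slide.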

ADoReVA: Calculate Register differences…
The detection of register key and range is done after the deletion of micro-prosodic effects, using the following formulae.
Which quantiles from q05 to q95 are best correlated with manual annotations of pitch extrema? (De Looze & Hirst, 2007)
→ $\mathrm{floor} = q_{25} \times \ldots$
→ $\mathrm{ceiling} = q_{75} \times 1.75$
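A minimal sketch of the floor/ceiling estimation, assuming F0 values in Hz and linear-interpolation quantiles (NumPy's default convention). The ceiling factor 1.75 comes from the slide; the floor factor is missing from the source, so `floor_factor` is a required parameter with no default. Discarding F0 values outside [floor, ceiling] is our own simplification and not necessarily how ADoReVA removes micro-prosodic effects. All names are hypothetical.

```python
# Hypothetical sketch of quantile-based pitch floor/ceiling estimation.

def quantile(values, q):
    """Linear-interpolation quantile (NumPy's default convention), with q in [0, 1]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q
    lower = int(position)                        # index just below the position
    upper = min(lower + 1, len(ordered) - 1)     # index just above, clamped to the end
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

def pitch_limits(f0_values, floor_factor, ceiling_factor=1.75):
    """floor = q25 * floor_factor, ceiling = q75 * ceiling_factor.

    floor_factor has no default because its value is missing from the source;
    1.75 for the ceiling is the value given on the slide.
    """
    floor = quantile(f0_values, 0.25) * floor_factor
    ceiling = quantile(f0_values, 0.75) * ceiling_factor
    return floor, ceiling

def trim_to_limits(f0_values, floor, ceiling):
    """Keep only F0 values inside [floor, ceiling] (an assumed cleaning step)."""
    return [f0 for f0 in f0_values if floor <= f0 <= ceiling]
```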

ADoReVA: To Clustering tree…
The clustering algorithm groups units according to their differences in key and range: the smaller the difference between two units, the sooner these units are branched together.
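The slide does not say how clusters are compared once they have been merged. The sketch below assumes neighbour-constrained agglomerative clustering over units in temporal order: at each step, the two adjacent clusters whose mean (key, range) points are closest in Euclidean distance are merged, so units with smaller differences are branched together sooner. The centroid linkage, the (key, range) input in octaves and the nested-tuple tree format are assumptions, not ADoReVA's documented behaviour.

```python
import math

def cluster_tree(units):
    """Hypothetical adjacency-constrained agglomerative clustering.

    units: list of (key, range) pairs in octaves, in temporal order.
    Returns a binary tree as nested tuples whose leaves are unit indices.
    """
    # Each cluster: (subtree, sum of keys, sum of ranges, number of units)
    clusters = [(i, key, span, 1) for i, (key, span) in enumerate(units)]

    def centroid(cluster):
        return cluster[1] / cluster[3], cluster[2] / cluster[3]

    while len(clusters) > 1:
        # Pick the pair of neighbouring clusters with the smallest distance.
        best = min(
            range(len(clusters) - 1),
            key=lambda i: math.dist(centroid(clusters[i]), centroid(clusters[i + 1])),
        )
        a, b = clusters[best], clusters[best + 1]
        merged = ((a[0], b[0]), a[1] + b[1], a[2] + b[2], a[3] + b[3])
        clusters[best:best + 2] = [merged]  # replace the pair by their merge
    return clusters[0][0]
```

For example, `cluster_tree([(7.0, 1.0), (7.1, 1.0), (8.0, 1.5), (8.05, 1.5)])` returns `((0, 1), (2, 3))`: the two low-key units and the two high-key units are grouped first, and the register change falls between units 1 and 2.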

ADoReVA: To Clustering tree…
The output generated by the algorithm is a binary tree structure in the form of a layered icicle diagram, which displays:
→ the hierarchical structure of the units
→ their relational organisation
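A text-only approximation of a layered icicle diagram: each row is one depth of the binary tree, and each node is drawn as a bar spanning the units it dominates. It assumes the tree is given as nested pairs of unit indices numbered 0 to n-1 in temporal order, with every subtree covering a contiguous run of units. This layout and the function names are hypothetical, not the plugin's graphical output.

```python
# Hypothetical text rendering of a layered icicle diagram for a binary tree
# given as nested pairs of unit indices (leaves numbered 0..n-1 in temporal order).

def leaves(tree):
    """List the leaf indices dominated by a node, left to right."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def icicle_rows(tree):
    """One row per depth; each row lists the (first, last) leaf span of every node at that depth."""
    rows = []
    level = [tree]
    while level:
        rows.append([(min(leaves(node)), max(leaves(node))) for node in level])
        # Children of internal nodes form the next depth; leaves stop here.
        level = [child for node in level if not isinstance(node, int) for child in node]
    return rows

def render(tree, cell=4):
    """Draw each node as a bar |---| over the columns of the units it spans."""
    width = len(leaves(tree)) * cell
    lines = []
    for row in icicle_rows(tree):
        chars = [" "] * width
        for first, last in row:
            start, end = first * cell, (last + 1) * cell - 1
            chars[start] = chars[end] = "|"
            for i in range(start + 1, end):
                chars[i] = "-"
        lines.append("".join(chars).rstrip())
    return "\n".join(lines)
```

`print(render(((0, 1), (2, 3))))` prints three rows: one bar over all four units, two bars over units 0-1 and 2-3, and one bar per unit.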

ADoReVA: Calculate Node Distances…
Calculates node distances between the leaves (or units) of the tree and correlates them, within a table, with the manually annotated functions.
→ To Stat Analyses…
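The slide does not define node distance precisely. The sketch assumes it is the number of edges on the path between two leaves through their lowest common ancestor, and computes it for each pair of consecutive units so that the values can be lined up with the annotation of the boundary between those units. The tree format (nested pairs of unit indices) and the function names are hypothetical.

```python
# Hypothetical sketch: node distance = number of edges between two leaves
# via their lowest common ancestor, in a tree of nested pairs of unit indices.

def leaf_paths(tree, path=()):
    """Map each leaf (unit index) to its path from the root as a tuple of 0/1 branch choices."""
    if isinstance(tree, int):
        return {tree: path}
    left, right = tree
    paths = leaf_paths(left, path + (0,))
    paths.update(leaf_paths(right, path + (1,)))
    return paths

def node_distance(path_a, path_b):
    """Edges from each leaf up to the deepest shared ancestor, summed."""
    common = 0
    for step_a, step_b in zip(path_a, path_b):
        if step_a != step_b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

def consecutive_node_distances(tree):
    """Node distance between unit i and unit i + 1, for every boundary."""
    paths = leaf_paths(tree)
    order = sorted(paths)
    return [node_distance(paths[a], paths[b]) for a, b in zip(order, order[1:])]
```

For `((0, 1), (2, 3))` the result is `[2, 4, 2]`: the largest distance sits between units 1 and 2, where the two main branches of the tree meet.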

Topic changes as reflected by register changes
→ Are large differences in register between two consecutive units correlated with topic changes?
→ Are large node distances between two leaves correlated with topic changes?

Topic changes as reflected by register changes
Register variations throw light on the informational organisation of the discourse structure:
→ the information weight carried by the discourse element
→ the hierarchical dimension and relational organisation of linguistic units
The literature reports (Lehiste, 1970; Brazil, 1980; Menn & Boyce, 1982; Kutik et al., 1983; Hirschberg & Pierrehumbert, 1986; Thorsen, 1986; Nakajima & Allen, 1992; Sluijter & Terken, 1993; Arons, 1994; Nicolas & Hirst, 1995; Fon, 2002; Kong, 2004; Chiu-yu et al., 2005; Mayer et al., 2006; den Ouden et al., 2009) that a high and expanded register signals:
→ the introduction of a new topic or a topic change
→ a discourse element carrying new information
→ elements at the beginning of the utterance
→ …

Topic changes as reflected by register changes
The same literature reports that a low and compressed register signals:
→ final parts of the utterance
→ topic continuity
→ sub-topics, parenthetical comments
→ …

Topic changes as reflected by register changes
Assumption: topic changes can be detected through the detection of large node distances.
Informing about declination/final lowering: what temporal span?

Corpora
→ PFC Corpus: 30 minutes of read speech from 10 native French speakers (Delais-Roussarie & Durand, 2003)
→ PAC Corpus: 30 minutes of read speech from 8 native English speakers
→ CID Corpus: 40 minutes of dialogue from 8 native French speakers (Bertrand et al., 2007)
→ Aix-Marsec Corpus: 30 minutes of dialogue from 9 native English speakers (Auran et al., 2004)

Functional Annotation
A simplified version of Grosz & Sidner (1986), as used in Fon (2002) and Kong (2004): DSP2, DSP1 or DSP0 (DSP: discourse segment purpose) is annotated between prosodic words.
→ DSP0: no discourse boundary; related units
→ DSP1: hierarchically superior relation between units, which nevertheless share related purposes (cause-effect or clarifying relationship)
→ DSP2: no related discourse purposes or topics

Preliminary Results
A higher and expanded register, large differences in key and range (or large Euclidean distances), and large node distances in the binary tree structure are correlated with topic changes (DSP2 annotation).

Preliminary Results
Range is not always involved in signaling topic changes.
→ Aix-Marsec Corpus (dialogue speech), both key and range:
key: F(2, 3446) = 146.3, p < 2.2e-16; range: F(2, 3446) = 23.98, p = 4.549e-11
→ French corpora (read and dialogue speech), range less than key:
key: F(2, 2398) = 142, p < 2.2e-16; range: F(2, 2398) = 6.233, p = …
→ PAC Corpus (read speech), not range:
key: F(2, 3003) = 67.26, p < 2.2e-16; range: F(2, 3003) = 0.1469, p = …
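The F(2, N) statistics suggest a one-way ANOVA comparing a register measure across three annotation groups, presumably DSP0, DSP1 and DSP2; the slide does not name the test, so this is an assumption. The sketch below computes the F statistic and its degrees of freedom in plain Python; under this reading, the denominator degrees of freedom equal the number of observations minus 3. The p-value requires the F distribution, which `scipy.stats.f_oneway` provides if SciPy is available; it is not computed here. The function name is hypothetical.

```python
# Hypothetical sketch: one-way ANOVA F statistic over several groups of values
# (e.g. key differences grouped by DSP0 / DSP1 / DSP2 annotation).

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]
    # Between-group variation: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```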

Preliminary Results
Range is not always involved in signaling topic changes.
→ Speaking styles? Lively speech is marked by variations in range.

Preliminary Results
Range is not correlated with the DSP1 annotation: a cause-effect or clarifying relationship between two consecutive units may be signaled by modifying key only.

Preliminary Results
→ Key appears to be a stable parameter, while range may be optional in indicating topic changes.
→ Variations in range may be seen as marking the speaker's involvement while telling his or her story.
→ Key and range convey different functions and have to be studied separately.

Prediction
Predicting topic changes through automatic detection of register variations
Confusion matrices:
→ 6 features: key and range; differences in key and range; node distances for key and range
→ 2 classes: DSP0, DSP1 / DSP2
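A minimal sketch of how per-class recall, precision and F-measure can be derived from a confusion matrix, assuming the two classes are {DSP0, DSP1} and DSP2. The label strings and the function name are hypothetical; the classifier itself is not shown, and the talk does not say which one was used.

```python
# Hypothetical sketch: confusion matrix and per-class scores for two classes.

def class_scores(gold, predicted, labels):
    """Return (matrix, report); matrix[g][p] counts gold g predicted as p,
    report[c] = (recall, precision, F-measure) for class c."""
    matrix = {g: {p: 0 for p in labels} for g in labels}
    for g, p in zip(gold, predicted):
        matrix[g][p] += 1
    report = {}
    for c in labels:
        true_pos = matrix[c][c]
        gold_total = sum(matrix[c].values())              # row total
        pred_total = sum(matrix[g][c] for g in labels)    # column total
        recall = true_pos / gold_total if gold_total else 0.0
        precision = true_pos / pred_total if pred_total else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        report[c] = (recall, precision, f_measure)
    return matrix, report
```

With gold labels `["DSP0/DSP1", "DSP0/DSP1", "DSP2", "DSP2"]` and predictions `["DSP0/DSP1", "DSP2", "DSP2", "DSP2"]`, the DSP2 class gets recall 1.0, precision 2/3 and F-measure 0.8.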

Prediction
Prediction with the key features (key, difference in key and node distance for key) gives better results than prediction with the range features (range, difference in range and node distance for range).

Prediction
Prediction with two features (key and difference in key, or key and node distance for key) slightly improves the detection of topic changes.
[Tables: recall, precision and F-measure per category for the feature sets Key, DiffKey, NodDK, Key & DiffKey and Key & NodDK]

Prediction
Prediction scores are higher for dialogue speech than for read speech:
→ between 20% and 30% predicted for read speech
→ about 40% predicted for dialogue speech

Discussion
→ Objective detection of register variations vs. subjective annotation of topic changes
→ Detection of functions other than topic changes as reflected by register variations
→ Detection of topic changes through automatic detection of:
- tempo variations (pauses and speaking rate)
- intensity variations

Discussion
Usefulness of the algorithm: a better understanding of the hierarchical and organisational structure of discourse. How do units fit together?

Conclusion
ADoReVA:
→ an algorithm to understand the structure of speech as reflected by register variations
→ an algorithm to be implemented into intonation systems to improve the phonological representation of intonation (INTSINT: detection of Top/Mid/Bottom taking register variations into account)
→ testing different units
→ subjective annotation vs. objective detection
→ a graphical representation to support pre-analysis

Thank you
