How to Model Prosody? The case of KIM – The Kiel Intonation Model Klaus J. Kohler IPDS, Kiel, Germany Paper given at Laboratoire Parole et Langage, Aix-en-Provence.


2 How to Model Prosody? The case of KIM – The Kiel Intonation Model Klaus J. Kohler IPDS, Kiel, Germany Paper given at Laboratoire Parole et Langage, Aix-en-Provence 10 October, 2008

3 1 Explaining the title of the presentation: The main title is intended as a question –to be paraphrased as "The question is: How to model prosody?" –hence the question mark. If it were a statement about the contents, "This paper describes how to model prosody.", –it would be terminated by a period. Without any punctuation it is ambiguous.

4 Is this orthographic difference between ? and . also signalled prosodically? When you hear the following two readings, would you say that one is a question, and if so, which? The first one is from the following context. The second is from the following context. What are the prosodic differences between the two versions? –certainly not rising (question) vs falling (statement) –as is generally maintained in textbooks

5 How to model prosody. 

6 How to model prosody? 

7 To summarize –both end on falling pitch to the same low level –both are concatenations of peak contours, linked to 3 sentence accents on how, model, prosody –but the peak contours differ in 2 ways °in the question, the peak maxima are higher °and later in relation to the accented vowel onset ¶resulting in a rise into the accented vowel ¶vs. a fall into the accent for the statement –thus the question highlights high pitch, the statement low pitch

8 What happens if context and phrase prosody are cross-spliced? –is the result unacceptable? –or does it receive a new functional interpretation? –statement context + question prosody °introduces insistence into the statement °"here you are going to find the answer" °or, pronounced by a banner bearer of another model, such as ToBI, it may express sarcasm: "don't kid yourself"

9 –question context + statement prosody °shows lack of enthusiasm in finding an answer °may introduce indecisiveness, resignation: "the question will always remain" Thus, the communicative function of prosody can only be determined by taking verbal and situational context into account. So, to model prosody adequately, we need to refer it to its functional setting and pay attention to global pitch contours, as produced and perceived, rather than to local H and L. This is what KIM tries to do.

10 2 Prerequisites to Prosodic Analysis (1) It must be theory-driven –this applies to very few prosody models, Yi Xu's being an exception, and certainly not to ToBI, which is data-driven par excellence and is not even a model but a data processing tool –at the outset of proper prosodic modelling, ad hoc observations and intuitions of the native speaker are turned into initial theoretical assumptions –from which hypotheses for systematic empirical investigation are derived –data analysis confirms or rejects the hypotheses

11 –rejection leads to further hypotheses and data analyses °introducing further conditioning factors, still fitting into initial theory but so far neglected °may eventually lead to adjustment of the theory or even a scientific revolution ¶but change of the theory is not done lightly ¶entails change of the basic theoretical assumptions in order to explain a wider spectrum of data

12 (2) Start from communicative function –again rare in existing models; exceptions include Lund and Yi Xu –go beyond linguistic functions as they surface in the formal categories of sentence mode and focus –include expressiveness of the speaker and attitudes towards the listener (3) Give equal weight to production and perception –most models focus on production; exceptions include IPO, David House and Maria Paola d'Imperio

13 (4) Pay attention to fine phonetic detail (FPD) in auditory and instrumental analysis as carriers of functions –FPD is usually relegated to the instrumental level (5) Use a new methodology of data acquisition –this is where models generally fail –we need data collection appropriate to the research question –rote sentence reading and repetition will no longer do in the investigation of communicative functions of prosody

14 (6) Inferential statistical analysis must be done in accordance with, and as part of, the experimental design of the research question –it should not be added a posteriori –it should not attempt to establish category boundaries, but rather the validity and generalizability of prosodic manifestations of speech functions –it needs to consider the plausibility of productive and perceptual differentiation in speech communication –most ToBI-related modelling fails on this point

15 (7) It follows from these prerequisites that we should give up the dichotomies of linguistics vs paralinguistics, of phonetics vs phonology, and of segmentals vs prosodies in phonetic analysis, and integrate the analysis into a framework of speech communication: a communicative phonetic science. This is the aim of KIM.

16 3 The Kiel Intonation Model 3.1 Some historical notes on its development: June 1980, Bill Barry's lecture to the Arts Faculty of Kiel University in partial fulfilment of his habilitation –"Prosodic functions revisited again" –published in Phonetica 38 (1981) °from the phonetic base to communicative functions °to consider function systematically in the analysis °to go beyond the over-emphasized linguistic function

17 In the winter semester 1980/81, Robert Bannert, then at Lund, gave a course on prosody at IPDS Kiel –an attempt to apply the Lund Model to German °word accents I & II in the tonal accent dialects of Scandinavia °their phrasal manifestation under focus and sentence mode in various phrase positions °Gösta Bruce's phrase accent, a feature of the phrase °which was taken over by Pierrehumbert and ToBI and reinterpreted as a boundary tone

18 –Bannert aimed at describing the phonetic manifestation of phrase accents in German –he collected production data with variations of the constructed sentence Der lullende Müller in Lingen will die längeren Männer in der Menge immer lungernde Lümmel nennen. "The peeing miller from Lingen will always call the longer men in the crowd layabout rascals."

19 °reminiscent of Bruce's (1977) man vill lämna nåra långa nunnor. “one wants to leave some long nuns” °nonsense is still practised a quarter of a century later: Die Nonne und der Lehrer wollen der Lola in Murnau eine Warnung geben, und die Hanne will im November ein Lama malen. "The nun and the teacher want to give a warning to Lola from Murnau, and Hanna wants to paint a llama in November." (Truckenbrodt 2002)

20 °the degree of nonsense has actually increased over the years °the reason is as clear as it is unacceptable: to be able to trace continuous f0 through voiced sounds – vowels, laterals, nasals °the analysis method determines the speech material instead of vice versa

21 –rote-fashion reading with several repetitions °students attending the course °assumption that they would all realise the same underlying phonological accents relevant for the intonation of German °no consideration for potential production artefacts created by the odd material and the collection method, e.g. boredom, which changes pitch accent realization drastically –from such data, Bannert defined German phrase accents as rising f0 contours on stressed syllables

22 It was immediately obvious that these analyses did not shed light on the system of German intonation –accents need not have rising(-falling) pitch; they may be only falling, as my 2nd reading shows –but these differences code different meanings –so, function has to be the starting point –the data collection needs to be semantically plausible and contextualized –and the listener has to be brought in to decide on the categorization of acoustic measurements

23 So, we needed a model for German intonation, which we developed at Kiel in a series of stages: (1) in the winter semester 1982/83, Bill Barry and I gave a course at IPDS on "Functions of intonation and their phonetic manifestations" –semantics and pragmatics of different falling f0 patterns in German sentence accents (2) this led to a 5-year research project, supported by the German Research Council, starting in 1985: "Form and Function of Intonation Peaks in German"

24 Two aims –function-related macroprosodic categories –microprosodic variation due to segmentals. Perceptual test paradigm for the categories: shift of complete f0 contours through otherwise constant utterances, e.g. "Sie hat ja gelogen." "She's been lying." –f0 peak contour shift from early via medial to late synchronization with the articulation of an accented syllable –discrimination of pitch changes –identification of meaning changes

25 [Figure: schematic f0 contour, labelled 70 Hz, 140 Hz, 85 Hz, 90 Hz, 110 Hz]

26 The functional link of peak synchronization is shown by contextualization for Sie hat ja gelogen. –early Wer einmal lügt, dem glaubt man nicht. "Once a liar, always a liar." –medial Jetzt versteh ich das erst. "Now I understand." –late Oh! –identification as ± matching in context

27 Identification in context “now I understand” Sie hat ja gelogen. "She's been lying." early > medial

28 Peak synchronization is defined functionally and therefore differs from phonetic alignment in ToBI. Pragmatic functions of peak contour synchronization: –early – finality °knowing °summarizing °coming to the end of an argument °resignation

29 –medial - openness °observing °realising °starting a new argument

30 –late - unexpectedness °observing, realising in contrast to one's expectation °surprise °disbelief

31 the characteristics of the 3 peak categories are –early – falling pitch into the accented vowel to low level, decreasing prominence –medial – rising pitch into the accented vowel to high level before fall, increasing-decreasing prominence –late – rising pitch from low level late in the accented vowel to high before fall, late increasing prominence

32 this peak categorization was the beginning of KIM – The Kiel Intonation Model –first presented at Bell Labs, Murray Hill, 1986 "Some rules for intonation synthesis in German" and at the 17th Int Cong Acoust, Toronto 1986 "Computer synthesis of intonation" –then at the 11th ICPhS, Tallinn 1987 "The linguistic functions of f0 peaks" "Categorical pitch perception" and at the 1st LabPhon, Columbus/Ohio 1987 "Macro and micro F0 in the synthesis of intonation" (1990)

33 (3) Overlapping with (2), a TTS project with Infovox and KTH, Stockholm, from 1987 to 1991 –developing the various modules for German –including intonation and prosody –implementation of KIM categories °extension to other pitch categories: rising, falling-rising, level; their concatenation °perceptual and functional testing °incorporation in KIM

34 –documentation of stages (2) and (3) °AIPUK 25 (1991) °"Parametric control of prosodic variables in symbolic input in TTS synthesis", van Santen et al. (eds), Progress in Speech Synthesis, 1997 °"The Kiel Intonation Model (KIM), its Implementation in TTS Synthesis and its Application to the Study of Spontaneous Speech" http://www.ipds.uni-kel.de/kjk/forschung/kim.en.html

35 (4) 1992–96 VERBMOBIL project –data collection in an appointment-making scenario: Corpus of Spontaneous Speech of German –segmental and prosodic labelling –development of PROLAB on the basis of KIM –which in turn made systematic data retrieval and corpus analyses possible –documented in "Modelling prosody in spontaneous speech", Sagisaka et al. (eds), Computing Prosody, 1997

36 (5) From 2000 onwards, further development of KIM, especially through Oliver Niebuhr's research –refinement of phonetic manifestations of peak contours beyond alignment differences –refinement of their functional links –more detailed study of peak concatenation –more detailed investigation of prosody and semantics of valley shifts –prosody of emphasis –he will report on these in the next colloquium

37 3.2 Summary of the basic categories of KIM: lexical stress –phonological place-marker for a particular syllable in a word, e.g. increase as verb vs noun °in languages like German, English, Dutch °but not French, which has no lexical stress –this stress feature is abstract and not definable in physical terms, although this has been attempted since Fry –it marks the position in a word where a sentence accent manifests itself in physical parameters if the word is made prominent, i.e. accented

38 sentence accent –different relative degrees of prominence given to words in sentences for differentiation of meaning –KIM distinguishes four degrees: default – 2, reinforced – 3, partially deaccented – 1, completely deaccented – 0 2Max 0hat 0einen 2Brief 1geschrieben. "Max wrote a letter." answering the question "what did he do?" 2Max 0hat 0einen 2Brief 0geschrieben. answering the question "what did he write?" (= a letter, not a postcard) 2Max 0hat 0einen 3Brief 0geschrieben. contrastive focus/emphasis on "letter" 2Max 0hat 0einen 2Brief 3geschrieben. contrastive focus on "wrote", as against "typed"

39 2This 0is 0the 2list 0of 2emails 0he 1received. answering the question "what kind of list is this?" 2This 0is 0the 2list 0of 2emails 0he 0received. answering the question "which list is this?" (= the email list, not the phone call list) 2This 0is 0the 2list 0of 3emails 0he 0received. contrastive focus/emphasis on "emails" 2This 0is 0the 2list 0of 2emails 0he 3received. contrastive focus on "received", as against "sent"
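The four-degree accent notation in the Max/Brief and email-list examples is regular enough to be read mechanically. Each word carries a prefixed digit from 0 to 3. The Python below is a minimal sketch that assumes only that convention. It splits an annotated sentence into (level, word) pairs and picks out the most prominent words.

The names `parse_accents`, `strongest` and `AccentedWord` are hypothetical. They are illustrations only, not part of KIM, PROLAB or any Kiel software. The sketch also ignores pitch categories entirely.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# The four sentence-accent degrees as listed on the slide.
LEVELS = {
    0: "completely deaccented",
    1: "partially deaccented",
    2: "default",
    3: "reinforced",
}

# One annotated token: a single digit 0-3 directly followed by the word.
TOKEN = re.compile(r"([0-3])(\S+)")


@dataclass
class AccentedWord:
    """Hypothetical container for one word and its KIM accent degree."""
    level: int
    word: str


def parse_accents(annotated: str) -> list[AccentedWord]:
    """Split e.g. '2Max 0hat 0einen 3Brief 0geschrieben.' into words with
    accent levels; raise ValueError for a token lacking a level prefix."""
    words = []
    for token in annotated.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"token without accent level: {token!r}")
        # Drop sentence punctuation so 'geschrieben.' becomes 'geschrieben'.
        words.append(AccentedWord(int(match.group(1)), match.group(2).rstrip(".,?!")))
    return words


def strongest(words: list[AccentedWord]) -> list[str]:
    """Return the word(s) with the highest accent level (input must be non-empty)."""
    top = max(w.level for w in words)
    return [w.word for w in words if w.level == top]


if __name__ == "__main__":
    sentence = parse_accents("2Max 0hat 0einen 3Brief 0geschrieben.")
    for w in sentence:
        print(w.level, LEVELS[w.level], w.word)
    print(strongest(sentence))  # ['Brief']
```

Consider the "wrote" contrast, "2Max 0hat 0einen 2Brief 3geschrieben.". Here `strongest` would return only "geschrieben", reflecting the reinforced level 3 on the verb.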

40 pitch categories at each accent position –peak, valley, combined, level contours, e.g. ja; yes –low or high rising valleys, e.g. ja; yes –synchronization of peaks and valleys with vocal tract dynamics °peaks – early, medial, late, e.g. ja; yes °valleys – early, late, e.g. ja; yes –concatenations of pitch contour categories °peaks in hat patterns or dipped, °e.g. ja oder nein; yes or no °in hat pattern first peak not 'early', last not 'late'

41 downstep and upstep of successive peaks or valleys –structural downstep, not declination on time basis –reset for bracketing word sequence into chunks, e.g. at any syntactic or semantic boundary weiße, schwarze, blaue, rote, grüne, gelbe white, black, blue, red, green, yellow –upstep for stronger separation of chunks and greater prominence prosodic boundaries –degrees of cohesion between utterance chunks –signalled by bundles of features f0-pattern, reset/upstep, segmental durations, pause/breath durations, phonation

42 speech rate pitch register emphatic accent categories –force accent –negative and positive intensification –accent d'insistance

43 “It stinks!” negative positive

44 German: (Wie Boris) Valerie die Treppe runterkickt. “(When Boris) kicks Valerie down the stairs.”

45 French Mais alors vraiment, c’est tordu ici, hein. “But really, that’s crazy here, eh.” 

46 3.3 Transfer of peak categories to other languages: Same perceptual and functional differentiation of pitch peaks in all West Germanic languages: English –"She's been lying." exactly parallel –formally tested by Kleber °MA dissertation, Kiel 2005, under my supervision °SP2006 Dresden: "Form and Function of Falling Pitch Contours in English"

47 [Figure: f0 over time for "She's gone to Malaga."; AS = accented syllable, PAS = post-accented syllable; labelled values 111 Hz, 114 Hz, 140 Hz, 96 Hz, 76 Hz]

48 Mandarin Chinese when I reported on the form and function of pitch peak contours at Bell Labs, Murray Hill, there was a young Chinese doctoral student in the audience, Chilin Shih –who had no knowledge of German –but who perceived the same psychophonetic category changes –and equated them with Mandarin tones °early peak – the low tone 3 °medial peak – the falling tone 4 °late peak – a combination of tones 2 + 4

49 Three conclusions can be drawn from these observations –aspects of pitch perception have a psychophonetic basis independently of the language and its comprehension –Mandarin uses pitch categorizations at the lexical level similar to those that are systematized at the phrase level in German and English: low vs. high fall –so, how do Chinese speakers realise the pragmatic meanings associated with f0 peaks?

50 It is to be expected that the functions of finality and openness in argumentation are basic in speech communication in any language. –they should therefore also be manifested in adapted form in a tone language like Mandarin –hao with low tone 3 °"OK, I am forced to agree" °"OK, I am happy to agree" –xing with rising tone 2 °the same distinction in argumentation

51 Chinese hao "OK" (upper) and xing "OK" (lower), resigned (left), happy (right); Yi Xu, UCL

52 Chinese hao "OK" (upper) and xing "OK" (lower), resigned (left), happy (right); Aoju Chen, MPI Psycholinguistics, Nijmegen

53 In the 'resigned' as against the 'happy' context –hao °either no rise or a much lower one °intensity lower and descending more quickly °the syllable is shorter –xing in the two contexts is similarly differentiated °lower vs. higher pitch rise °lower vs. higher and faster vs. more slowly descending intensity °shorter vs. longer duration.

54 –both speakers differentiate the two contexts across the different word tones in the same way by lowering vs. raising pitch and prominence, superimposed on the lexical tones –this essentially parallels signalling of the functional categories in non-tonal languages

55 French –'resigned' vs 'happy' has the same low vs high pitch movement in French Oui, monsieur, with early vs medial fall, as in Germanic languages –but this is where the correspondence ends °French has no lexical stress, so no fixed place holders for sentence accents °French uses accents very sparingly for neutral propositional highlighting; accent d'insistance insists, contrasts and reinforces > emphasis °the primary role of pitch is in syntagmatic phrasing °and, of course, in emphasis and attitudes, where the same pragmatic signalling occurs

56 other Romance languages –lexical stress, accent and pitch categories, including peak synchronization, can be more easily transferred –the coding of narrow focus in polarity question vs statement in Neapolitan Italian can easily be captured and explained with reference to peak shape °S-shape rise vs. S-shape fall in question/statement in focus position, with higher/lower post-focus °greater intensity, synchronization may also be later °intensifies high pitch in accented syllables °not as a local H but as a global perceptual Gestalt

57 speech melodies have been found to directly link physical signals to meaning in contexts of situation –a high end point of a rising contour can code a question irrespective of the syntactic form –a rising melody, as against a falling one, signals concern for a listener –synchronizations of peak contours are directly bound to pragmatic contexts. 4 Explaining the data in the framework of KIM

58 In all languages, real polarity questions are associated with high tones –high register –rising –increasing the high level of peak contours °in Neapolitan questions °it even occurs in an elliptic question-word question in the English title of this talk. Why? –high tones > uncertainty, submission; low tones > confidence, dominance

59 –Charles Darwin gave an answer in The Expression of the Emotions in Man and Animals (1872) °a dog makes its body big or small to signal strength or weakness °it growls/whimpers with big/small vocal folds

60 –John Ohala's frequency code: Phonetica 40 (1983) “Cross-language use of pitch: an ethological view” °inverse meaning association of pitch and body size °physically given, not arbitrary, iconic °via body size link with dominance and submission

61 –when we ask a polarity question, we request a decision from our dialogue partners, we submit to them because we want something from them –thus the general occurrence of high or rising pitch for polarity questions in the languages of the world may be explained with reference to Ohala's frequency code °especially in the absence of formal devices, e.g. in echo questions with declarative or elliptic syntax in German, or generally in questions in Neapolitan Italian °or for the expression of listener orientation

62 But Ohala's version of the frequency code seems to be only part of the story –its reference to the functional dimension of dominance/submission should be supplemented by a further functional dimension of stimulation –valley contours in polarity questions can then be seen as signalling an additional aspect in speech interaction stimulating the addressee into action

63 mothers who want to stimulate their babies use –modal, higher-pitched voice, lip spreading, faster speech rate, raised loudness –as an expression of activating motherese –perhaps combined with tickling –ah, wie das gut tut, da freut er sich, mein Goldschatz "ah, isn't this nice, you like this, my little darling"

64 mothers who want to comfort their babies use –breathy, lower-pitched voice, lip rounding, slower speech rate, reduced loudness –as an expression of calming motherese –oh, mein süßer Kleiner, Mamis Liebling "oh, my little sweetheart, mummy's darling"

65 this contrast of activating – calming is also found in nursery rhymes –knee riders °Hoppe hoppe Reiter –as against lullabies °Schlaf, Kindlein, schlaf °Guten Abend, gut' Nacht set to music by Brahms and in poems contemplating the evening quiet °Der Mond ist aufgegangen by Matthias Claudius °Goethe's Über allen Gipfeln ist Ruh

66 Ohala's f0 code is referable to a physical size code in the struggle for survival in the evolution of the species –enters speech interaction in the expression of emotions such as anger vs. happiness –and of attitudes towards the addressee, such as dominance vs. submission, and this may include statements/commands vs. polarity questions

67 the activation dimension of the f0 code is different –controlling others by stimulating or soothing their behaviour –here high and low pitch have just the opposite effect from the dominance dimension. This difference is effected by combining pitch with different types of voice quality and other prosodic variables in the two dimensions

68 The functional dimension activating – deactivating in the f0 code is also apparent in the coding of openness vs. finality –by peak contour synchronization in German or English –where finality may become associated with resignation or boredom, as in the rote data collection mentioned earlier –what is important is not the absolute height but the change low-high vs high-low in the CV transition of the accented syllable in medial vs early peaks –various acoustic parameters converge to create this contrast –synchronization is only one of them, probably the most important –Oliver Niebuhr will say more about this

69 5Conclusion and outlook I have introduced you to KIM with relevant audio examples to illustrate its various categories and signal – function relations as part of a paradigm of speech communication. My presentation has also reflected on two decades of empirical investigation, at IPDS Kiel, into the production and perception of speech interaction, in which prosody has a central role. And I have stressed scientific prerequisites to any prosodic modelling.

70 Our research in Kiel has combined corpus analysis with experimental data acquisition in a cyclic spiral progression between theory and data towards the ultimate goal of answering the question: how do humans communicate by speech in the world's languages? At this point the question arises how these communicative functions and their phonetic realizations in human speech are distributed across the languages of the world.

71 The functions are so elementary for human interaction that we can assume that they find expression in all languages –the phonetic substantiation will vary –so we need to find out how it varies –by studying a large variety of languages from a perspective that relates linguistic form to communicative function –and thus reverses the procedure that 125 years of western linguistics have taught us

72 The function – form framework of KIM offers a scientific paradigm –not in the way ToBI categories, worked out on English, have been inculcated as God's truth on the languages of the world –but as a general theoretical and methodological approach to speech communication –within which the prosodic systems of individual languages can be modelled in their own right.

73 The functional perspective provides a basis for language comparison, typology and universals. To get there we need to practice our discipline as a communicative phonetic science with a sociology of science that does not insist on rallying the speech community worldwide under one banner that happens to be fashionable at a particular time.

74 Voilà un autre modèle à discuter! Je vous remercie de votre attention! "Here is another model to discuss! Thank you for your attention!"

