Effect and artifact in the perception of stress; a cross-linguistic view Vincent J. van Heuven.

2 Introduction, terminology

3 30 April 2008 Stress UAB3 Introduction: terms
  Stress
    Abstract linguistic property of a word
    Position of the strongest syllable in the word
    Only one head: culminative property
  Accent
    Phonetic realisation of a stressed syllable

4 Introduction: terms
  Typically, the inventory of stressed syllables is larger than that of unstressed syllables
  The identity of a word is mainly determined by the make-up of its stressed syllable
  Listeners pay more attention when they expect a stress
  Word recognition waits for the stressed syllable

5 Introduction: terms
  Stress is realised by
    More careful (‘clear’, ‘hyper’) articulation
    More expanded vowel space
    Longer duration
    More intensity (decibels)
    Flatter spectral tilt (faster adduction)
    Resistance to assimilation and coarticulation

6 Introduction: terms
  When a word is important in the sentence
    Stress is additionally signalled by a conspicuous pitch movement
    The movement is associated (‘aligned’) with the stressed syllable
  Sentence stress is sometimes called ‘pitch accent’ [not to be confused with the lexical pitch accent of Tokyo Japanese]

7 Production ~ Perception
  Perception
    Sentence stress is more prominent than word stress alone
    A well-aligned pitch movement is always heard as a stress: strongest cue by far
    But it is not always present
      Absent when the word has no sentence stress
    Therefore: pitch is a strong but inconsistent cue

8 Production ~ Perception
  Production
    The most consistent cue is the relative duration of the rhyme portion of the syllable
    The ratio between the stressed and unstressed versions of the rhyme (in paradigmatic comparison) is the same whether a pitch movement is present or not

9 Aside
  Paradigmatic ~ syntagmatic comparison
    (the) IMport ~ (to) imPORT
    Do not compare the first syllable with the second syllable
      You will find that unstressed port is longer and louder (dB) than stressed IM
    Compare stressed IM with unstressed im, and stressed PORT with unstressed port
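The difference between the two comparisons can be made concrete with a small sketch. The durations below are invented for illustration only (they are not measurements from any study), and the function names are hypothetical.

```python
# Hypothetical syllable durations in ms (illustration only, not measured data).
durations = {
    "IMport": {"im": 150, "port": 220},  # noun, stress on the first syllable
    "imPORT": {"im": 110, "port": 300},  # verb, stress on the second syllable
}

def syntagmatic_ratio(word):
    """First syllable over second syllable within one word (the misleading comparison)."""
    return durations[word]["im"] / durations[word]["port"]

def paradigmatic_ratio(syllable, stressed_word, unstressed_word):
    """Stressed over unstressed version of the same syllable, across the word pair."""
    return durations[stressed_word][syllable] / durations[unstressed_word][syllable]

# Within the noun, unstressed 'port' is still longer than stressed 'IM'.
print("syntagmatic IM/port in IMport:", round(syntagmatic_ratio("IMport"), 2))
# Across the pair, each syllable is longer in its stressed version.
print("paradigmatic IM/im:", round(paradigmatic_ratio("im", "IMport", "imPORT"), 2))
print("paradigmatic PORT/port:", round(paradigmatic_ratio("port", "imPORT", "IMport"), 2))
```

With these made-up values the syntagmatic ratio is below 1 even for the initially stressed noun, whereas both paradigmatic ratios are above 1, which is the point of the aside.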

10 Functional load hypothesis

11 Functional load hypothesis
  Classical order of importance of stress cues (Fry 1955, 1958, 1965)
    1. Pitch (movement)
    2. Duration
    3. Intensity
    4. Spectral expansion
  Based mainly on English stress

12 Functional load hypothesis
  Berinstein (1979)
    You can spend your money only once
    If a language uses a parameter for a segmental contrast, it cannot use the same parameter as a stress cue
    E.g., if a language has long ~ short vowels, duration is no longer an effective stress cue

13 Berinstein (1979)
  Languages contrasted:

14 Functional load hypothesis
  Berinstein (1979)
    English has tense (long) and lax (short) vowels
    Spanish has neither tenseness nor length as a parameter
    Prediction: duration is a less effective stress cue in English than in Spanish

15 Functional load hypothesis
  Berinstein (1979)
    K’ekchi: fixed final stress, with vowel length contrast
    Cakchiquel: fixed final stress, no length contrast
    Prediction: duration is a less effective stress cue in K’ekchi than in Cakchiquel

16 Berinstein (1979)
  Perception study
  Stimuli
    /bibibibi/, 100 ms base vowel duration (+ 40 ms for /b/)
    The test vowel has a deviant duration: 70, 100 (control), 120, 140, 160, 200 ms
  Listeners
    36 native English (mean age 22)
    22 monolingual Spanish (mean age 23)
    31 K’ekchi (mostly bilingual, mean age < 20)
    46 Cakchiquel (all bilingual, mean age < 20)
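The stimulus design can be tabulated in a few lines. This is a sketch under one assumption not stated on the slide: listeners picked one of the four syllables as stressed, so that chance is 25%. The variable names are illustrative and do not come from Berinstein (1979).

```python
BASE_MS = 100                             # base vowel duration given on the slide
TEST_MS = [70, 100, 120, 140, 160, 200]   # test vowel durations (100 = control)
N_SYLLABLES = 4                           # /bi.bi.bi.bi/

# Relative change of the test vowel with respect to the base vowel, in percent.
relative_change = {ms: 100 * (ms - BASE_MS) / BASE_MS for ms in TEST_MS}

# Assumption: one stress response per stimulus, chosen among the four syllables.
chance_pct = 100 / N_SYLLABLES
twice_chance_pct = 2 * chance_pct

# Steps that lengthen the vowel by more than 50%.
long_steps = [ms for ms, pct in relative_change.items() if pct > 50]

for ms in TEST_MS:
    print(f"{ms:3d} ms  {relative_change[ms]:+6.1f}%")
print("chance:", chance_pct, "twice chance:", twice_chance_pct)
print("lengthening > 50%:", long_steps)
```

Under this reading, only the 160 ms and 200 ms steps exceed 50% lengthening, and "better than 2x chance" means more than half of the responses.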

17 [English]
  clear position bias: more stress judgments as test syllable occurs earlier in the word: 86, 67, 62, 46%
  huge effect of duration (lengthening > 50% attracts stress): 34, 44, 89, 94%
  overall effect better than 2x chance

18 [Spanish]
  no position bias: 34, 34, 32, 32%
  small effect of duration: 28, 26, 39, 39%
  overall effects just above chance

19 [K’ekchi]
  clear position bias: more stress judgments as test syllable occurs later in the word: 19, 23, 31, 44%
  no clear effect of duration manipulations: 28, 26, 30, 34%
  overall effect hardly above chance

20 Berinstein (1979)
  Results of perception test (cont.)
  English
    clear position bias: more stress judgments as test syllable occurs earlier in the word
    huge effect of duration (lengthening > 50% attracts stress)
    overall effect better than 2x chance
  Note
    I replicated the experiment with Dutch listeners: results identical to English

21 Berinstein (1979)
  Results of perception test
  K’ekchi
    clear position bias: more stress judgments as test syllable occurs later in the word
    no clear effect of duration manipulations
    overall effect hardly above chance
  Spanish
    no position bias
    small effect of duration
    overall effects just above chance

22 [Cakchiquel]
  tiny effect of duration: only 200-ms vowels attract some stress judgments: 24, 29, 25, 35%


24 Berinstein (1979)
  Summary of observations re. duration
    Large effect in English
      But English also has a length contrast
    Small effect in Spanish
      Even though Spanish has no length contrast
    Small effect in K’ekchi
      Even though K’ekchi has a vowel length contrast
    Same small effect in Cakchiquel
      Even though Cakchiquel has no length contrast

25 Berinstein (1979)
  Conclusion re. Berinstein (1979)
    Results simply contradict all predictions
    Within the European languages
      Spanish should use duration more than English (but does not)
    Within the Mayan languages
      Cakchiquel should use duration more than K’ekchi (but does not)
    Therefore little credibility for the functional load hypothesis

26 Berinstein (1979)
  Extra: position bias in Berinstein
    Strong initial-stress bias in English
      OK, most words have initial stress
    Weak final-stress bias in K’ekchi
      OK, but why weak?
    Weak prefinal-stress bias in Cakchiquel
      Not predicted
    No stress bias at all in Spanish
      Why? What is the distribution of stress in Spanish?

27 Functional load hypothesis
  Potisuk, Gandour & Harper (1996)
    Thai has five lexical tones
      Prediction: pitch cannot be an effective stress cue
    Thai contrasts long ~ short vowels
      Prediction: duration cannot be an effective stress cue
  Acoustic correlates were measured, i.e. NOT a perception study

28 Potisuk et al. (1996)
  Method
    two male, three female speakers (read-out speech)
    25 sentences with minimal stress pairs (20 with long vowels, 5 with short vowels)
    full 5 x 5 matrix of two-tone sequences

29 Potisuk et al. (1996)
  Note: stress pairs are not really minimal
    one is a two-word sequence (N-V)
    the other is a two-syllable compound
  Measurements
    only initial syllables were measured (paradigmatic)
    F0 curve, in ERB + Z-transform, time-normalised (reduction to mean and SD)
    Rhyme duration (re. sentence duration, within-speaker normalisation for inherent segment duration)
    Intensity curve (normalised within speakers, reduction to mean and SD through Z-transform)
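The two normalisation steps named above (conversion of F0 to an ERB scale, then a within-speaker Z-transform) can be sketched as follows. The slides do not say which ERB formula was used; the Glasberg & Moore (1990) ERB-rate formula below is an assumption, as are the use of the population SD and the made-up F0 values.

```python
import math
import statistics

def hz_to_erb_rate(f_hz):
    """ERB-rate after Glasberg & Moore (1990); assumed variant of the ERB scale."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)

def z_transform(values):
    """Reduce values to zero mean and unit SD (population SD; an assumption)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical F0 samples (Hz) from one speaker, for illustration only.
f0_hz = [110.0, 125.0, 140.0, 120.0, 105.0]
f0_z = z_transform([hz_to_erb_rate(f) for f in f0_hz])
print([round(z, 2) for z in f0_z])
```

After the Z-transform, mean F0 and F0 variability can be compared across speakers with different pitch ranges, which is what the comparisons of mean and SD on the following slides rely on.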

30 five-member lexical tone contrast is fully maintained in [–stress], even though F0 curves are flattened considerably

31 Mean F0: No difference between +stress and –stress

32 F0 variability: larger for +stress, stronger for some tones than for others (interaction of stress and tone)

33 Mean intensity: no difference

34 Intensity variability: no difference

35 Duration: [+stress] much longer than [–stress], for all lexical tones (i.e. no stress x tone interaction)

36 Potisuk et al. (1996)
  Results
    Mean F0: no difference
    F0 variability: larger for [+stress], stronger for some tones than for others (interaction of stress and tone)
    Mean intensity: no difference
    Intensity variability: no difference
    Duration: [+stress] longer than [–stress], for all lexical tones (i.e. no stress x tone interaction)

37 Potisuk et al. (1996)
  Acid test: automatic classification by LDA
    rhyme duration >> F0 SD >> intensity SD
    99% correct classification with duration alone
  Interesting point
    five-member lexical tone contrast is fully maintained in [–stress], even though F0 curves are flattened considerably
    In other languages lexical-tone contrasts may be neutralised in [–stress] conditions

38 Potisuk et al. (1996)
  Conclusions
  Results largely go against the functional load hypothesis
    Duration is by far the strongest correlate (but should not be)
    F0 should not be a correlate, and indeed is not in terms of mean F0
    But it is a good stress cue in terms of F0 range

39 Multiple sources of variability
  Vowel duration is longer (e.g. Klatt, 1974)
    in [+long] vowels
    before deeper prosodic breaks
    in syllables with word stress
    in words with sentence stress
    in slow speech
    before voiced (and esp. sonorant) consonants

40 Multiple sources of variability
  Listeners are able to decompose different sources of variability in a parameter
  E.g. Nooteboom (1979) shows that Dutch listeners use duration effectively to make multiple simultaneous contrasts
    Long ~ short vowels
    Depth of prosodic break
    They adjust the long ~ short boundary depending on the depth of the break

41 Functional load hypothesis
  Since simultaneous effects are perceptually decomposed, the functional load hypothesis seems too simple
    Results indicate that we can both have our cake and eat it
    ‘Get two for the price of one’
  Original hierarchy still stands

42 Duration as a stress cue in English

43 Postnuclear stress contrast?
  Beckman & Edwards (1994)
  Simple prominence hierarchy in English
  Four degrees of prominence
    Full vowel > reduced vowel (schwa)
    Pitch movement > no pitch accent
    Last accent > earlier accents

44 Postnuclear stress contrast?
  Beckman & Edwards (1994)
  Predictions
    Schwa cannot be stressed unless it is transformed to a full vowel first
    No contrast between initial and final stress in postnuclear words with full vowels (Scott 1939, Huss 1978)

45 Postnuclear stress contrast?
  Scott (1939)
    One sentence, initial stress only
    Noun ~ verb minimal stress pair
    11 listeners, forced choice
    Response distribution towards initial stress
    But not significantly so



48 Postnuclear stress contrast?
  Pilch (1970)
    The difference between IMport ~ imPORT is exclusively a matter of intonation
    It is not carried by stress
    If intonation cues are removed (by embedding the target in postnuclear position), no difference between the noun and verb readings should remain

49 Postnuclear stress contrast?
  Huss (1978)
  Used the same clever sentences as Scott
    Actually, even cleverer
  Identical word sequences with different stress patterns on noun ~ verb pairs in postnuclear position
  See examples

50
  (1) It is not true that all nations have always been equally self-sufficient as far as the production of sinks is concerned. The degree of self-sufficiency has changed during the last year: Whereas formerly the Americans used to import sinks, now the Germans import sinks. Did you say the Germans import sinks?
  (2) It is not true that the balance of payment of all nations has always been equally healthy. The amount of net import has changed in different ways for different nations: Whereas formerly the Americans’ import used to sink, now the Germans’ import sinks. Did you say the Germans’ import sinks?

51 Huss (1978)
  Method
    4 different noun ~ verb pairs
    Nuclear ~ postnuclear target position
    Statement ~ question
    7 speakers
    3 phonetic expert listeners
    4 x 7 x 3 = 84 stress judgments per condition

52 Huss (1978) perception test: percent responses
  Sentence type   Lexical stress pattern   Perceived as Noun (initial)   Perceived as Verb (final)
  Statement       Noun                     25                            75
  Statement       Verb                     24                            76
  Question        Noun                     24                            76
  Question        Verb                     14                            86
  Statements: no effect
  Questions: trend, χ² = 1.89 (p = 0.167)
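The reported χ² for the question rows can be checked from the percentages. The sketch below assumes 84 judgments per row (4 pairs x 7 speakers x 3 listeners), reconstructs the counts from the rounded percentages, and assumes a 2 x 2 test with Yates' continuity correction; none of these choices is stated on the slide.

```python
import math

N_PER_CONDITION = 84  # 4 word pairs x 7 speakers x 3 listeners

# Counts reconstructed from the rounded percentages in the question rows
# (an assumption): Noun 24%/76% -> 20/64, Verb 14%/86% -> 12/72.
noun = (20, 64)   # (perceived initial stress, perceived final stress)
verb = (12, 72)

def chi2_yates_2x2(row1, row2):
    """Chi-square for a 2x2 table with Yates' continuity correction (df = 1)."""
    a, b = row1
    c, d = row2
    n = a + b + c + d
    num = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

def p_value_df1(chi2):
    """Upper-tail probability of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))

chi2 = chi2_yates_2x2(noun, verb)
print(round(chi2, 2), round(p_value_df1(chi2), 3))
```

Under these assumptions the statistic rounds to the reported 1.89. The p-value comes out at about 0.17, close to but not necessarily identical with the 0.167 on the slide.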

53
  (1) [the GERmans] [import sinks]
  (2) [the GERmans’ import] [sinks]
  Final lengthening of unstressed syll.
  No lengthening of stressed syll.

54 Huss (1978)
  No clear difference between initial and final stress in postnuclear minimal pairs with full vowels only
  As predicted by Beckman & Edwards
  But stress and phrasing are confounded
  Let us keep phrasing constant and vary stress only. See Huss (1975)

55 Huss (1975)
  Method
    10 minimal stress noun ~ verb pairs
      We FIRST import, he said [Verb, final stress]
      His FIRST import, he said [Noun, initial stress]
    2 male speakers
    Informal listening procedure
    Unknown number of listeners (but phonetically trained)

56 Huss (1975)
  Perceptual results
    One group of words with stress perceived in conformity with the noun ~ verb contrast, high agreement among listeners
    In ‘a few words’ listeners did not agree
    In ‘some other words’ listeners did agree but reported stress the wrong way around
  Unfortunately no quantitative data

57 The decisive auditory parameter in the identification of stress in post-nuclear position, i.e. in the absence of a pitch contrast, was the duration ratio between the two syllables; the experimental follow-up study should bear out which acoustic parameters correlate with this auditory impression.

58 Huss (1975)
  Perceptual results
    In pairwise comparison of noun ~ verb pairs, vowel duration seemed the clearest correlate
  Acoustic measurements of one speaker presented (the better speaker)
  Second speaker had more perceptual ambiguities (and reversals)
    No quantitative data

59 [Figure: duration ratio S1/S2 for nouns (initial stress) and verbs (final stress)]
  Duration contrast even more extreme in postnuclear than nuclear stress

60 Huss (1975)
  Conclusion
    At least some speakers produce a very reliable contrast between initial and final stress in postnuclear position in words with full vowels only
    The correlate is syllable duration
    The contrast, when made, is adequately perceived

61 Postnuclear stress contrast?
  Beckman & Edwards seem wrong
    English speakers tend to preserve the stress contrast in postnuclear position
    English listeners are sensitive to the contrast even when there is no pitch movement (duration is an effective cue)
    Same effects were found for Dutch
      Nooteboom (1972), van Katwijk (1974), Sluijter & van Heuven (1996)

62 Sluijter & van Heuven (1996)
  Prenuclear (unaccented) targets
  Lexical pair ‘canon ~ cannon’
  Reiterant mimicry


64 Sluijter & van Heuven (1996)
  Results
    Duration (ratio S1/S2) is a very strong stress cue
    Equally effective in nuclear and non-nuclear position
    Affords 100% correct stress decisions in LDA
      Linear Discriminant Analysis
      Automatic classification algorithm
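How an LDA classifies tokens on a single feature such as the S1/S2 duration ratio can be sketched as below. This is a minimal two-class, one-feature version with a pooled variance and equal priors; it is not the analysis reported in the study, and the ratio values are invented for illustration.

```python
import statistics

def fit_lda_1d(class_a, class_b):
    """Fit a two-class, one-feature LDA with a pooled variance and equal priors."""
    mean_a, mean_b = statistics.fmean(class_a), statistics.fmean(class_b)
    ss = sum((x - mean_a) ** 2 for x in class_a) + sum((x - mean_b) ** 2 for x in class_b)
    pooled_var = ss / (len(class_a) + len(class_b) - 2)
    return mean_a, mean_b, pooled_var

def classify(x, model, labels=("initial", "final")):
    """Assign x to the class with the larger linear discriminant score."""
    mean_a, mean_b, var = model
    score_a = x * mean_a / var - mean_a ** 2 / (2 * var)
    score_b = x * mean_b / var - mean_b ** 2 / (2 * var)
    return labels[0] if score_a >= score_b else labels[1]

# Hypothetical S1/S2 duration ratios (illustration only, not measured data).
initial_stress = [1.30, 1.50, 1.40, 1.60]
final_stress = [0.60, 0.70, 0.50, 0.80]

model = fit_lda_1d(initial_stress, final_stress)
correct = sum(classify(x, model) == "initial" for x in initial_stress)
correct += sum(classify(x, model) == "final" for x in final_stress)
print("correct:", correct, "of", len(initial_stress) + len(final_stress))
```

With one feature and equal priors the decision boundary is simply the midpoint between the two class means, so well-separated duration ratios are classified without error.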


66 Sluijter et al. (1997)
  Duration, intensity and loudness as perceptual cues in stress perception in non-nuclear position
  Overall result:
    Duration is the strongest cue
    Loudness (intensity > 500 Hz) is second
    Intensity is a weak cue

67 Aside: strength of cues
  Standard plots show % stress as a function of
    X, but averaged over all Y steps
    Y, but averaged over all X steps
    Observe the difference in psychometric function
    This obscures the interaction between X and Y
  Alternative: quasi 3D plots


69
  Plot quasi 3-D
  Determine cross-overs (50%) in X and Y dimensions, by e.g.
    Linear interpolation
    Probit fitting
  Compute linear regression line through points
  Determine slope of function
    90°: X only cue
    0°: Y only cue
    45°: equal strength
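The procedure above can be sketched in code. The version below uses linear interpolation for the 50% cross-overs and regresses the X cross-over on the Y step. It assumes both cues are expressed in comparable step units and that a cross-over in X exists at each Y step (so a pure Y-only cue cannot be represented this way). Function names are hypothetical.

```python
import math

def crossover_50(steps, pct_stress):
    """Linearly interpolate the step value at which the response curve crosses 50%."""
    pairs = list(zip(steps, pct_stress))
    for (x0, p0), (x1, p1) in zip(pairs, pairs[1:]):
        if (p0 - 50) * (p1 - 50) <= 0 and p0 != p1:
            return x0 + (50 - p0) * (x1 - x0) / (p1 - p0)
    return None  # the curve never crosses 50%

def boundary_angle_deg(y_steps, x_crossovers):
    """Angle of the least-squares line x = a + b*y, measured from the X axis."""
    my = sum(y_steps) / len(y_steps)
    mx = sum(x_crossovers) / len(x_crossovers)
    syy = sum((y - my) ** 2 for y in y_steps)
    sxy = sum((y - my) * (x - mx) for y, x in zip(y_steps, x_crossovers))
    b = sxy / syy
    return math.degrees(math.atan2(1.0, abs(b)))

# Hypothetical psychometric function along X at one Y step (illustration only).
print(crossover_50([1, 2, 3, 4], [10, 30, 70, 90]))
# Cross-over in X does not move with Y: the boundary is vertical, X is the only cue.
print(boundary_angle_deg([1, 2, 3], [2.5, 2.5, 2.5]))
# One step of Y trades against one step of X: equal cue strength.
print(boundary_angle_deg([1, 2, 3], [3.0, 2.0, 1.0]))
```

Probit fitting could replace the interpolation step; the angle computation would stay the same.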


71 Last minute results
  Dutch minimal stress pair
    ‘I have yesterday a canon/cannon heard’
  Prenuclear
    ik heb gisteren een kanon GEHOORD
  Postnuclear
    ik heb GISTEREN een kanon gehoord

72 Last minute results
  Starting from each natural base stimulus
  7 manipulations of syllable duration ratio (using Praat PSOLA)
  4 repetitions of each type
  20 native Dutch listeners
  4 x 20 = 80 responses per data point


74 Last minute results
  Duration ratio is a very effective stress cue in Dutch
  Also a (smaller) effect of base stimulus
  Same effects before and after the nuclear accent
  Same effects are expected for English

75 Summing up
  Duration is a very effective stress cue in Dutch, even in non-nuclear position
  It should also be so in English
  Work in progress at Leiden University
    Production and perception of stress in pre- and postnuclear position in Dutch and English. No results for English at this stage.

76 Stress bias

77 Van Heuven & Menert (1996)
  Strange difference
    Strong initial bias for English (but no fixed initial stress)
    Weaker final bias for K’ekchi (although exceptionless fixed stress)
  Why the difference?
  Bias is partly the result of artifact

78 Van Heuven & Menert (1996)
  Experiment 1
  Synthesized Dutch minimal stress pairs
    Monotone: 100 Hz flat
    Declination Hz
    Inclination Hz
    Noise source (i.e. no periodicity, whisper)
  Manipulated duration ratio S1 / S2

79 Van Heuven & Menert (1996)
  Experiment 1: Results
    Large effects of duration manipulation
    Strong overall bias for initial stress
    Reduction of initial-stress bias: Declination (85%) > Monotone (80%) > Inclination (60%) > Noise (55%)



82 Van Heuven & Menert (1996)
  Experiment 2: Effect of context
  Same stimuli & manipulations as before
  Also preceded by a short carrier, so that the first syllable of the target does not appear out of the blue

83 Van Heuven & Menert (1996)
  Experiment 2: Results
  Isolated targets: replicates Experiment 1
  Preceding context: bias for initial stress completely gone


85 Van Heuven & Menert (1996)
  Apparently: bias is not inherent but induced by
    Presence/absence of a preceding context
    Whether (the first syllable of) the target has pitch
  Suggestion:
    Bias is induced by a virtual pitch jump from an assumed/inferred F0 baseline

86 Van Heuven & Menert (1996)
  Inferred baseline is the speaker’s bottom pitch (roughly 70 Hz)
  Prediction
    The higher the level pitch of an isolated target, the larger the virtual F0 jump, the stronger the initial-stress bias
    No bias when the target has 70 Hz pitch

87 Van Heuven & Menert (1996)
  Experiment 3
    Same reiterant stimuli
    Synthesized at 70, 100, 130 and 160 Hz
    We also manipulated formant settings: +20%, –15%, 0% (neutral)
  If virtual pitch jump, then initial-stress bias should increase with onset F0
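The size of the hypothesised virtual pitch jump can be made explicit for the four synthesis levels. The sketch below assumes the 70 Hz baseline mentioned on the slides and expresses the jump in semitones; the choice of semitones as the unit is an assumption, not something the slides specify.

```python
import math

BASELINE_HZ = 70.0                        # inferred speaker baseline from the slides
TARGET_HZ = [70.0, 100.0, 130.0, 160.0]   # synthesis levels in Experiment 3

def jump_semitones(f_hz, baseline_hz=BASELINE_HZ):
    """Size of the interval between baseline and target pitch, in semitones."""
    return 12.0 * math.log2(f_hz / baseline_hz)

jumps = [jump_semitones(f) for f in TARGET_HZ]
for f, j in zip(TARGET_HZ, jumps):
    print(f"{f:5.0f} Hz: {j:5.2f} st above baseline")
```

The jump is zero at 70 Hz and grows monotonically with the synthesis level, which is the ordering the prediction about initial-stress bias depends on.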

88
  Some initial-stress bias is stimulus induced
  Inferred virtual pitch from the speaker’s baseline seems justified
  Other effects may also play a role
    Listeners expect final lengthening in isolated words
    Through perceptual compensation, the last syllable in a string of four equal-duration syllables sounds less stressed
  Results help to explain why the initial-stress bias is strong in English and the final bias is weaker in the Mayan languages K’ekchi and Cakchiquel

89 Vowel reduction as a stress cue

90 FRY (1965): DURATION vs. SPECTRAL REDUCTION
  4 minimal stress pairs (noun vs. verb)
    CONtract ~ conTRACT
    SUBject ~ subJECT
    DIgest ~ diGEST
    OBject ~ obJECT
  3 duration steps (smaller range than in Fry 1955, 1958)

91 DURATION vs. SPECTRAL REDUCTION
  3 degrees of vowel reduction/expansion
    for V1 while keeping V2 constant (mid value): f1, f2, f3
    for V2 while keeping V1 constant: f4, f5, f6
  Note: reduction of diphthong /ai/ by reduction of glide trajectory (full, halfway, none = endpoint only)


93 [Figure: duration manipulation and quality manipulation, V1 and V2]

94 DURATION vs. SPECTRAL REDUCTION
  Intensity (V1 = V2) and F0 (120 Hz) were kept constant
  Problem?
    There is a constant 6 dB difference between F1 and F2, i.e., spectral tilt depends on the frequency difference between F1 and F2: the larger the distance, the flatter the tilt

95 DURATION vs. SPECTRAL REDUCTION
  RESULTS
    Effects of duration structure (in spite of restricted duration range) stronger than those of spectral reduction
    Effects of reduction of V1 stronger than those of V2

96 Van Bergem (1993)
  Spectral reduction in Dutch
  Production study
    Measurement of F1 and F2 at the most stable portion of the vowel (least spectral change)
    Systematic manipulation of stress, focus, and lexical status of words
  Manipulation of focus through question/answer pairs:

97 Test syllable: can
  (What did you buy for your mother?) I bought [CANdy]+F for my mother: +C +A +S
  (For whom did you buy candy?) I bought [CANdy]-F for my mother: +C -A +S
  (Where do they sell beer?) In our [canTEEN]+F they sell beer: +C +A -S
  (What do they sell in our canteen?) In our [canTEEN]-F they sell beer: +C -A -S
  (What can your sister do for hours?) My sister can [TALK]+F for hours: -C +A
  (How long can your sister talk?) My sister can [TALK]-F for hours: -C -A
  [CAN]+F (spoken in isolation): ISO

98 Van Bergem (1993)
  Experimental set-up
    15 (male) speakers
    7 stress/accent/status conditions
    33 test syllables
  yielding 15 x 7 x 33 = 3465 vowel tokens

99 Van Bergem (1993)
  Selected results
    For test syllables with /e:/, /o:/ and /a:/ only
    No function words
    Spectrally most expanded tokens for isolated words
    Marginal reduction for +A +S
    Appreciable reduction for -A
    Appreciable reduction for -S
    Effects of A and S are equal and additive

101 Van Bergem (1993)
  Notes
    These are acoustic effects
    Proper studies of the cue value of spectral reduction for stress/accent perception have yet to be carried out (for any language whatsoever)
    …preferably in relationship with cues to domain-final lengthening

102 Unified view

103 Unified view
  There is no unified view
  I would like to assume that all languages use stress parameters in the same way
    Not necessarily in speech production but certainly in speech perception
    Although the use of pitch for the marking of sentence stress may differ

104 Unified view
  No room for a functional load hypothesis
  Unclear why duration is such a weak cue for Spanish in Berinstein (1979)
  But a strong cue in Catalan in recent work at UAB
    Also in Spanish?
  (Much) more research needed

105 Thanks for bearing with me

