Annotation of speech from the phonetics/phonology perspective Bettina Braun & Jürgen Trouvain 15.02.2002 Fachrichtung 4.7, Institut für Phonetik.

2 Annotation of speech from the phonetics/phonology perspective Bettina Braun & Jürgen Trouvain 15.02.2002 Fachrichtung 4.7, Institut für Phonetik

3 Annotation of speech 1 Manipulating text vs. speech [1] text file manipulation "vowel-only" version remove all consonant letters, replace them with a space, so that only the vowels are left e ea e o e a o o o o : a e ou y i e o i i a e u y e i e a e oo.

4 Annotation of speech 2 Manipulating text vs. speech [2] text file manipulation "consonants-only" version remove all vowel letters, replace them with a space, so that only the consonants are left Th w th r f r c st f r t m rr w: r th r cl d n th m n ng w th f w s nn sp lls n th ft n n.
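The two text manipulations above can be sketched in a few lines of Python. This is only an illustration: the function name `keep_letter_class` and the vowel set are assumptions, with "y" counted as a vowel letter because the slide keeps it ("cloudy" becomes "ou y").

```python
# Letters treated as vowels. Including "y" follows the slide's example;
# the set is an assumption that only makes sense for English spelling.
VOWEL_LETTERS = set("aeiouy")

def keep_letter_class(text: str, keep_vowels: bool) -> str:
    """Blank out consonant letters (or vowel letters); keep everything else."""
    out = []
    for ch in text:
        if not ch.isalpha():
            out.append(ch)      # spaces and punctuation pass through
        elif (ch.lower() in VOWEL_LETTERS) == keep_vowels:
            out.append(ch)      # letter belongs to the class we keep
        else:
            out.append(" ")     # the other class is replaced by a space
    return "".join(out)

sentence = "The weather forecast for tomorrow: rather cloudy in the morning."
vowels_only = keep_letter_class(sentence, keep_vowels=True)
consonants_only = keep_letter_class(sentence, keep_vowels=False)
print(vowels_only)
print(consonants_only)
```

Because each removed letter becomes exactly one space, both versions keep the length of the original string, which is the textual counterpart of replacing segments with silence.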

5 Annotation of speech 3 Manipulating text vs. speech [3] The weather forecast for tomorrow: rather cloudy in the morning with a few sunny spells in the afternoon. speech file manipulation original recording, not manipulated "consonants-only" version: vowel segments replaced with silence "vowels-only" version: consonant segments replaced with silence
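For the speech file, the same manipulation needs a segmentation that says where each vowel and consonant lies in time. A minimal sketch, assuming the signal is a list of samples and the segmentation is a list of (start, end, label) intervals; the function name, the "C"/"V" labels and the toy numbers are hypothetical:

```python
from typing import List, Set, Tuple

Segment = Tuple[float, float, str]   # (start in s, end in s, label)

def silence_class(samples: List[int], rate: int,
                  segments: List[Segment], mute: Set[str]) -> List[int]:
    """Return a copy of the signal with the chosen segments set to zero."""
    out = list(samples)
    for start, end, label in segments:
        if label in mute:
            first = max(0, int(round(start * rate)))
            last = min(len(out), int(round(end * rate)))
            for i in range(first, last):
                out[i] = 0   # digital silence; overall duration is unchanged
    return out

# Toy signal: 10 samples at a (hypothetical) rate of 10 Hz, two segments.
signal = [5, 6, 7, 8, 9, 1, 2, 3, 4, 5]
segs = [(0.0, 0.5, "C"), (0.5, 1.0, "V")]
consonants_only = silence_class(signal, 10, segs, {"V"})
print(consonants_only)
```

Unlike the text case, the result depends on where the boundaries are placed, which is not always obvious in continuous speech.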

6 Annotation of speech 4 Coarticulation: articulating means articulators in motion, not in fixed positions; articulators move continuously, not discretely; articulatory movements overlap in time

7 Annotation of speech 5 original vowels only vowels only without silences

8 Annotation of speech 6 Timing information of consonant durations: silence is more than nothing

9 Annotation of speech 7 Speech melody information about fundamental frequency (F0) in the voiced vowel segments with F0 variation without any F0 variation (monotonous)
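The "monotonous" version can be thought of as flattening the F0 track in the voiced stretches. The sketch below only shows that step on a list of F0 values; it assumes unvoiced frames are coded as 0, and it does not resynthesise audio (that would need a separate signal-processing tool). The function name is hypothetical.

```python
def flatten_f0(f0_track):
    """Replace every voiced F0 value by the mean of the voiced values.

    Unvoiced frames are assumed to be coded as 0 and are left untouched.
    """
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        return list(f0_track)
    mean_f0 = sum(voiced) / len(voiced)
    return [mean_f0 if f > 0 else 0 for f in f0_track]

# Toy contour in Hz, one value per analysis frame.
contour = [0, 110, 130, 150, 0, 0, 120, 90]
monotone = flatten_f0(contour)
print(monotone)
```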

10 Annotation of speech 8 Annotation of sound segments: discreteness in mind & in physics "Es ist 8 Uhr morgens." [Figure: the word "morgens" aligned on three tiers: graphemes, phonemes, phones]

11 Annotation of speech 9 Annotation of sound segments: discrete units? "Die Nacht haben Maiers gut geschlafen." "…………… haben Maier ……………………." phonemic: h a: b @ n m aI @ r s; acoustic-phonetic: h a: b m aI 6 s; articulatory-phonetic: h a: b n m aI 6 s (possibly)

12 Annotation of speech 10 Segmentation of sound segments: degree of discreteness "Wer möchte noch Milch?" clear segmentation: closure and closure release in [t] in "möch t e" unclear segmentation: [I l] in "M il ch"

13 Annotation of speech 11 Kiel Corpus read & spontaneous speech orthography phonemic (canonical) form realised form word & sentence boundary manually labelled

14 Annotation of speech 12 From sounds to syllables: how many syllables? semi-vowels: syllabic or not? Studie: Stu - di - e vs. Stu - die; Piano: Pi - a - no vs. Pia - no size of auditory window "… mit mir diese Dienstreise zu unternehmen, …" rei - se - zu - un - ter zu - un - ter zu - un

15 Annotation of speech 13 From sounds to syllables: where is the syllable boundary? ambisyllabic consonants & onset principles Mitte: /m I - t @/ vs. /m I _t @/; Adler: /a: t - l @ r/ vs. /a: - d l @ r/; Fenster: /f E n s - t E r/ vs. /f E n - s t E r/ resyllabification "Wenn es Ihnen da 5 Tage lang irgendwo passen würde." /v E n - E s/ vs. [v E _ n E s]

16 Annotation of speech 14 Controlled elicitation of spontaneous speech Monologues: narrative (Erzählung), picture description (Bildbeschreibung) Dialogues: task-oriented data collection (Map Task, appointment-making) Degree of naturalness?

17 Annotation of speech 15 Controlled elicitation of spontaneous speech

18 Annotation of speech 16 Problems for annotation: non-speech in speech Many non-linguistic signal portions: swallowing lip-smacking breathing unfilled, filled pauses laughter hesitational lengthening Partly overlapping with speech

19 Annotation of speech 17 Functions of prosody Generally: Features above the segmental level suprasegmental

20 Annotation of speech 18 Phonetic encoding of prosody perceived pitch over time duration intensity spectral quality

21 Annotation of speech 19 Prosodic annotation: Signal oriented Tilt-model (Taylor 2000) intonational events continuous parameters (tilt parameter): amplitude: sum of the magnitude of rise and fall duration: sum of rise and fall durations tilt: shape of the event [Figure labels: 1.0, 0.5, 0]
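The three Tilt parameters can be computed from the rise and fall parts of an event. The sketch below follows the definitions in Taylor (2000), where tilt is the mean of an amplitude ratio and a duration ratio; the function and argument names are hypothetical, and finding the rise and fall in an F0 contour is assumed to be done already.

```python
def tilt_parameters(rise_amp, fall_amp, rise_dur, fall_dur):
    """Return (amplitude, duration, tilt) for one intonational event.

    rise_amp / fall_amp: F0 excursions in Hz (sign is ignored),
    rise_dur / fall_dur: durations in seconds.
    """
    a_rise, a_fall = abs(rise_amp), abs(fall_amp)
    amplitude = a_rise + a_fall        # sum of rise and fall magnitudes
    duration = rise_dur + fall_dur     # sum of rise and fall durations
    if amplitude == 0 or duration == 0:
        raise ValueError("event has no excursion or no duration")
    tilt_amp = (a_rise - a_fall) / amplitude
    tilt_dur = (rise_dur - fall_dur) / duration
    tilt = 0.5 * (tilt_amp + tilt_dur)  # +1 pure rise, 0 symmetric, -1 pure fall
    return amplitude, duration, tilt

print(tilt_parameters(20, 0, 0.2, 0.0))      # pure rise
print(tilt_parameters(10, -10, 0.1, 0.1))    # symmetric rise-fall
print(tilt_parameters(0, -15, 0.0, 0.3))     # pure fall
```

A single tilt value between -1 and +1 thus describes the shape of the event independently of its size and length.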

22 Annotation of speech 20 Prosodic annotation: Autosegmental, phonological GToBI (Grice et al.) Tonal tier, break tier Two levels of pitch-heights (L, H) Simple and complex pitch accents Association to word stress marked by * Exact temporal alignment Boundary tones marked by % Strength of prosodic breaks (3, 4)

23 Annotation of speech 21 Prosodic annotation: Example tonal orth. break misc

24 Annotation of speech 22 GToBI Labelfiles
orthographic:
46.836392 113 also
46.958899 113 ich
47.171623 113 bin
47.555335 113 genau
48.180049 113 waagerecht
48.468170 113 rechts
48.613576 113 von
48.726670 113 der
49.246344 113 Goldmine
tones:
47.469173 115 L+H*
47.555339 115 H-
47.768061 115 H*
47.851534 115 <
48.320061 115 !H*
48.812822 115 !H*
49.240958 115 L-%
breaks:
47.555339 123 3
49.249036 123 4
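Label files of this kind are easy to read with a short script. The sketch below parses "time code label" lines and looks up the word a tone label falls in. It rests on two assumptions that the slide does not state: each word's time stamp marks the end of the word, and a tolerance of 1 ms is enough to match a boundary tone to the word it closes. The names `Label`, `parse_labels` and `word_for` are hypothetical.

```python
from bisect import bisect_left
from typing import List, NamedTuple

class Label(NamedTuple):
    time: float   # time stamp in seconds
    code: int     # numeric second column, kept as-is
    text: str     # label text

def parse_labels(block: str) -> List[Label]:
    """Parse lines of the form '<time> <code> <label>'."""
    labels = []
    for line in block.strip().splitlines():
        time, code, text = line.split(maxsplit=2)
        labels.append(Label(float(time), int(code), text.strip()))
    return labels

WORDS = """
46.836392 113 also
46.958899 113 ich
47.171623 113 bin
47.555335 113 genau
48.180049 113 waagerecht
48.468170 113 rechts
48.613576 113 von
48.726670 113 der
49.246344 113 Goldmine
"""
TONES = """
47.469173 115 L+H*
47.555339 115 H-
47.768061 115 H*
47.851534 115 <
48.320061 115 !H*
48.812822 115 !H*
49.240958 115 L-%
"""

def word_for(time: float, words: List[Label], tolerance: float = 0.001) -> str:
    """Return the first word whose (assumed) end time is not before `time`."""
    ends = [w.time for w in words]
    i = bisect_left(ends, time - tolerance)
    return words[min(i, len(words) - 1)].text

words = parse_labels(WORDS)
tones = parse_labels(TONES)
pairs = [(t.text, word_for(t.time, words)) for t in tones]
for tone, word in pairs:
    print(tone, word)
```

The tolerance matters here: the H- label is stamped a few microseconds after the end of "genau", so an exact comparison would attach it to the following word.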

25 Annotation of speech 23 Prosodic annotation: Phonological, single-layer KIM (Kohler 1995) no suprasegmental tiers => efficient analysis of segment-prosody interaction prosodic labels differentiated from segmental labels by special diacritics time marks for prosodic events anchored to word boundaries. Example:

26 Annotation of speech 24
13 #c: 0.0007500
13 #&2 0.0007500
13 ##v: 0.0007500
13 $Q- 0.0007500
13 $E: 0.0007500
2147 $m 0.1341250
4787 #&PGn 0.2991250
4787 #&2( 0.2991250
4787 ##d 0.2991250
6243 $-h 0.3901250
6619 $'i: 0.4136250
7569 $n 0.4730000
8265 $s 0.5165000
9202 $t 0.5750625
9527 $-h 0.5953750
9995 $a: 0.6246250
10648 $k-x 0.6654375
11405 #&0 0.7127500
11405 ##v 0.7127500
12528 $Y6 0.7829375
13946 $d 0.8715625
14275 $@+ 0.8921250
14721 #&0 0.9200000
14721 ##m 0.9200000
16051 $i:6+ 1.0031250
16935 #&0 1.0583750
16935 ##g 1.0583750
18093 $-h 1.1307500
18564 $'u: 1.1601875
19314 $t 1.2070625
19981 $-h 1.2487500
20336 #&0. 1.2709375
20336 #&2) 1.2709375
20336 ##p 1.2709375
21501 $-h 1.3437500
22440 $'a 1.4024375
23700 $s 1.4811875
25408 $@- 1.5879375
25408 $n 1.5879375
28935 #, 1.8083750
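Because prosodic and segmental labels share one layer, a script has to separate them by their form. The sketch below reads "sample label time" triples from an excerpt of the file above. It assumes that labels containing "&" are the prosodic ones and that each time stamp marks the start of its segment; both are readings of the sample, not documented facts. In the sample, every time equals (sample - 1) / 16000, which the last lines check. All names are hypothetical.

```python
from typing import List, NamedTuple

class Record(NamedTuple):
    sample: int   # sample index
    label: str    # segmental or prosodic label
    time: float   # time in seconds

def parse_triples(text: str) -> List[Record]:
    """Parse a whitespace-separated stream of 'sample label time' triples."""
    tokens = text.split()
    if len(tokens) % 3:
        raise ValueError("expected a multiple of three tokens")
    return [Record(int(tokens[i]), tokens[i + 1], float(tokens[i + 2]))
            for i in range(0, len(tokens), 3)]

EXCERPT = """
4787 #&PGn 0.2991250 4787 #&2( 0.2991250 4787 ##d 0.2991250
6243 $-h 0.3901250 6619 $'i: 0.4136250 7569 $n 0.4730000
8265 $s 0.5165000 9202 $t 0.5750625 9527 $-h 0.5953750
9995 $a: 0.6246250 10648 $k-x 0.6654375 11405 #&0 0.7127500
11405 ##v 0.7127500
"""

records = parse_triples(EXCERPT)
# Assumption: "&" marks prosodic labels, everything else is segmental.
prosodic = [r for r in records if "&" in r.label]
segmental = [r for r in records if "&" not in r.label]

# Segment duration = distance to the next segmental label (assumed onset times).
durations = [(a.label, round(b.time - a.time, 7))
             for a, b in zip(segmental, segmental[1:])]

# Observation from the sample: time == (sample - 1) / 16000.
consistent = all(abs((r.sample - 1) / 16000 - r.time) < 1e-9 for r in records)
print(durations[:3], consistent)
```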

31 Annotation of speech 29 Data structures and retrieval Mostly pure textfiles, aligned to signal Retrieval using script languages (GToBI in EMU-Format) XML-formats
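As an illustration of the XML route, the sketch below writes a few time-aligned labels to XML with Python's standard `xml.etree.ElementTree` module and retrieves the pitch accents again. The element and attribute names (`annotation`, `tier`, `label`, `name`, `time`) are a made-up schema for this example, not one of the formats used by the tools named on the slide.

```python
import xml.etree.ElementTree as ET

def to_xml(tiers):
    """Build an XML tree from {tier name: [(time, label text), ...]}."""
    root = ET.Element("annotation")          # hypothetical schema
    for name, labels in tiers.items():
        tier = ET.SubElement(root, "tier", name=name)
        for time, text in labels:
            ET.SubElement(tier, "label", time=f"{time:.6f}").text = text
    return root

tiers = {
    "words": [(47.555335, "genau"), (48.180049, "waagerecht")],
    "tones": [(47.469173, "L+H*"), (47.555339, "H-"), (47.768061, "H*")],
}
root = to_xml(tiers)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Retrieval: all labels on the tone tier that carry the accent marker "*".
pitch_accents = [el.text
                 for el in root.findall("./tier[@name='tones']/label")
                 if el.text.endswith("*")]
print(pitch_accents)
```

The same query on plain text files would need a hand-written parser for each label format, which is one argument for a structured representation.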

32 Annotation of speech 30 What for? Basic research Rhythmic patterns Speech rate measurements (units, domains) Temporal alignment & scaling of pitch accents Differentiated analysis of pitch range Speech technology Modelling accentuation in ASR Speech rate in ASR Intonation and timing for synthesis

33 Annotation of speech 31 Bibliography
Alwan, A., H. Bourlard and S. Furui (eds). 2001. Speech Communication 33. Special Issue on Speech Annotation and Corpus Tools.
Grice, M., S. Baumann and R. Benzmüller (to appear). German ToBI. In: S. Jun (ed). Prosodic Typology.
Grice, M. et al. 2000. Representation and annotation of dialogue. In: Handbook of Multimodal and Spoken Dialogue Systems. Resources, Terminology and Product Evaluation. Kluwer, pp. 1-101.
Kohler, K.J. (ed). 1995. Kieler Arbeitsberichte 29.
Taylor, P. 2000. Analysis and Synthesis of Intonation Using the Tilt Model. In: JASA 107(3), pp. 1697-1714.

