From Sounds to Language Lecture 2 Spoken Language Processing Prof. Andrew Rosenberg.


2 From Sounds to Language Lecture 2 Spoken Language Processing Prof. Andrew Rosenberg

3 Linguistic sounds How does a sound wave become language? Sounds are continuous wave forms. Linguistic units are categorical. How is the human perceptual system able to categorize and combine linguistic sounds into language? 1

4 Studying Speech Who studies speech? –Linguists (phoneticians, phonologists, forensic linguists) –Speech Engineers Speech recognition Speech synthesis etc. –Speech Pathologists –Language Instructors –Singers –Marketing experts 2

5 Marketing experts? 3

6 Studying speech Major questions in studying speech. –What is the sound inventory of a language? Which variations are linguistically relevant? –R/L in some Asian languages –[p]/[pʰ] in English –How are speech sounds produced? –What sounds are shared by two languages, and which are not? –How do sounds vary in context? “Green banana” vs. “Greem banana” 4

7 Representing speech sounds Why are representations important? –Translation between sounds and words (ASR and TTS) –Learning pronunciation –Having a shared vocabulary to discuss language. How should we represent speech sounds? –Orthography? –Special symbols? –Abstract classes based on sound and/or articulatory similarities 5

8 Using orthography to represent sounds A single orthographic letter is realized in many different ways (in English) –b: comb, tomb, bomb –c: court, center, chess –oo: food, good, blood –s: reason, sunrise, shy, collision 6

9 Using orthography to represent sounds A single sound can be written in many different ways (in English) –[i]: sea, see, scene, receive, thief –[s]: cereal, same, miss –[u]: true, few, choose, lieu, do –[ay]: lie, prime, pry, buy How is orthography looking as a choice in English? 7

10 Phonetic Symbol Sets International Phonetic Alphabet (IPA) –Single (unique) character for each sound –Represents all sounds of the world’s languages, but is large, and requires a special (non-ASCII) font. ARPAbet, TIMIT, etc. –Multiple characters for each sound –Language specific. A new symbol set is required for each language. 8
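As a rough illustration of how a language-specific ASCII symbol set relates to the IPA, the sketch below maps a subset of ARPAbet symbols to IPA characters. The table is partial and hand-picked, the function name `arpabet_to_ipa` is made up for this example, and the IPA choices follow one common convention for American English rather than anything stated in the lecture.

```python
# Partial ARPAbet -> IPA table (illustrative subset, not the full inventory).
ARPABET_TO_IPA = {
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "g": "g",
    "f": "f", "v": "v", "s": "s", "z": "z", "hh": "h",
    "sh": "ʃ", "zh": "ʒ", "th": "θ", "dh": "ð",
    "ch": "tʃ", "jh": "dʒ", "m": "m", "n": "n", "ng": "ŋ",
    "l": "l", "r": "ɹ", "w": "w", "y": "j", "dx": "ɾ",
    "iy": "i", "ih": "ɪ", "eh": "ɛ", "ae": "æ", "aa": "ɑ",
    "uw": "u", "uh": "ʊ", "ah": "ʌ", "ax": "ə",
}


def arpabet_to_ipa(transcription):
    """Convert a space-separated ARPAbet string to an IPA string."""
    symbols = transcription.lower().split()
    # Fail loudly on symbols outside the partial table instead of guessing.
    unknown = [s for s in symbols if s not in ARPABET_TO_IPA]
    if unknown:
        raise KeyError(f"symbols not in the partial table: {unknown}")
    return "".join(ARPABET_TO_IPA[s] for s in symbols)


print(arpabet_to_ipa("s eh n s"))
print(arpabet_to_ipa("k ae t"))
```

Because ARPAbet symbols can be one or two letters long, the transcription is kept space-separated; without the spaces, a string such as "sh" would be ambiguous between one symbol and two.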

11 Exercise: Write your full name in English orthography and in ARPAbet. 9

12 Sound categories Phone: Basic speech sound of a language –A minimal sound difference between two words: too vs. zoo –Not every sound made by a human speaker is phonetic: sniffs, laughs, coughs, breaths… Phoneme: Class of speech sounds –A phoneme may include several phones –/t/ in top, stop, little, butter, winter Allophones: the set of phonetic variants that make up a phoneme. –{[t], [ɾ], …} 10
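The relation between phonemes and their allophones can be sketched as a mapping from each phoneme to a set of phones. The sets below are a small illustrative assumption for English (the slide only lists [t] and [ɾ] for /t/), and `phonemes_of` is a hypothetical helper name.

```python
# Illustrative allophone sets for two English phonemes (not exhaustive).
ALLOPHONES = {
    "/t/": {"t", "tʰ", "ɾ", "ʔ"},
    "/p/": {"p", "pʰ"},
}


def phonemes_of(phone):
    """Return the phonemes whose allophone set contains the given phone."""
    return sorted(
        phoneme for phoneme, variants in ALLOPHONES.items() if phone in variants
    )


print(phonemes_of("ɾ"))
print(phonemes_of("pʰ"))
```

The lookup returns a list rather than a single phoneme because, in a fuller table, one phone can belong to more than one phoneme.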

13 Speech Production The articulatory organs General Process: –Air is expelled from the lungs through the windpipe (trachea), leaving via the mouth (and nose) –Air passes from the trachea through the larynx, which contains the vocal folds; the space between them is the glottis. –When the vocal folds vibrate, voiced sounds are produced; otherwise, voiceless sounds (e.g. voiceless [f] vs. voiced [v]) 11

14 Vocal Fold Vibration Slow motion video of normal vocal folds 12

15 Articulators “Why did Ken set the net on the soggy deck?” Queens University ATR Labs X-ray Film Database http://psyc.queensu.ca/~munhallk/05_database.htm 13

16 Vocal Organs 14

17 Recording Articulatory Data X-Ray Microbeam Database –Track motion of small gold pellets on the tongue, jaw, lips and soft palate Electroglottography –Run a high-frequency current through the glottal area of a speaker. –There is lower resistance when the vocal folds are closed. Electromagnetic articulography (EMMA) –3 transmitters on a helmet allow for triangulation of 5-15 sensor positions 15

18 Classes of Sounds Consonants and Vowels –Consonants: Restricted or blocked airflow (e.g. [s]) Voiced or unvoiced –Vowels Unrestricted airflow voiced –Semi vowels (approximants): [w], [y] 16

19 Consonants: Place of Articulation What is the point of maximum air restriction? –Labial: bilabial [b], [p]; labiodental [v], [f] –Dental: [θ], [ð] thief vs. them –Alveolar: [t], [d], [s], [z] –Palatal: [ʃ], [tʃ] shrimp vs. chimp –Velar: [k], [g] –Glottal: [ʔ] glottal stop 17

20 Consonants: Place of Articulation What is the point of maximum air restriction? –Approximant: [w], [y] 2 articulators come close but don’t restrict much Somewhere between vowels and consonants lateral: [l] –Tap or flap: [ɾ] e.g. butter 18

21 Places of Articulation http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html [Figure labels: labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal] 19

22 Consonants: Manner of articulation How is the airflow restricted? –Stop (or plosive): [p], [t], [g], … Airflow is completely blocked (closure) and released (release) Glottal stop, e.g. before word-initial vowels in English after a pause: “three even” –Nasal: air is released through the nose [m], [ng] –Fricative: [s], [z], [f] air is forced through a narrow channel, leading to turbulent airflow –Affricates: [tʃ] begin as stops, but the release is fricative 20

23 Articulation map 21
Rows: MANNER OF ARTICULATION. Columns: PLACE OF ARTICULATION. VOICING: voiceless / voiced (in each pair the voiceless sound is listed first).
          bilabial  labio-dental  inter-dental  alveolar  palatal  velar  glottal
stop      p b                                   t d                k g    q
fric.               f v           th dh         s z       sh zh           h
affric.                                                   ch jh
nasal     m                                     n                  ng
approx.   w                                     l/r       y
flap                                            dx
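The articulation map can be read as a lookup table from each consonant to its place, manner and voicing. The sketch below encodes part of the map that way; the dictionary layout and the helper names `describe` and `voicing_pairs` are assumptions made for this example, and only the stops, fricatives, affricates and nasals are included.

```python
# (place, manner, voiced) for a subset of the ARPAbet consonants in the map.
CONSONANTS = {
    "p": ("bilabial", "stop", False), "b": ("bilabial", "stop", True),
    "t": ("alveolar", "stop", False), "d": ("alveolar", "stop", True),
    "k": ("velar", "stop", False), "g": ("velar", "stop", True),
    "f": ("labiodental", "fricative", False),
    "v": ("labiodental", "fricative", True),
    "th": ("interdental", "fricative", False),
    "dh": ("interdental", "fricative", True),
    "s": ("alveolar", "fricative", False), "z": ("alveolar", "fricative", True),
    "sh": ("palatal", "fricative", False), "zh": ("palatal", "fricative", True),
    "ch": ("palatal", "affricate", False), "jh": ("palatal", "affricate", True),
    "m": ("bilabial", "nasal", True), "n": ("alveolar", "nasal", True),
    "ng": ("velar", "nasal", True),
}


def describe(symbol):
    """Spell out the three-way description of a consonant."""
    place, manner, voiced = CONSONANTS[symbol]
    return f"{'voiced' if voiced else 'voiceless'} {place} {manner}"


def voicing_pairs():
    """Find (voiceless, voiced) pairs that share place and manner."""
    pairs = []
    for a, (place_a, manner_a, voiced_a) in CONSONANTS.items():
        for b, (place_b, manner_b, voiced_b) in CONSONANTS.items():
            same_slot = (place_a, manner_a) == (place_b, manner_b)
            if same_slot and not voiced_a and voiced_b:
                pairs.append((a, b))
    return pairs


print(describe("ng"))
print(voicing_pairs())
```

Storing features rather than bare symbols makes it easy to ask which sounds differ in exactly one property, which is the kind of comparison the map is designed to support.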

24 Vowels All voiced Vowel height –How high is the tongue? High or low? –Where is its highest point? Front or back? How rounded are the lips? Monophthong [eh] vs. diphthong [ey] –1 vowel sound vs. two 22

25 American English Vowel Space [Figure: vowel chart with a FRONT to BACK axis and a HIGH to LOW axis; vowels shown: ey, ow, aw, oy, ay, iy, ih, eh, ae, aa, ao, uw, uh, ah, ax, ix, ux] 23

26 Compare to vowel spaces in other languages British English Indian English Swedish Spanish Mandarin Chinese Japanese 24

27 [iy] vs [uw] – “key” vs “coo” 25 (From a lecture given by Rochelle Newman)

28 [ae] vs [aa] – “cat” vs. “cot” 26 (From a lecture given by Rochelle Newman)

29 Acoustic Landmarks [Figure: the utterance “Patricia and Patsy and Sally” annotated with phone labels, including [p], [t], [sh], [s], [l], [ix], [ih], [ax], [ae], [iy]] 27

30 Coarticulation The same phone can be produced differently depending on phonetic context. Articulations overlap as articulators move in different timing patterns to produce consecutive sounds –Eight vs. Eighth: articulation moves forward –Met vs. Men: vowel becomes nasalized –Green banana or “greem” banana? 28
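The “green banana” case can be approximated with a toy rewrite rule in which [n] takes on the place of articulation of a following stop or nasal. This is a deliberately simplified sketch: real assimilation is optional and gradient, and the rule, the phone sets, the function name `assimilate_nasals`, and the ARPAbet transcription of “green banana” are assumptions made for illustration.

```python
# Toy rule: [n] copies the place of articulation of the following consonant.
BILABIAL = {"p", "b", "m"}
VELAR = {"k", "g"}


def assimilate_nasals(phones):
    """Apply nasal place assimilation to a list of ARPAbet phones."""
    out = list(phones)
    for i in range(len(out) - 1):
        if out[i] == "n":
            if out[i + 1] in BILABIAL:
                out[i] = "m"   # n -> m before a bilabial
            elif out[i + 1] in VELAR:
                out[i] = "ng"  # n -> ng before a velar
    return out


green_banana = "g r iy n b ax n ae n ax".split()
print(" ".join(assimilate_nasals(green_banana)))
```

Only the [n] at the end of “green” changes, because it is the only one directly followed by a bilabial; the two [n] sounds inside “banana” are followed by vowels and are left alone.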

31 Articulator mistiming “Probably” is canonically [p r aa b ax b l iy] –[p r aa b iy] –[p r aw l uh] –[p r ah b iy] –[p r aa l iy] “Sense” is canonically [s eh n s] –[s eh n t s] –[s ih t s] 29
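One way to quantify how far each reduced form drifts from the canonical pronunciation is the Levenshtein (edit) distance over phone sequences. This is an illustrative measure chosen for this sketch, not one named in the lecture, and `phone_edit_distance` is a hypothetical helper.

```python
def phone_edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of symbols)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference phone
                curr[j - 1] + 1,     # insert a phone not in the reference
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


canonical = "p r aa b ax b l iy".split()
variants = ["p r aa b iy", "p r aw l uh", "p r ah b iy", "p r aa l iy"]
for variant in variants:
    print(variant, phone_edit_distance(canonical, variant.split()))
```

Each phone counts as one symbol, so a two-letter ARPAbet code such as "aa" is compared as a unit rather than character by character.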

32 IPA Consonants 30

33 IPA Vowels 31

34 Representations for Sounds With ways to represent sounds (IPA, ARPAbet, etc.) we can classify and manipulate these units. –Automatic Speech Recognition –Speech synthesis –Speech pathology –Language ID –Speaker ID But… how do we recognize these different sounds automatically from sound data? –Acoustic analysis (digital signal processing) 32
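To give a flavor of what acoustic analysis can look like, the sketch below computes a zero-crossing rate, a simple signal feature that tends to be low for periodic, voiced-like signals and high for noise-like signals such as voiceless fricatives. The lecture does not name this feature; it is swapped in here as a minimal example. The two signals are synthetic stand-ins (a sine wave and white noise), and the sampling rate and frequency are assumed values.

```python
import math
import random


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


SAMPLE_RATE = 16000  # assumed sampling rate in Hz
N = 1600             # 100 ms of signal at that rate

# Stand-in for a voiced sound: a 120 Hz sine, a crude model of periodic voicing.
voiced_like = [math.sin(2 * math.pi * 120 * n / SAMPLE_RATE) for n in range(N)]

# Stand-in for a voiceless fricative: white noise from a seeded generator.
rng = random.Random(0)
noise_like = [rng.uniform(-1.0, 1.0) for _ in range(N)]

print(zero_crossing_rate(voiced_like))
print(zero_crossing_rate(noise_like))
```

Real speech is far messier than these two signals, so a single feature like this would only be one small input to an actual recognizer.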

35 Next Class Overview of Spoken Dialog Systems Readings: J&M 24.1, 24.2 33

