James M Scobbie
CASL Research Centre
LOT summer school
Ultrasound, phonetics, phonology: Articulation for Beginners!
With special thanks to collaborators
  Jane Stuart-Smith & Eleanor Lawson
  Joanne Cleland & Zoe Roxburgh
  Natasha Zharkova, Laura Black, Steve Cowen
  Reenu Punnoose, Koen Sebreghts
  Sonja Schaeffler & Ineke Mennen
  Conny Heyde
  Alan Wrench (aka Articulate Instruments Ltd) for AAA software and UTI hardware
Various funding – thank you to ESRC, EPSRC, QMU
June 2013
Structure
Introduction to articulation
Brief overview of techniques
Ultrasound tongue imaging
Playtime
Technical issues and the nitty gritty of data
Maybe a linguistic illustration
  – Malayalam liquids
Why study articulation?
It underlies the acoustic and visual elements of speech, but is only a means to an end
It tells you what speakers actually do
It might be what the speaker intends to control
Some bits of speech are silent
It is interesting in its own right because it is a complex multichannel linguistic phenomenon
Applications – clinical, military, HCI, L2
Comparison with sign language and gesture
Provides evidence for phonological patterns
Easy topics for research
Silent articulations
  – Pre-speech, post-speech
  – Speech errors
  – Voiceless stops
  – Listening, turn taking
Covert contrasts
  – In acquisition, therapy, L2 learning, sociolinguistics
Covert errors
  – In acquisition, therapy, L2 learning
Articulation / acoustics relationship in segmental and prosodic speech
Imaging
Video (including basic photos/still frames)
  – Regular video (25 fps PAL or 30 fps NTSC), often underlyingly at a higher image/refresh rate (de-interlaceable)
  – From cheap camcorders to endoscopy for internal images
X-ray stills and X-ray cinematography
MRI (Magnetic Resonance Imaging) & CT scanning
  – Excellent resolution of superficial and deep features in 3D for static images
  – Bigger voxels, grainier images and more processing at faster frame rates; great prospects in the next few years
UTI (Ultrasound Tongue Imaging)
  – Regular video outputs, or
  – Digital (“high speed”) cineloop outputs
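Interlaced video, such as 25 fps PAL, stores each frame as two half-height fields captured at different moments, which is why it is de-interlaceable to roughly double the temporal rate. A minimal sketch, assuming greyscale frames held as lists of pixel rows and using the simplest line-doubling ("bob") method; the function names are illustrative and not from any particular video library:

```python
from typing import List, Tuple

# A frame is a list of pixel rows (greyscale values), top row first.
Frame = List[List[int]]


def split_fields(frame: Frame) -> Tuple[Frame, Frame]:
    """Split an interlaced frame into its two fields: even rows and odd rows."""
    return frame[0::2], frame[1::2]


def line_double(field: Frame) -> Frame:
    """Restore full height by repeating every field row ("bob" de-interlacing)."""
    doubled: Frame = []
    for row in field:
        doubled.append(list(row))
        doubled.append(list(row))
    return doubled


def deinterlace(frames: List[Frame], fps: float) -> Tuple[List[Frame], float]:
    """Turn N interlaced frames into 2N progressive frames at twice the rate.

    Assumes the even-row field was captured first; field order varies between
    sources and must be checked for real recordings.
    """
    progressive: List[Frame] = []
    for frame in frames:
        first, second = split_fields(frame)
        progressive.append(line_double(first))
        progressive.append(line_double(second))
    return progressive, 2 * fps
```

For example, `deinterlace([[[1, 1], [2, 2], [3, 3], [4, 4]]], fps=25)` returns two images at 50 fps, one built from rows 1 and 3 and one from rows 2 and 4. Each image keeps only half the original vertical detail, which is the price paid for the doubled temporal rate.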
What hi-tech speakers go through
EMA: gluing and sitting in the cube: 2h+ data
UTI: probe fitting and the headset: “30m” data
  – Short sessions; outputs are image sequences; needs synchronisation; captures the tongue root
Vid
UTI: ECB08 spontaneous dialogue
MRI: USC
Video playback in PowerPoint / media players doesn’t really convey the spatio-temporal nature of the data
What about something easier? What do you do when you “say hello”…? Description…?
MRI Same speaker?
Video
Video can tell you a lot
What do you do when you “say hello”…?
Drawbacks are not unique
Easy-to-get images are only 2-dimensional
  – The head and vocal tract are 3D objects
  – Which 2-dimensional plane do you want to study?
Speaker & camera move relative to each other
  – False motion of articulators within the plane
  – Motion towards or away from the camera, changing scale
  – And rotations mean a different plane is shown
Not many frames per second
  – Potential for smearing in time
  – Missing key events completely
  – Weak and/or variable synch with acoustics
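The cost of having few frames per second can be made concrete with a little arithmetic. The sketch below treats each frame as an instantaneous snapshot (an idealisation) and asks which frames fall inside a short articulatory event; the event times are made up for illustration:

```python
import math
from typing import List


def frame_interval_ms(fps: float) -> float:
    """Time between successive frames in milliseconds."""
    return 1000.0 / fps


def frames_during(event_start_s: float, event_end_s: float, fps: float) -> List[int]:
    """Indices k of frames whose capture time k / fps falls inside the event.

    Idealisation: each frame is treated as an instantaneous snapshot at k / fps.
    Real cameras expose over an interval, which is where temporal smearing comes from.
    """
    first = math.ceil(event_start_s * fps)
    last = math.floor(event_end_s * fps)
    return list(range(first, last + 1))


# A hypothetical 20 ms event (e.g. a brief closure) from 45 ms to 65 ms.
for fps in (25, 60, 120):
    hits = frames_during(0.045, 0.065, fps)
    print(f"{fps:>3} fps: interval {frame_interval_ms(fps):.1f} ms, frames {hits}")
```

With these numbers the 20 ms event falls entirely between frames at 25 fps (40 ms apart), is caught by a single frame at 60 fps, and by two frames at 120 fps.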
3D / 4D?
To get data in more than one plane, let alone enough to make a 3D image that moves in time…
… means sacrifices
  – Lower spatial resolution
  – Lower temporal resolution
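For ultrasound, the reason extra planes cost temporal resolution follows from the standard pulse-echo constraint: each scan line must wait for its echo to return from the maximum depth, so the line rate is capped at c / (2 × depth), with c ≈ 1540 m/s as the conventional soft-tissue value. The depth and line counts below are illustrative assumptions rather than any scanner's settings, and real systems have extra overheads, so these are upper bounds only:

```python
SPEED_OF_SOUND_TISSUE_M_S = 1540.0  # conventional average for soft tissue


def max_line_rate(depth_m: float, c: float = SPEED_OF_SOUND_TISSUE_M_S) -> float:
    """Upper bound on scan lines per second.

    Each pulse must travel to the maximum imaging depth and back (2 * depth)
    before the next line can be fired without ambiguity.
    """
    return c / (2.0 * depth_m)


def max_frame_rate(depth_m: float, lines_per_frame: int, planes: int = 1) -> float:
    """Upper bound on frames (or volumes) per second when `planes` image planes,
    each of `lines_per_frame` scan lines, are needed for every time step."""
    return max_line_rate(depth_m) / (lines_per_frame * planes)


# Illustrative settings (assumptions): 10 cm depth, 64 scan lines per plane.
print(max_frame_rate(0.10, 64))             # one mid-sagittal plane
print(max_frame_rate(0.10, 64, planes=10))  # ten planes for a crude 3D volume
```

With 10 cm depth and 64 lines this gives roughly 120 frames per second for a single plane but only about 12 volumes per second for ten planes. Adding scan lines per plane to improve spatial sampling reduces the rate in exactly the same way.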
Ultrasound Tongue Imaging
UTI applications
Ultrasound as a tongue imaging technique
Relatively cheap, non-invasive and accessible
  – Fieldwork
  – Clinical diagnostics
  – Child language acquisition
  – Standard laboratory phonetics & phonology
Real-time visual biofeedback
  – Phonetics and linguistics teaching
  – Clinical intervention
  – L2 teaching and personal training
Hand-held & live video-mode
Quick, portable, cheap, live/real-time, “comfy-ish”
Synchronisation with audio, probe movement
Applications
  – Clinic
  – Teaching
  – Piloting
  – Outreach
  – Fieldwork
  – Discourse
  – Infants
  – pT Research!
Articulate Assistant Advanced (AAA)
~120 fps hs-UTI: raw probe echo-location data is stored and re-imaged on the fly
  – Up to ~400 fps available
135° field of view
~60 fps de-interlaced lip camera, stored as uncompressed bitmap
QMU AAA multichannel lab set-up
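Storing raw echo data means the image must be rebuilt from scan lines radiating across the 135° fan. The core geometric step is a polar-to-Cartesian mapping, sketched here under simplifying assumptions: a single virtual origin, evenly spaced lines and samples, and a hypothetical 80 mm imaging depth. A real convex probe has a curved surface, and a full re-imaging step would also interpolate onto a pixel grid:

```python
import math
from typing import Tuple


def fan_to_xy(line: int, sample: int, n_lines: int, n_samples: int,
              fov_deg: float = 135.0, depth_mm: float = 80.0) -> Tuple[float, float]:
    """Map one raw echo sample (scan line index, sample index along the line)
    to Cartesian coordinates in mm.

    Simplifying assumptions: all scan lines radiate from a single virtual
    origin, are evenly spaced over the field of view, and samples are evenly
    spaced from 0 to depth_mm. x is lateral, y points away from the probe.
    """
    if n_lines < 2 or n_samples < 2:
        raise ValueError("need at least two scan lines and two samples per line")
    theta = math.radians(-fov_deg / 2 + line * fov_deg / (n_lines - 1))
    r = depth_mm * sample / (n_samples - 1)
    return r * math.sin(theta), r * math.cos(theta)
```

For instance, the last sample of the middle line of a three-line fan, `fan_to_xy(1, 99, 3, 100)`, lands straight ahead of the probe at (0, 80) mm, while line 0 points 67.5° to one side.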
and the AAA multichannel system MC suburb
AAA
Data collection and analysis in a fast, single, dedicated software environment
  – Ultrasonix high-speed UTI (no post-processing)
  – Various video UTI or camcorder systems
  – Same annotation & display software for EPG, EMA
Custom-made multichannel synchronisation
  – Video via “synch-brightup” clapperboard on A/D of video images, with built-in batch-processing
      De-interlacing from ~30 to ~60 fps
      Offset of clapper-frame to adjust for UTI creation (~20 ms)
Semi-automatic edge-detection
  – Smoothing & confidence-rating over 42 points
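The clapperboard idea can be sketched as follows. The slide does not describe AAA's internal implementation, so this is a hypothetical illustration. It assumes the mean brightness of each video frame has already been extracted, that the nominal time of the brightup frame coincides with a pulse recorded on the audio channel, that a simple brightness `jump` threshold finds that frame, and that image content lags each frame's nominal time by a fixed `content_delay_s` (one possible reading of the ~20 ms "UTI creation" offset):

```python
from typing import List, Optional, Sequence


def find_brightup(mean_brightness: Sequence[float], jump: float) -> Optional[int]:
    """Index of the first video frame whose mean brightness rises by more than
    `jump` over the previous frame, or None if there is no such frame."""
    for k in range(1, len(mean_brightness)):
        if mean_brightness[k] - mean_brightness[k - 1] > jump:
            return k
    return None


def frame_times_on_audio_clock(mean_brightness: Sequence[float], fps: float,
                               audio_pulse_s: float, jump: float,
                               content_delay_s: float = 0.0) -> List[float]:
    """Estimated audio-clock time (s) of each frame's image content.

    Assumption: the brightup frame's nominal time (k / fps) coincides with the
    audio pulse, and every frame's content lags its nominal time by the same
    content_delay_s, which must be calibrated for a given set-up.
    """
    k = find_brightup(mean_brightness, jump)
    if k is None:
        raise ValueError("no brightup frame found")
    offset_s = audio_pulse_s - k / fps
    return [j / fps + offset_s - content_delay_s for j in range(len(mean_brightness))]
```

With frame brightness `[10, 11, 10, 200, 210]` at 100 fps, a `jump` of 50 and an audio pulse at 0.5 s, frame 3 is the brightup frame and is placed at 0.5 s; setting `content_delay_s=0.02` shifts every frame 20 ms earlier.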
Some problems
Ultrasound gives rise to
  – Artefacts from parallel tongue & echo pulse beams
  – Missing data
      Between scan lines, or beyond the scan area
      Behind bones
      Above a sublingual cavity (aka losing the tip)
  – Grainy data or poor resolution
      Older speakers, dry mouth, beards, etc.
      Tongue surface when it’s far from the probe
      When the tongue is parallel to the scanlines
Problems with stabilisation & synchronisation
  – Technical solutions can lead to speaker fatigue
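Missing data between scan lines and poorer resolution far from the probe both follow from the fan geometry: neighbouring scan lines diverge, so the gap between them grows in proportion to depth. A short calculation, assuming evenly spaced lines from a common origin across the 135° field of view and a hypothetical 64-line fan:

```python
import math


def scanline_gap_mm(depth_mm: float, n_lines: int, fov_deg: float = 135.0) -> float:
    """Arc length between neighbouring scan lines at a given depth, for lines
    spread evenly over the field of view from a common origin."""
    if n_lines < 2:
        raise ValueError("need at least two scan lines")
    step_rad = math.radians(fov_deg) / (n_lines - 1)
    return depth_mm * step_rad


# Hypothetical 64-line fan: the gap grows linearly with depth.
for depth in (20, 40, 80):
    print(f"{depth} mm: {scanline_gap_mm(depth, 64):.2f} mm between lines")
```

This gives gaps of roughly 0.75 mm at 20 mm depth but about 3 mm at 80 mm, so a tongue surface far from the probe is sampled much more coarsely.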
With UTI…
We only have mid-sagittal tongue curves
  – Not passive articulators
  – Not all the tongue surface
  – Not all the internal tongue tissue
  – Not lips
But unlike EMA
  – We are not limited to 3 or 4 anterior points
And unlike MRI
  – UTI is cheap, non-invasive, portable and quick
  – For small datasets analysis is quick… we can collect & trace 12 tokens of 5 vowels in half a day
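Tracing 12 tokens of 5 vowels yields 60 curves, and a natural next step is to average the tokens of each vowel and compare the averages. The sketch assumes a hypothetical spline format: one radial distance (mm) per fixed fan line, like the 42 points mentioned for AAA's edge-detection, with `None` where no edge was traced. Point-by-point averaging is only meaningful when all curves share the same fan lines; otherwise they would need resampling first:

```python
import math
from typing import List, Optional, Sequence

# Hypothetical spline format: one radial distance (mm) per fixed fan line,
# None where no tongue edge was traced (e.g. the tip lost over a sublingual cavity).
Spline = Sequence[Optional[float]]


def mean_curve(tokens: Sequence[Spline]) -> List[Optional[float]]:
    """Point-by-point mean over tokens, skipping untraced points."""
    n_points = len(tokens[0])
    mean: List[Optional[float]] = []
    for i in range(n_points):
        values = [t[i] for t in tokens if t[i] is not None]
        mean.append(sum(values) / len(values) if values else None)
    return mean


def rms_difference(a: Spline, b: Spline) -> float:
    """Root-mean-square distance between two curves over the fan lines traced in both."""
    sq = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    if not sq:
        raise ValueError("no fan lines traced in both curves")
    return math.sqrt(sum(sq) / len(sq))
```

For example, `mean_curve([[50, 60, None], [52, 62, 70]])` gives `[51.0, 61.0, 70.0]`, and its RMS difference from `[55, 65, None]` is 4.0 mm over the two shared points. Real curves would have one value per fan line rather than three.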
Playtime!
Real-time visual biofeedback
  – Have a go…
  – Front/back vowels
  – Dutch & English /r/
  – Dutch & English /s/ and English “sh”
  – Swallowing
Annotation and analysis software
  – AAA: basic overview, using some good data
  – Annotation and filtering
  – Drawing some splines
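Filtering a traced spline can take per-point confidence into account, so that poorly imaged points influence the result less. AAA's own smoothing is not described on the slides; this is a generic confidence-weighted moving average, offered as a hypothetical illustration with an assumed window of one neighbour either side:

```python
from typing import List, Sequence


def smooth_spline(r: Sequence[float], conf: Sequence[float],
                  half_window: int = 1) -> List[float]:
    """Confidence-weighted moving average along a spline.

    r: radial distances per spline point; conf: non-negative confidence per point.
    A point whose whole window has zero total confidence keeps its raw value.
    """
    if len(r) != len(conf):
        raise ValueError("r and conf must have the same length")
    n = len(r)
    smoothed: List[float] = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        weight = sum(conf[lo:hi])
        if weight > 0:
            smoothed.append(sum(r[j] * conf[j] for j in range(lo, hi)) / weight)
        else:
            smoothed.append(float(r[i]))
    return smoothed
```

With radial distances `[10, 10, 40, 10, 10]` and confidences `[1, 1, 0, 1, 1]`, the spurious 40 gets zero weight and every smoothed value comes out as 10.0; with equal confidences the filter reduces to an ordinary moving average.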