James M Scobbie CASL Research Centre LOT summer school Ultrasound, phonetics, phonology: Articulation for Beginners! With special thanks to collaborators.

Presentation transcript:

James M Scobbie CASL Research Centre LOT summer school Ultrasound, phonetics, phonology: Articulation for Beginners! With special thanks to collaborators Jane Stuart-Smith & Eleanor Lawson Joanne Cleland & Zoe Roxburgh Natasha Zharkova, Laura Black, Steve Cowen Reenu Punnoose, Koen Sebreghts Sonja Schaeffler & Ineke Mennen Conny Heyde Alan Wrench (aka Articulate Instruments Ltd) for AAA software and UTI hardware Various funding – thank you to ESRC, EPSRC, QMU June 2013

Structure Introduction to articulation Brief overview of techniques Ultrasound tongue imaging Playtime Technical issues and the nitty gritty of data Maybe a linguistic illustration –Malayalam liquids

Technical issues Different laboratories have different solutions Exemplification will be based around current practice at QMU / Articulate Instruments Ltd Topics (mostly in this order) –Resolution, fixed aspect ratio representations –Up, down and horizontal…the bite plane –Quick averaging of multiple tongue surfaces –Statistical testing of difference between averages –Two tongues, synching, de-interlacing –Video-rate vs. (ultra) high speed ultrasound

Spatial resolution around the curve More echo-pulse beams / scanlines means more resolution in a circumferential direction –Let’s assume 1 scanline every 2° (180 in a circle) –Scanlines get further apart the further they are from the probe At 90mm from probe centre, resolution is 3.14mm At 60mm, resolution is 2mm At 45mm it is 1.6mm –To maintain these resolutions… A 90° field of view would need 46 scanlines A 135° field of view would need 69 scanlines
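The figures on this slide follow from the arc length between neighbouring scanlines, radius × angular step in radians. A minimal sketch, assuming the 2° step from the slide; the function names are illustrative and not part of AAA:

```python
import math

def arc_spacing_mm(radius_mm, step_deg=2.0):
    """Distance between adjacent scanlines at a given distance from the probe centre."""
    return radius_mm * math.radians(step_deg)

def scanlines_needed(fov_deg, step_deg=2.0):
    """Scanlines needed to cover a field of view, counting both edge lines."""
    return math.ceil(fov_deg / step_deg) + 1

for r in (90, 60, 45):
    print(f"{r}mm from probe: {arc_spacing_mm(r):.2f}mm between scanlines")
print(scanlines_needed(90), scanlines_needed(135))
```

The slide's 2mm and 1.6mm are these values rounded (2.09mm and 1.57mm).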

Spatial resolution along the radii More sample points means more resolution in a radial direction –8cm depth with 256 sample points = 0.3mm/point Assuming enough pixels to represent each point

°, FoV 135°, 57fps

50 2.7°, FoV 135°, 166fps

Rectangular images The fan shape is presented on a rectangular screen, and occupies a proportion of that space A TV image has a certain number of data points horizontal / vertical (e.g. in NTSC) These are digitised into pixels at a given resolution… Horizontal in the head is not the same thing as being parallel to the x-axis in the rectangle!
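To make the fan-versus-rectangle point concrete, here is a hedged sketch of how a sample on a scanline could be placed in rectangular coordinates. The angle convention (0° along the fan's central axis, positive to one side) and the function name are assumptions for illustration only:

```python
import math

def fan_to_xy(radius_mm, angle_deg, origin_xy=(0.0, 0.0)):
    """Convert a point on a scanline (distance from probe centre, angle from
    the fan's central axis) to x/y in mm. The same mm scale is used on both
    axes, which is what a fixed aspect ratio representation requires."""
    a = math.radians(angle_deg)
    x = origin_xy[0] + radius_mm * math.sin(a)
    y = origin_xy[1] + radius_mm * math.cos(a)
    return x, y

print(fan_to_xy(90, 0))    # on the central axis
print(fan_to_xy(90, 45))   # 45° off axis
```

The resulting x-axis is tied to the probe, not to the speaker's head, which is why a separate reference such as the bite plane is needed.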

Harrington, Kleber & Reubold 2011 Approximate location of EMA coils in analysis of /u/ fronting in SSBE

Harrington et al Approximate location of EMA coils in analysis of /u/ fronting in SSBE – 2-4mm back/below /i/

UTI single SSE speaker Example of a UTI vowel space, rotated to occlusal bite plane, with average curves (± 1sd) Left pane is standard view, right the UTI view…

Finding the “horizontal” Use a “bite plate” to detect the unique occlusal plane for each speaker, as is typical in EMA Flat plane defined on upper dentition surface Also provides common origin as well as axes Scobbie, Lawson, Cowen, Cleland & Wrench (2012) ms. – I might be able to put this online…

Wikipedia: In humans, the directions "rostral" and "caudal" often become confused with anterior and posterior, or superior and inferior. The difference between the two is most easily visualized when looking at the head, as can be seen in the image to the right. From the most caudal of positions in the nervous system (of a person) to a nearby, rostral area, it is equally accurate to say the area in question is rostral as to say it is superior. However, in the frontal lobes of the telencephalon, to say an area is rostral to a nearby area is equivalent to saying it is anterior. Those two lines lie on planes perpendicular to one another. This occurs, as becomes clear in the diagram, due to the intuitive yet curious curving "C" shape of rostrocaudal directionality when discussing the human brain.

Occlusal biteplane trace bite plate

Variation in bite plane
Speaker: s1 s2 s3 s4 s5 s6
Occlusal slope: -8° -18° -23° -13° -22° -27°
Distance from probe surface (mm):
Angular offset of rear: 92° 82° 83° 77° 80° 69°
Six young adult female speakers Varying slopes (mean 18.5°) Varying vertical offset Varying horizontal offset of back of plate

Overlay of 6 hard palates Mean hard palate trace (black) and biteplane trace (grey), automatic curve fitting

Palates normalised to bite planes Normalised (translation and rotation) to rear of bite plane and relocation of origin (+45mm) Better palate trace alignment, with one “rogue”
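The translation and rotation step can be sketched as follows. This is an illustration, not the AAA procedure: it assumes the bite plane trace is summarised by two points (rear and front of the plate), and it leaves out the +45mm relocation of the origin mentioned on the slide.

```python
import math

def normalise_to_bite_plane(points, rear_xy, front_xy):
    """Translate so the rear of the bite plane is the origin, then rotate so
    the bite plane lies along the x-axis. points: list of (x, y) in mm."""
    theta = math.atan2(front_xy[1] - rear_xy[1], front_xy[0] - rear_xy[0])
    c, s = math.cos(-theta), math.sin(-theta)   # rotate by -theta
    out = []
    for x, y in points:
        tx, ty = x - rear_xy[0], y - rear_xy[1]
        out.append((tx * c - ty * s, tx * s + ty * c))
    return out

# Hypothetical speaker with an occlusal slope of -18° and a 40mm plate trace
rear = (10.0, 20.0)
front = (rear[0] + 40 * math.cos(math.radians(-18)),
         rear[1] + 40 * math.sin(math.radians(-18)))
print(normalise_to_bite_plane([rear, front], rear, front))
```

After normalisation the rear of the plate sits at (0, 0) and the front on the x-axis, so palate and tongue traces from different speakers share axes.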

Alternatives Palates can be used to orientate between sessions, by swallowing (e.g. water or yoghurt) Longitudinal, within-speaker –Just line up the palates! –Easy, huh?! Cross-speaker –Might be better than bite plate when worrying about close approximation constrictions –Bite plate might be better for open approximation The probe can be moved instead A consistent articulation can be used, eg [u]

Upright / supine MRI data is collected supine – does it matter? Upright L and supine R “pop” vowel Wrench et al 2011

Summary 6 female speakers, varied accents 5 reps of pep and of pop in randomised list of vowels 4 blocks, repeating upright/supine set twice –Upright first for 3, supine first for 3 Pharyngeal slump under gravity of about 3mm And a couple of cases of blade raising

Averaging tokens within-speaker

Averaging within AAA Averaging along 42 fan-grid radii, “parallel” to scan-lines / echo-pulse beam from the probe
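A minimal sketch of averaging along fan radii, assuming each token has already been reduced to one distance-from-probe value per radius (None where the tongue was not traced). The data layout is an assumption for illustration, not AAA's internal format:

```python
from statistics import mean, stdev

def average_contour(tokens):
    """tokens: list of contours, each a list of radial distances in mm, one
    per fan radius. Returns (mean, sd) per radius; sd is None when fewer than
    two tokens cross that radius, and both are None when none do."""
    result = []
    for i in range(len(tokens[0])):
        vals = [t[i] for t in tokens if t[i] is not None]
        if len(vals) >= 2:
            result.append((mean(vals), stdev(vals)))
        elif vals:
            result.append((vals[0], None))
        else:
            result.append((None, None))
    return result

tokens = [[50, 60, None], [52, 62, None], [54, 64, 70]]
print(average_contour(tokens))
```

Keeping the sd alongside the mean gives the ±1sd bands shown around the average curves.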

Tokens of [s] from /si/ n tokens along radius r

Tokens of [s] from /sa/ and /si/ vs. a different condition

2 groups of curves t-test of the difference between mean tongue contour at crossing point at each fan line 2-tailed test assuming unequal variances and unequal sample sizes No Bonferroni or other corrections Up to 5 or 6 adjacent radii, mean distance from probe is correlated, perhaps indicating non-independence of such “close” measures For a linguistic interpretation of difference, 5 or 6 adjacent radii, all at p<0.05 on t-test is more important than p< on one radius
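The per-radius test described here is a Welch (unequal variance) t-test. The sketch below is self-contained and hedged: the p-value is obtained by numerically integrating the Student t density rather than by a statistics library, and the helper names are illustrative. It assumes each group has at least two tokens at the radius and non-zero variance.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def two_tailed_p(t, df, steps=4000):
    """Two-tailed p-value via Simpson integration of the t density on [0, |t|]."""
    norm = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: norm * (1 + x * x / df) ** (-(df + 1) / 2)
    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def longest_significant_run(p_values, alpha=0.05):
    """Longest run of adjacent radii with p < alpha (None = radius not testable)."""
    best = run = 0
    for p in p_values:
        run = run + 1 if (p is not None and p < alpha) else 0
        best = max(best, run)
    return best

t, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df, two_tailed_p(t, df))
```

Applying `longest_significant_run` to the p-values across the fan gives the "5 or 6 adjacent radii" criterion directly, without any correction for multiple comparisons, as on the slide.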

Pilot correlation v1 2 speakers, 4 frames each 42 radii per speaker… What % of correlations between two random radii are significant, depending on the distance between them Radial distance Grand mean All parts of tongue pooled More cases of adjacent than long distance comparisons

Pilot correlation v2 A range of 9 varied tongue shapes (9 single frames) from each speaker 4 samples for each frame – roughly equally spaced Is there a correlation for fans 10 apart? 9? 8? 7? … Pilot B (NI1)

3 attempts – more long-distance significance found One sector on the fan is 7 fanlines
Attempt: A B C
# fans: 8 7 4

What to try next? Just two sample points per frame, front and back? Pilot 2 A = 9 fans rather than 8 were significant (n=18 observations, so lower values of r were significant) Or one in the middle looking forwards and backwards? Or use many more target types? Or ones that show more subtle differences, such as a set of CV transitions, including every frame, not just varied targets

Tokens of [s] from /sa/ and /si/ Raw tongue curves again

Mean /sa/ vs. /si/ Significant root advancement (~5mm) and palatalisation (~4mm) in /si/ More than 5 adjacent fans where p<0.05, but in 2 areas

Mean /sa/ vs. /si/ SS-ANOVA best fit lines (± 95% c.i.) - Davidson

Mean /sa/ vs. /si/ Exploring treating >5 fan lines at p<0.05 as categorically “significant” but quantifying it all: –Including crossing/pivot points –Ignore significance if curves are low confidence –Quantify length of the significant tongue surface –Estimate total difference in area
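One way the "total difference in area" could be estimated is sketched below. It assumes both mean contours are sampled on the same fan radii from a common origin with a constant angular step, and uses the polar area formula ½∫|r₁² − r₂²|dθ with a trapezoid rule. This is an illustration, not the method used in the study.

```python
import math

def area_between_contours(r_a, r_b, step_deg):
    """Approximate area (mm^2) between two contours given as distances from
    the fan origin on the same radii. Taking the absolute value per radius
    means areas on either side of a crossing point add up, not cancel."""
    dtheta = math.radians(step_deg)
    f = [abs(a * a - b * b) / 2.0 for a, b in zip(r_a, r_b)]
    return sum((f[i] + f[i + 1]) / 2.0 * dtheta for i in range(len(f) - 1))

# Hypothetical mean contours on five radii, 3° apart
sa = [60.0, 62.0, 63.0, 61.0, 58.0]
si = [64.0, 65.0, 63.0, 59.0, 55.0]
print(area_between_contours(sa, si, 3.0))
```

Restricting the input to the radii that pass the significance criterion would give the area of the "significant" difference only.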

Single speaker (SSE) Neutral space Thick lines for means – cf overlap, non overlap, and crossings

The two tongue problem Wrench & Scobbie (2006) list some of the problems with video-ultrasound resulting from buffering multiple probe scans into one image –More than one scan from the probe in an image –Partial scans from the probe in an image –Don’t forget 30fps is about 33ms, so synch is vague Some solutions, –Use raw probe data (cine loop) but this costs € –Use a high scan rate (more than twice NTSC) and then deinterlace the video to 60fps –Halves vertical spatial resolution (rectangular up)

Video digital capture & buffering The scanner scans and makes screen images

Video digital capture & buffering The scanner scans and makes screen images

Plain 30fps video In these images, two apparent tongues show the effect of two scans in the same buffer, on odd and even video “lines”
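Since the two tongues sit on alternate video lines, they can be separated by splitting each frame into its two fields. A minimal sketch, assuming a frame is a list of pixel rows; which field is earlier in time depends on the capture chain and is not settled here:

```python
def split_fields(frame):
    """Split an interlaced frame into its even-line and odd-line fields.
    Each field has half the vertical resolution of the full frame."""
    even_field = frame[0::2]
    odd_field = frame[1::2]
    return even_field, odd_field

frame = [[0, 0], [1, 1], [2, 2], [3, 3]]   # toy 4-line frame
even, odd = split_fields(frame)
print(even, odd)
```

Treating each field as its own image doubles the frame rate of 30fps video to 60 fields per second, at the cost of the halved vertical resolution noted above.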

Solutions Deinterlace video images to 60fps (16ms or so) “Cineloop” digital output can be stored locally on US scanners –Full rectangular cine images –Approx 15 second chunks –Continuous audio recordings need post-processing alignment AAA / QM Ultrasonix-based system –Data stored as raw probe echo-pulse returns –Synchronised at source with audio at each frame –Video channel freed up, and can be used to capture lip videos

High speed 76 scan lines, 100fps, FoV 112°

Ultra-high speed 39 scan lines, 196fps, FoV 112°

Ultra-high speed 25 scan lines, 306fps, FoV 112°
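The three settings above suggest a trade-off between scanlines and frame rate at a fixed field of view. The sketch below simply multiplies the figures from the slides; reading the near-constant product as a scanline budget per second is an interpretation, and depth and other scanner settings will also matter.

```python
def lines_per_second(scanlines, fps):
    """Total echo-pulse scanlines fired per second at a given setting."""
    return scanlines * fps

settings = [(76, 100), (39, 196), (25, 306)]   # (scan lines, fps) at FoV 112°
for n, fps in settings:
    print(n, fps, lines_per_second(n, fps))
```

All three come out at roughly 7600 scanlines per second, so higher frame rates here are paid for with coarser circumferential resolution.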

“dog” – ultra high speed 382fps [figure: time axis labelled g ɔ d; back to front]

“tongue blade height” during [d] 382fps & 60fps, 300ms –Constriction-tracking, comparable to but different to flesh-point tracking

Demo videos Video demo, deinterlaced lip camera 60fps [folder] –UTI old Dutch and labialised English r [link] –Lip ultrax kids [link] – deinterlaced ring [link] High speed UTI 100fps –Malayalam retroflex lateral [folder] Slomo [link] Slomo with spline [link] Real speed with spline [link]

Single frame targets Two darker (tongue root) liquids, L / ɭ / and R /r/ Three clearer (ATR, ~pal’ised) l /l/, r / ɾ /, 5 zh

High speed (100fps) Malayalam trill /r/ R between /a/ –Left = closing half of gesture –Right = opening half Note trill motion in blade and stable root

High speed (100fps) Malayalam tap / ɾ / between /a/ Note greater movement in root, pivot point

High speed (100fps) Malayalam retroflex flap / ɭ / Stable root, mobile blade, slower approach with very fast release (nb some UTI artefacts) of over 400mm/sec peak velocity

Gestural speed Unlike EMA, it’s hard to quantify kinematics Need to explore / compare with EMA

Root stability? Positional examinations are easier Retroflex flap and trill both have a very stable root, which could be due to –Posterior bracing to enable the anterior movement –Coincidental, because the context was /a__a/ and these liquids have a dark resonance in Malayalam We can compare /a__a/ to /a__i/

Retroflex lateral flap in a__a Green = prev vowels and formation of maximally retracted “target” (black) Red = during the flap Purple = afterwards

Retroflex lateral flap / ɭ / in a__i Green = prev vowels and formation of maximally retracted “target” (black) Red = during the flap Purple = afterwards High spatial accuracy when orthogonal to beam Lower spatial accuracy when parallel to beam

Comparison Overlap: during period from target to acoustic transition –dark aLa –light aLi How should we quantify this? No sig difference anteriorly but…?

Single frame targets Two darker (tongue root) liquids, L / ɭ / and R /r/ Three clearer (ATR, ~pal’ised) l /l/, r / ɾ /, 5 zh

Next to an /i/ vowel Two darker (tongue root) liquids, L / ɭ / and R /r/ Root advancement and some palatalisation

Next to an /i/ vowel Three clearer (ATR, ~pal’ised) l /l/, r / ɾ /, 5 zh Root advancement and more palatalisation

Summary Support for Punnoose’s acoustic findings of dark vs. light resonances in the liquid system, –Tongue root –Palatal dorsal area Apparent tongue-root bracing for trill and retroflex lateral flap in an /a_a/ context is associated with these being dark consonants –There is steady dynamic root coarticulation in /a_i/ Both light and dark liquids coarticulate but don’t overlap

Any time for any more?

ULTRAX – adding missing pieces Ultrasound misses a great deal of information! ULTRAX project to obtain corpus of 12 speakers in MRI / UTI to build real-time model Renals & CSTR

Could be used for head-movement correction within the midsagittal plane and/or Analysis of lip kinematics Headset-mounted camera

Estimate based on oval model of internal 2D labial aperture, 60fps (~17ms per frame) Coronal “cross-sectional area”
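A hedged sketch of an oval-model area estimate, taking "oval" to mean an ellipse whose axes are the measured aperture width and height. Both the ellipse assumption and the function name are illustrative; the slide does not spell out the exact model.

```python
import math

def lip_aperture_area(width_mm, height_mm):
    """Area of an ellipse with the given full width and height (mm^2)."""
    return math.pi * (width_mm / 2.0) * (height_mm / 2.0)

print(round(lip_aperture_area(20.0, 10.0), 2))
```

Evaluated on each deinterlaced field, this yields one area value roughly every 17ms.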

Orientation-free measures Measures of curviness of the tongue may escape the image-orientations problem Mielke’s concavity & Zharkova’s dorsal bulge (and others) offer speaker-internal unoriented analysis –But there is a worry about front/back of tongue being needed, since end-points can be arbitrary

How do /t/ and /k/ differ? For a variety of work, it is nice to compare a speaker’s productions against a kind of norm ULTRAX group 1 corpus of 30 children offers a useful dataset –ata, iti, oto vs. aka, iki, oko –Speaker-internal ratio of /k/-/t/, along fanlines –Should show extra dorsal distance in /k/ and extra alveolar distance for /t/

results For example, /k/-/t/ of /a/ can be averaged by lining up the maximum excursion point These are not tongue surfaces! Nor in a fan!

Results – anterior to left