USC Linguistics Resonance tuning in soprano singing and vocal tract shaping: Comparison of sung and spoken vowels 2pSC29 Shrikanth Narayanan *^, Erik Bresch.

1 USC Linguistics Resonance tuning in soprano singing and vocal tract shaping: Comparison of sung and spoken vowels 2pSC29 Shrikanth Narayanan *^, Erik Bresch *, Stephen Tobin ^, Dani Byrd ^, Krishna Nayak *, Jon Nielsen * * USC Viterbi School of Engineering ^ USC Department of Linguistics Supported by NIH. Our thanks to the USC Imaging Science Center.

2 USC Speech Articulation and kNowledge (SPAN) Group, sail.usc.edu/span. Background: Singing Acoustics. Signal characteristics of the singing voice: J. Sundberg, “The Acoustics of the Singing Voice,” Scientific American 236, 1977. Vocal tract resonances measured with external excitation, showing tuning of F1 to F0 for softly sung vowels: E. Joliveau, J. Smith, and J. Wolfe, “Vocal tract resonances in singing: The soprano voice,” JASA 116, Oct. 2004. Estimating formants at high pitch from the audio waveform is problematic: H. Traunmueller, A. Eriksson, “A method of measuring formant frequencies at high fundamental frequencies,” Proc. EuroSpeech’97, Vol. 1:477-480.

3 Problem statement. Long-term project goal: investigate the relation between vocal tract shaping and source control in sung and spoken productions. Specific focus (the soprano challenge): investigate vocal tract shaping for different vowels with increasing pitch.

4 Data collection. Subject in MRI scanner in supine position for approx. 60 min. Subject: soprano, trained Western opera singer. The subject sang various 30 s pieces; spoke the utterances “la”, “le”, “li”, “lo”, “lu” (3 realizations each); and sang two-octave B-flat major scales on “la”, “le”, “li”, “lo”, “lu” (one realization each).
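The note numbers and F0 values quoted later in the transcript (note 1 = 233 Hz, note 5 = 349 Hz, note 11 = 622 Hz, note 15 = 932 Hz) are consistent with a two-octave B-flat major scale in equal temperament. The sketch below is only an illustration: the tuning reference (A4 = 440 Hz), the starting note (B-flat 3, MIDI 58), and all names are assumptions, not details taken from the slides.

```python
# Equal-tempered frequencies for a two-octave B-flat major scale.
# Assumptions (not from the slides): A4 = 440 Hz, 12-tone equal temperament,
# and a scale that starts on B-flat 3 (MIDI note 58).
A4_HZ = 440.0
A4_MIDI = 69
START_MIDI = 58  # B-flat 3 (assumed starting note)
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # semitone steps of a major scale


def midi_to_hz(midi_note):
    """Convert a MIDI note number to frequency in Hz."""
    return A4_HZ * 2.0 ** ((midi_note - A4_MIDI) / 12.0)


def two_octave_major_scale(start_midi):
    """Return the 15 frequencies of a two-octave major scale."""
    notes = [start_midi]
    for step in MAJOR_STEPS * 2:
        notes.append(notes[-1] + step)
    return [midi_to_hz(n) for n in notes]


scale_hz = two_octave_major_scale(START_MIDI)
for number, hz in enumerate(scale_hz, start=1):
    print(f"note {number:2d}: {hz:7.2f} Hz")
```

Under these assumptions, notes 1, 5, 11, and 15 round to the four F0 values used in the analysis slides.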

5 Real-time MR imaging. GE 1.5T scanner; custom head/neck receiver coil; RTHawk software (Santos et al., Proc. IEEE EMBS, 26th Annual Meeting); 13-interleaf spiral pulse sequence, TR = 6.5 ms; true frame rate 11 fps, sliding-window reconstruction at 22 fps; slice thickness approx. 5 mm, mid-sagittal plane; resolution approx. 3 mm/pixel; resulting image size 68x68 pixels.
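As a rough check on these imaging parameters, the following sketch derives the time per fully sampled frame and the approximate field of view from the numbers on the slide. It assumes that one full frame needs all 13 interleaves and that the pixels are square; both are assumptions, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the imaging parameters on the slide.
# Assumptions: one fully sampled frame uses all 13 spiral interleaves,
# and pixels are square at the quoted approx. 3 mm resolution.
TR_S = 6.5e-3          # repetition time per interleaf, in seconds
INTERLEAVES = 13       # interleaves per fully sampled frame
PIXEL_SIZE_MM = 3.0    # approximate in-plane resolution
IMAGE_SIZE_PX = 68     # image width and height in pixels

frame_time_s = TR_S * INTERLEAVES
true_fps = 1.0 / frame_time_s
fov_mm = IMAGE_SIZE_PX * PIXEL_SIZE_MM

print(f"time per full frame: {frame_time_s * 1e3:.1f} ms")
print(f"true frame rate:     {true_fps:.1f} fps")
print(f"approx. field of view: {fov_mm:.0f} mm")
```

The computed true frame rate is slightly under 12 fps, in line with the roughly 11 fps quoted on the slide; the sliding-window reconstruction reuses interleaves to output frames about twice as often.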

6 Synchronized audio acquisition. Phone-OR optical microphone; laptop with National Instruments 16-bit DAQ card; sampling rate 100 kHz (5x oversampling); custom FPGA-based sync hardware.

7 Synchronized audio acquisition. Offline gradient noise cancellation: employs an adaptive FIR filter and the normalized LMS algorithm; achieves approx. 30 dB SNR improvement.
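The slide names the technique (adaptive FIR filter with normalized LMS) but not its details. The sketch below is a generic textbook NLMS noise canceller run on synthetic data, not the authors' implementation. The availability of a separate noise reference signal, the filter length, the step size, and every name in the code are assumptions.

```python
import numpy as np


def nlms_cancel(primary, reference, num_taps=32, mu=0.1, eps=1e-8):
    """Generic NLMS canceller: subtract from `primary` whatever an adaptive
    FIR filter can predict from `reference` (hypothetical noise reference)."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    weights = np.zeros(num_taps)
    # Zero-pad so the first samples have a full tap vector.
    padded = np.concatenate([np.zeros(num_taps - 1), reference])
    cleaned = np.zeros_like(primary)
    for n in range(len(primary)):
        x = padded[n:n + num_taps][::-1]   # most recent reference sample first
        error = primary[n] - weights @ x   # error doubles as cleaned output
        weights += mu * error * x / (eps + x @ x)  # normalized LMS update
        cleaned[n] = error
    return cleaned


# Synthetic demonstration (all signals are made up for illustration).
rng = np.random.default_rng(0)
fs = 20_000                                   # assumed sampling rate in Hz
t = np.arange(8000) / fs
voice = 0.1 * np.sin(2 * np.pi * 233.0 * t)   # stand-in for the sung tone
reference = rng.standard_normal(len(t))       # stand-in noise reference
noise = np.convolve(reference, [0.6, -0.3, 0.1])[:len(t)]
cleaned = nlms_cancel(voice + noise, reference)

half = len(t) // 2                            # skip the convergence phase
noise_power = np.mean(noise[half:] ** 2)
residual_power = np.mean((cleaned[half:] - voice[half:]) ** 2)
print(f"noise power before: {noise_power:.4f}, after: {residual_power:.6f}")
```

The step size trades convergence speed against residual error; the improvement on this toy signal says nothing about the approx. 30 dB reported for the real recordings.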

8 Image analysis. Manual tracking of MR images: vocal tract outline traced for each individual frame, from larynx to lips. Computation of midline: finding start and end points at larynx and lips; repeated recursive bisection; smoothing spline fit.

9 Image analysis. Find aperture cross sections perpendicular to the smooth midline. Computation of the final midline along the midpoints of the cross sections. Coordinate system based on the midline; the coordinate origin is to be anchored in the future to an anatomical landmark and is currently above the epiglottis.

10 Image analysis. Final result: aperture function in the midline-based coordinate system. [Figure; axis labels: front, back, constriction degree (aperture).]

11 Acoustic analysis. Using real-time noise-cancelled audio. Pitch estimation using PRAAT. Formant analysis using PRAAT for spoken utterances and for low-pitch notes. Formant analysis is difficult for high-pitch utterances (example /i/ on next slide).

12 Acoustic analysis. Formant analysis from audio is difficult at high pitch. “li” note 1 (F0 = 233 Hz) and note 5 (F0 = 349 Hz): clear formant structure. “li” note 11 (F0 = 622 Hz) and note 15 (F0 = 932 Hz): formants are harder to identify. [Figure: sung vowel /i/; panels F0 = 233 Hz, F0 = 349 Hz, F0 = 622 Hz, F0 = 932 Hz; axis label 5 kHz.]
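One way to see why formants become harder to identify: a voiced sound only carries energy at multiples of F0, so a higher F0 samples the vocal tract's spectral envelope at fewer points. The sketch below counts the harmonics that fall below 5 kHz for the four notes on the slide. Treating 5 kHz as the upper edge of the displayed band is an assumption based on the figure label, and the names are illustrative.

```python
# Count the harmonics of each sung note that fall below a band limit.
# Assumption: 5 kHz is the upper edge of the displayed frequency range.
F0_VALUES_HZ = [233, 349, 622, 932]
BAND_LIMIT_HZ = 5000


def harmonics_below(f0_hz, limit_hz):
    """Return all integer multiples of f0_hz that do not exceed limit_hz."""
    return [k * f0_hz for k in range(1, limit_hz // f0_hz + 1)]


harmonic_counts = {}
for f0 in F0_VALUES_HZ:
    harmonics = harmonics_below(f0, BAND_LIMIT_HZ)
    harmonic_counts[f0] = len(harmonics)
    print(f"F0 = {f0:3d} Hz: {len(harmonics):2d} harmonics below "
          f"{BAND_LIMIT_HZ} Hz, spaced {f0} Hz apart")
```

With harmonics spaced almost 1 kHz apart at the top note, a resonance peak can fall between two harmonics and leave little trace in the spectrum.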

13 Aperture function analysis. At high pitches, the acoustic identity of the vowels is “sacrificed,” i.e., the vowels converge acoustically. In their articulation, however, the front half of the aperture function converges for all vowels at high pitch, while the back half of the aperture function maintains a vowel-dependent shape.

14 Aperture functions for vowels sung at different pitches. [Figure panels: F0 = 233 Hz, F0 = 349 Hz, F0 = 622 Hz, F0 = 932 Hz.]

15 Larynx position analysis results. The larynx rises with higher pitch for /e/, /i/, /o/, /u/. [Figure; axis label: pitch increasing.]

16 Vocal tract length analysis results. Vocal tract length decreases with pitch for /e/, /i/, /o/. [Figure; labels: spoken vowels, pitch increasing.]
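The slides do not say how vocal tract length was computed. One plausible reading, given the midline described in the image analysis slides, is the arc length of the midline from larynx to lips. The sketch below shows that computation for a sampled midline; the coordinates are made up, the 3 mm pixel size is the approximate value from the imaging slide, and the function name is hypothetical.

```python
import numpy as np

PIXEL_SIZE_MM = 3.0  # approximate in-plane resolution quoted on the imaging slide


def polyline_length(points):
    """Sum of straight-line segment lengths along a sampled curve."""
    points = np.asarray(points, dtype=float)
    segments = np.diff(points, axis=0)
    return float(np.sum(np.hypot(segments[:, 0], segments[:, 1])))


# Hypothetical midline samples in pixel coordinates (not real data).
midline_px = [(10, 40), (13, 36), (16, 32), (20, 29), (24, 29)]
length_mm = polyline_length(midline_px) * PIXEL_SIZE_MM
print(f"midline length: {length_mm:.1f} mm")
```

A denser sampling of the smoothed midline would give a better approximation of the true arc length.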

17 Minimum aperture analysis results. Minimum aperture value increases with pitch for all vowels. Minimum aperture location varies with pitch for /a/, /o/. [Figures; labels: spoken vowels, pitch increasing.]
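Given an aperture function sampled along the midline, the two quantities on this slide (minimum aperture value and its location) can be read off directly. The sketch below assumes the aperture function is stored as two parallel arrays; the numbers are invented for illustration and are not measurements from the study.

```python
import numpy as np


def minimum_aperture(positions_mm, apertures_mm):
    """Return (location, value) of the narrowest point of an aperture function."""
    positions_mm = np.asarray(positions_mm, dtype=float)
    apertures_mm = np.asarray(apertures_mm, dtype=float)
    idx = int(np.argmin(apertures_mm))
    return float(positions_mm[idx]), float(apertures_mm[idx])


# Hypothetical aperture function: distance along the midline vs. opening width.
positions_mm = [0, 20, 40, 60, 80, 100, 120, 140]
apertures_mm = [12.0, 9.0, 6.0, 3.5, 5.0, 8.0, 10.0, 7.0]

location_mm, value_mm = minimum_aperture(positions_mm, apertures_mm)
print(f"narrowest constriction: {value_mm} mm at {location_mm} mm along the midline")
```

Tracking these two numbers across the notes of the scale would give curves of the kind summarized on the slide.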

18 Image analysis examples: /a/. [Image panels: F0 = 932 Hz (note 15), F0 = 622 Hz (note 11), F0 = 349 Hz (note 5), F0 = 233 Hz (note 1), spoken.]

19 Image analysis examples: /i/. [Image panels: F0 = 932 Hz (note 15), F0 = 622 Hz (note 11), F0 = 349 Hz (note 5), F0 = 233 Hz (note 1), spoken.]

20 Sung vowels. Resonance tuning can be shown for vowels with low F1.

21 Vocal tract shapes in comparison. [Figure grid: vowels /a/, /e/, /i/, /o/, /u/; conditions spoken, F0 = 233 Hz, F0 = 932 Hz.]

22 Aperture functions in comparison. [Figure grid: vowels /a/, /e/, /i/, /o/, /u/; conditions spoken, F0 = 233 Hz, F0 = 932 Hz.]

23 Discussion. Several challenges in analysis: vocal tract resonances are difficult to estimate from the acoustic output at high pitch. We plan in the future to estimate vocal tract resonances from MR-derived area function data (cf. Joliveau et al., who estimated resonances directly by acoustic methods). Three jointly controlled goals: (1) Pitch: critical goal; not compromised, as evidenced in the audio. One component of implementation: raised larynx (except for the low vowel). (2) Intensity: another important goal; increases with pitch. One component of implementation: open front cavity/cone effect. (3) Vowel identity: acoustic identity is lost at high pitches. Front cavity shaping is compromised, but the back cavity distinction is still maintained; the effect depends on the vowel (low vs. high, for example).

24 Discussion. The strategy for joint control and the relative weighting of the goals are unknown. It appears that vowel identity is compromised but not completely ignored at high pitch. The Joliveau et al. data were acquired at soft intensity, so opening of the front cavity for the cone effect may have been minimized. Generalizability of the results is limited. Needed: data from more subjects, and direct acoustic modeling for estimating vocal tract resonances. Ongoing work: we have collected data from 5 more sopranos.

25 Image analysis examples: /a/. Some /a/ images: [panels F0 = 932 Hz, F0 = 622 Hz, F0 = 349 Hz, F0 = 233 Hz, spoken.]

26 Image analysis examples: /i/. Some /i/ images: [panels spoken, F0 = 932 Hz, F0 = 622 Hz, F0 = 349 Hz, F0 = 233 Hz.]

27 Pitch and power estimation. Average power increases with pitch. Pitch follows the nominal values very closely.

