A System for Hybridizing Vocal Performance By Kim Hang Lau.

1 A System for Hybridizing Vocal Performance By Kim Hang Lau

2 Parameters of the singing voice
Parameters of the singing voice can be loosely classified as:
– Timbre
– Pitch contour
– Time contour (rhythm)
– Amplitude envelope (projections)

3 Vocal Modification
Vocal modification refers to the signal processing of live or recorded singing to achieve a different inflection and/or timbre.
Commercially available units include:
– Intonation corrector
– Pitch/formant processor
– Harmonizer
– Vocoder

4 Objectives
Prototype a system for vocal modification.
Modify a source vocal sample to match the time evolution, pitch contour and amplitude envelope of a similarly sung target vocal sample.
This simulates a transfer of singing technique from a target vocalist to a source vocalist, thus a hybridized vocal performance.

5 Order of Presentation
– System overview
– Individual components
– System evaluation
– System limitations
– Conclusions and recommendations

6 System Overview
Three components:
– Pitch-marking
– Time-alignment
– Time/pitch/amplitude modification engine
Inspired by Verhelst’s prototype system for the post-synchronization of speech utterances.

7 Targeted System Specifications
Vocal performance: Commercial singing
Vocal pitch range: 60-1200 Hz
Detection accuracy/resolution: 10 cents
Detection dynamic range: 40 dB
Sampling rate: 44.1 kHz and 48 kHz
Time-scale modification: ±20%
Pitch-scale modification: ±600 cents
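
To make the cent-based figures in this specification concrete, the short sketch below converts cents to frequency ratios. The function names are illustrative and not part of the presented system. It shows that ±600 cents is half an octave either way (a ratio of about 1.414 up or 0.707 down) and that a 10-cent resolution is much finer in Hz at the bottom of the pitch range than at the top.

```python
def cents_to_ratio(cents):
    """Frequency ratio for an interval in cents (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)


def resolution_hz(f0_hz, cents=10.0):
    """Width in Hz of a step of `cents` above the frequency f0_hz."""
    return f0_hz * (cents_to_ratio(cents) - 1.0)


# Pitch-scale modification limits expressed as frequency ratios.
up_limit = cents_to_ratio(600.0)
down_limit = cents_to_ratio(-600.0)

# 10-cent resolution at both ends of the targeted 60-1200 Hz range.
low_end = resolution_hz(60.0)
high_end = resolution_hz(1200.0)
```

At 60 Hz a 10-cent step is roughly a third of a hertz, which is one reason low-pitched material is demanding for a pitch detector.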

8 Component No.1 Pitch-marking

9 Pitch-marking and Glottal Closure Instants (GCIs)
Information generated from pitch-marking:
– Pitch period
– Amplitude envelope
– Voiced/unvoiced segment boundaries
[Figure: waveform with pitch-marks, 5 ms time scale, labels P and P’]
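
As a rough illustration of how pitch-marks yield the first two items above, the sketch below derives a per-period pitch estimate and a peak-amplitude envelope from a list of pitch-mark sample indices. The function names and the choice of peak amplitude per period are assumptions made for this example. The slides do not state how the envelope is measured.

```python
def pitch_from_marks(marks, sample_rate):
    """Per-period fundamental frequency (Hz) from consecutive pitch-marks."""
    return [sample_rate / (b - a) for a, b in zip(marks, marks[1:])]


def envelope_from_marks(signal, marks):
    """Peak absolute amplitude within each pitch period (an assumed measure)."""
    return [max(abs(v) for v in signal[a:b]) for a, b in zip(marks, marks[1:])]


# Example: marks 441 samples apart at 44.1 kHz correspond to 100 Hz.
f0 = pitch_from_marks([0, 441, 882], 44100)
```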

10 Pitch-marking applying the Dyadic Wavelet Transform (DyWT)
Kadambe adapted Mallat’s algorithm for edge detection in images to the detection of GCIs in speech signals.
The adaptation assumes a correlation between edges in an image and GCIs in a speech signal.
DyWT computation for dyadic scales 2^3 to 2^5 was sufficient for pitch-marking.
If a peak detected in the DyWT matches across two consecutive scales, starting from a lower scale, that time instant is taken as a GCI.
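
The cross-scale matching rule can be sketched as follows. This is a simplified stand-in, not Kadambe’s or Mallat’s actual filter bank: a Haar-like difference of local means replaces the spline wavelet, and the threshold, tolerance and function names are assumptions chosen for illustration. Only the decision rule (a peak that also appears at the next scale is accepted) follows the slide.

```python
def haar_detail(x, scale):
    """Difference between the mean of the `scale` samples after n and before n.

    A crude substitute for one DyWT scale: it responds strongly to abrupt changes.
    """
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    w = [0.0] * len(x)
    for n in range(scale, len(x) - scale + 1):
        after = prefix[n + scale] - prefix[n]
        before = prefix[n] - prefix[n - scale]
        w[n] = (after - before) / scale
    return w


def find_peaks(w, rel_thresh=0.5):
    """Indices of local maxima of |w| above rel_thresh times the largest |w|."""
    mag = [abs(v) for v in w]
    thresh = rel_thresh * max(mag, default=0.0)
    return [n for n in range(1, len(mag) - 1)
            if mag[n] > thresh and mag[n] > mag[n - 1] and mag[n] >= mag[n + 1]]


def detect_gcis(x, scales=(8, 16, 32), tol=2):
    """Accept a peak as a GCI if it recurs (within tol samples) at the next scale."""
    peaks = [find_peaks(haar_detail(x, s)) for s in scales]
    gcis = []
    for lower, upper in zip(peaks, peaks[1:]):  # start from the lower scale
        for p in lower:
            matched = any(abs(p - q) <= tol for q in upper)
            is_new = all(abs(p - g) > tol for g in gcis)
            if matched and is_new:
                gcis.append(p)
    return sorted(gcis)


# Synthetic test signal: a ramp that drops abruptly every 100 samples.
sawtooth = [(n % 100) / 100.0 for n in range(500)]
marks = detect_gcis(sawtooth)
```

Real singing is far less clean than this sawtooth, so the fixed relative threshold used here would need tuning or replacement in practice.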

11 [Figure: Mallat vs. Kadambe; panels: original signal, scales 2^1, 2^2, 2^3, 2^4, 2^5, base-band]

12 The proposed pitch-marking scheme
Detection principle:
– Detection of the scale that contains the fundamental period
– Starting from a higher scale (of lower frequency), there is a considerable jump in frame power when this scale is encountered
Features:
– 4X decimation to support high sampling rates
– Frame-based processing and error correction for possible quasi-real-time detection
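
One possible reading of the detection principle is sketched below. It assumes the frame power of each wavelet scale has already been computed, scans from the coarsest scale towards finer ones, and returns the first scale whose power exceeds that of the previous (coarser) scale by a chosen ratio. The function name, the dictionary layout and the ratio of 10 are hypothetical. The slides do not give the actual decision rule or threshold.

```python
def fundamental_scale(frame_power, jump_ratio=10.0):
    """Pick the scale where frame power jumps when scanning coarse to fine.

    frame_power maps a dyadic scale exponent j (scale 2^j) to the power of
    that scale's output within the current frame.
    Returns the exponent at the jump, or None if no jump is found.
    """
    exponents = sorted(frame_power, reverse=True)  # coarsest scale first
    for coarser, finer in zip(exponents, exponents[1:]):
        floor = max(frame_power[coarser], 1e-12)   # avoid a zero reference
        if frame_power[finer] > jump_ratio * floor:
            return finer
    return None


# Illustrative frame: little energy at 2^5 and 2^4, a jump at 2^3.
chosen = fundamental_scale({5: 0.01, 4: 0.02, 3: 1.5, 2: 2.0})
```

Returning None for a frame without a clear jump gives a natural hook for the frame-based error correction mentioned above, for example by reusing the previous frame's decision.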

13 The proposed pitch-marking system

14 Comparison of results with Auto-Tune [Figure panels: Proposed system, Auto-Tune]

15 Component No.2 The Modification Engine

16 Time/pitch/amplitude modification engine
The engine is driven by four functions of n:
– a time-modification factor
– a pitch-modification factor
– an amplitude-modification factor
– D(n): the time-warping function
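
Since the objective is to make the source follow the target, one plausible way to obtain these factors is as target-to-source ratios over time-aligned frames, with the local time factor read off as the slope of the warping function. The sketch below works under exactly those assumptions. The function names, the frame-wise representation and the direction of D(n) (source frame n mapped to a target time) are not taken from the slides.

```python
def pitch_and_amplitude_factors(src_f0, tgt_f0, src_amp, tgt_amp, eps=1e-9):
    """Frame-wise target/source ratios for pitch and amplitude.

    All four inputs are assumed to be time-aligned, voiced-frame sequences.
    """
    pitch = [t / s for s, t in zip(src_f0, tgt_f0)]
    amplitude = [t / max(s, eps) for s, t in zip(src_amp, tgt_amp)]
    return pitch, amplitude


def time_factors(warp):
    """Local time-stretch factor as the frame-to-frame slope of the warp.

    warp[n] is assumed to give the target time index for source frame n.
    """
    return [float(b - a) for a, b in zip(warp, warp[1:])]


pitch, amplitude = pitch_and_amplitude_factors(
    [100.0, 200.0], [110.0, 200.0], [0.5, 1.0], [1.0, 1.0])
stretch = time_factors([0, 2, 4, 5])
```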

17 TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add)
Time-domain splicing overlap-add method
Used in prosodic modification of speech
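
A minimal sketch of the TD-PSOLA idea follows, under simplifying assumptions that the presented engine does not share: the time and pitch factors are constants rather than functions of n, every sample is treated as voiced, and each synthesis instant simply reuses the nearest analysis pitch-mark. Two-period Hann-windowed grains centred on the pitch-marks are overlap-added at the new mark spacing. All names are illustrative.

```python
import math


def td_psola(x, marks, time_factor=1.0, pitch_factor=1.0):
    """Overlap-add two-period Hann grains taken at analysis pitch-marks.

    x: input samples; marks: sorted pitch-mark indices (at least two).
    time_factor > 1 lengthens the output; pitch_factor > 1 raises the pitch.
    """
    periods = [b - a for a, b in zip(marks, marks[1:])]
    out_len = int(len(x) * time_factor)
    y = [0.0] * out_len
    t_syn = marks[0] * time_factor
    while t_syn < out_len:
        # Map the synthesis instant back to the analysis time axis and
        # select the nearest analysis pitch-mark.
        t_ana = t_syn / time_factor
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t_ana))
        period = periods[min(k, len(periods) - 1)]
        centre_in = marks[k]
        centre_out = int(round(t_syn))
        for d in range(-period, period + 1):
            src, dst = centre_in + d, centre_out + d
            if 0 <= src < len(x) and 0 <= dst < out_len:
                window = 0.5 * (1.0 + math.cos(math.pi * d / period))
                y[dst] += window * x[src]
        # The spacing of synthesis marks sets the output pitch.
        t_syn += period / pitch_factor
    return y


# A 50-sample-period sine with a pitch-mark at the start of every period.
tone = [math.sin(2.0 * math.pi * n / 50.0) for n in range(500)]
tone_marks = list(range(0, 500, 50))
unchanged = td_psola(tone, tone_marks)
octave_up = td_psola(tone, tone_marks, pitch_factor=2.0)
```

With both factors equal to 1 the overlapping Hann windows sum to one, so the interior of the signal is reproduced. With a pitch factor of 2 the grains are laid down every 25 samples, so the output repeats with half the original period.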

18 Evaluation of the modification engine [Figure panels: Original, TD-PSOLA, Auto-Tune]

19 Component No.3 Time-alignment

20 Time-alignment
Based on Verhelst’s prototype system, which applies Dynamic Time Warping (DTW)
Verhelst claimed that the basic local constraint produces the most accurate time-warping path
Exponential increase in computation as length of comparison increases
Accuracy deteriorates as length of comparison increases
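
For reference, a textbook DTW with the basic local constraint (each cell may be reached from its left, lower, or diagonal neighbour) can be sketched as below. It aligns two sequences of scalar features using absolute difference as the local cost. Both choices are assumptions for this example, since the slides do not state the features or distance measure used in the system.

```python
def dtw(a, b):
    """Return (total cost, warping path) aligning sequences a and b.

    Basic local constraint: predecessors of (i, j) are (i-1, j-1),
    (i-1, j) and (i, j-1). The path is a list of (index_a, index_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1],
                                     cost[i - 1][j],
                                     cost[i][j - 1])
    # Backtrack from the end to the start along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


total, path = dtw([1, 2, 3], [1, 2, 2, 3])
```

In the example the second element of the first sequence is matched to both repeated values in the second, which is the kind of local stretching that the time-warping path encodes.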

21 Adaptations from Verhelst’s method
Time-alignment is performed on a voiced/unvoiced segmental basis:
– DTW for voiced segments
– Linear Time Warping (LTW) for unvoiced segments
Global constraints are introduced to further reduce computation
Synchronization of voiced/unvoiced segments is required; it is edited manually in the current implementation
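
The two cheaper ingredients named above can be sketched briefly. Linear time warping maps frames of one segment onto another by uniform scaling of the index. For the global constraint, the sketch assumes a band around the diagonal (in the style of a Sakoe-Chiba band), because the slides do not say which global constraint was used. Function names and the band width are illustrative.

```python
def linear_time_warp(src_len, tgt_len):
    """For each target frame, the source frame chosen by uniform scaling."""
    if tgt_len <= 1 or src_len <= 1:
        return [0] * tgt_len
    num, den = src_len - 1, tgt_len - 1
    # Integer arithmetic for round-half-up of j * num / den.
    return [(2 * j * num + den) // (2 * den) for j in range(tgt_len)]


def in_band(i, j, n, m, width=0.2):
    """True if cell (i, j) of an n-by-m DTW grid lies near the diagonal.

    Positions are normalised to 0..1; `width` is the allowed deviation.
    Cells outside the band can be skipped to save computation.
    """
    pos_i = i / max(n - 1, 1)
    pos_j = j / max(m - 1, 1)
    return abs(pos_i - pos_j) <= width


mapping = linear_time_warp(5, 9)
```

A band of this kind also bounds how far the path can stray from uniform timing, which limits extreme local expansion or compression.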

22 Manipulation of modification parameters
Simple smoothing of the modification factors using linear-phase FIR low-pass filters is performed before feeding them to the modification engine
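
A small sketch of such smoothing is given below. It assumes a symmetric Hann-shaped kernel, which is one simple way to get a linear-phase FIR low-pass, and applies it centred on each sample so that the smoothed track stays time-aligned with the original. The filter length, the window shape and the edge handling (repeating the end values) are choices made for this example, not details from the slides.

```python
import math


def hann_lowpass_taps(num_taps):
    """Symmetric, unit-sum Hann-shaped taps (odd length gives a centre tap)."""
    if num_taps < 1 or num_taps % 2 == 0:
        raise ValueError("num_taps must be a positive odd number")
    raw = [0.5 - 0.5 * math.cos(2.0 * math.pi * (k + 1) / (num_taps + 1))
           for k in range(num_taps)]
    total = sum(raw)
    return [v / total for v in raw]


def smooth(track, num_taps=9):
    """Filter `track` with the kernel centred on each sample.

    Centring removes the constant group delay of the linear-phase filter.
    Samples beyond either end are taken to equal the nearest end value.
    """
    taps = hann_lowpass_taps(num_taps)
    half = num_taps // 2
    last = len(track) - 1
    out = []
    for n in range(len(track)):
        acc = 0.0
        for k, h in enumerate(taps):
            idx = min(max(n + k - half, 0), last)
            acc += h * track[idx]
        out.append(acc)
    return out


# A pitch-factor track with a one-frame glitch at index 10.
glitchy = [1.0] * 10 + [2.0] + [1.0] * 10
smoothed = smooth(glitchy)
```

Because the taps sum to one, steady values pass through unchanged while isolated jumps such as the glitch above are spread out and reduced.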

23 The Prototype System

24 System Evaluation: case 1

25 System Evaluation: case 2

26 System Limitations
Segmentation:
– Lack of a reliable technique for voiced/unvoiced segmentation
– Segmentation and classification of different vocal sounds is the key to devising rules for modification
Modification engine:
– Lacks the capability to handle pitch transitions; totally dependent on the pitch-marking stage

27 System Limitations
Pitch-marking:
– The proposed system lacks robustness
– Despite the desirable time response of the wavelet filter bank, its frequency response cannot isolate harmonics effectively and efficiently
Time-alignment:
– The DTW basic local constraint allows infinite time expansion and compression
– This often causes distortions in the synthesized vocal sample

28 Conclusions and Recommendations
The current system works well for slow and continuous singing
Further improvements to the individual components are recommended to handle greater dynamic changes of the vocal signal, thereby extending the current good results to a wider range of singing styles

29 Questions & Answers

30 Wavelet filter bank

31 Dyadic Spline Wavelet

32 Wide-band analysis

33 DTW local constraints

34 Calculation of pitch-marks

35 DyWT
