Speech signal processing & its real-time implementation

Name: Speech signal processing & its real-time implementation
Uploaded: 2017-08-23T17:46:07+00:00
Duration: PTM8S47
Channel: Alaina Holland
Description: Speech signal processing & its real-time implementation

Speech signal processing & its real-time implementation
20th November

Goals & objectives
Introducing basic speech signal processing
Speech production model: vocal tract transfer function, fundamental frequency
Basic parameter estimation: linear predictive coefficients (LPC), pitch/voicing status analysis
Real-time implementation: LPC analysis, estimation of vocal tract transfer function, pitch estimation

What is a “speech signal”?
Physical definition: signals produced by the human speech production organs (lungs, larynx, pharyngeal cavity, oral cavity, lips, tongue, nasal cavity)
Informational definition: context + personality

Speech production mechanism

Speech production model
vocal tract transfer function excitation signal speech signal

Examples of vocal tract MR images
‘a’ of ‘matt’ ‘i’ of ‘vit’ ‘fi’ of ‘fiffig’ ‘j’ of ‘jord’

Primitive speech synthesizer
In 1779, Russian professor Christian Kratzenstein built an apparatus to produce five vowels (/a/, /e/, /i/, /o/, /u/) artificially.

Von Kempelen's speaking machine (1791)

Digital model for speech production
Excitation model, vocal tract model

Model for each stage
Excitation model -> impulse train generator
Vocal tract model -> all-pole linear time-varying filter (IIR digital filter)
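
The two stages above can be sketched in a few lines of C. This is a minimal illustration, not the course's reference code: the function names `impulse_train` and `all_pole_filter` are hypothetical, the coefficients are held fixed for the duration of the call (a time-varying filter would swap in new coefficients frame by frame), and the sign convention H(z) = gain / (1 + sum a_k z^-k) is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: unit impulse every `period` samples, zero elsewhere. */
void impulse_train(float *x, size_t n, size_t period)
{
    for (size_t i = 0; i < n; i++)
        x[i] = (period != 0 && i % period == 0) ? 1.0f : 0.0f;
}

/* All-pole (IIR) filter, assumed convention:
 *   y[n] = gain * x[n] - sum_{k=1..order} a[k-1] * y[n-k]
 * Output samples before n = 0 are treated as zero. */
void all_pole_filter(const float *x, float *y, size_t n,
                     const float *a, size_t order, float gain)
{
    for (size_t i = 0; i < n; i++) {
        float acc = gain * x[i];
        for (size_t k = 1; k <= order && k <= i; k++)
            acc -= a[k - 1] * y[i - k];
        y[i] = acc;
    }
}
```

Feeding the impulse train into the filter gives a crude voiced sound: the impulse spacing sets the pitch and the filter coefficients set the formants.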

Formant frequency
Tonal components (resonance frequencies) produced by resonance of the vocal tract
[Figure: vocal tract transfer function with the 1st, 2nd, and 3rd formants marked]

Waveforms & Formant frequency

Vocal tract transfer function
A basic element of speech production
Context-dependent: used for speech recognition
Speaker-dependent: used for speaker recognition

Related signal processing theories (Z-transformation)
Continuous signals: Fourier transform (F.T.), Laplace transform (L.T.)
Discrete signals: DFT, Z-transform
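
For reference, the standard definition behind this slide is written out below. The symbols x[n], X(z), and ω are notation assumed here, not taken from the slides.

```latex
X(z) = \sum_{n=-\infty}^{\infty} x[n]\, z^{-n},
\qquad
X(e^{j\omega}) = X(z)\Big|_{z = e^{j\omega}}
```

Evaluating X(z) on the unit circle gives the frequency response of a discrete signal or filter, which is what the rest of the lecture uses to obtain the vocal tract response.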

Pole & Zero

Estimation of VTF
Vocal tract AR (Auto-Regressive) model
Tube-like shape
Has resonance frequencies
All-pole modeling is possible
That is, an AR (Auto-Regressive) model: LPC (linear prediction coefficients)

VTF from AR model
That is, the VTF can be estimated from the linear prediction coefficients {ak}.
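
Written out as a sketch, the AR model and the transfer function it implies are shown below. The sign convention (a plus sign in the denominator) is an assumption chosen to match the example code later in these slides, which adds 1 to the sum of the `lpc[j]` terms; the gain G, the order p, and the excitation e[n] are symbols introduced here.

```latex
s[n] = -\sum_{k=1}^{p} a_k\, s[n-k] + G\, e[n]
\quad\Longrightarrow\quad
H(z) = \frac{S(z)}{E(z)} = \frac{G}{1 + \sum_{k=1}^{p} a_k z^{-k}}
```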

Estimation of the linear prediction coefficients (LPC)

Durbin's recursion
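
A minimal C sketch of the Levinson-Durbin recursion follows. It is one common formulation, not necessarily the one on the original slide. Assumptions: the function name `durbin`, the bound `LPCORD_MAX`, and the sign convention A(z) = 1 + sum a_k z^-k are all chosen here for illustration.

```c
#include <stddef.h>

#define LPCORD_MAX 32   /* assumed upper bound on the model order */

/* Levinson-Durbin recursion (sketch).
 *   r[0..order]   : autocorrelation values of one frame
 *   a[0..order-1] : output, a[k-1] holds a_k of A(z) = 1 + sum a_k z^-k
 * Returns the final prediction error energy, or -1.0f on invalid input. */
float durbin(const float *r, float *a, size_t order)
{
    float tmp[LPCORD_MAX];
    float err = r[0];

    if (order > LPCORD_MAX)
        return -1.0f;

    for (size_t i = 0; i < order; i++) {
        if (err <= 0.0f)
            return -1.0f;            /* silent or degenerate frame */

        /* Reflection coefficient for stage i + 1. */
        float acc = r[i + 1];
        for (size_t j = 0; j < i; j++)
            acc += a[j] * r[i - j];
        float k = -acc / err;

        /* Update a_1..a_i, then append a_{i+1} = k. */
        for (size_t j = 0; j < i; j++)
            tmp[j] = a[j] + k * a[i - 1 - j];
        for (size_t j = 0; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;

        err *= 1.0f - k * k;         /* error energy shrinks each stage */
    }
    return err;
}
```

The recursion needs on the order of p² multiply-accumulates for order p, so for typical orders its cost is small compared with computing the autocorrelation of a frame.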

From LPC to magnitude response of VTF
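
As a sketch of the step named in this slide title: with the gain taken as 1 and the convention A(z) = 1 + sum a_k z^-k (both assumptions consistent with the example code later in these slides), the magnitude response at frequency ω is

```latex
\left| H(e^{j\omega}) \right|
= \frac{1}{\sqrt{\left(1 + \sum_{k=1}^{p} a_k \cos(\omega k)\right)^{2}
               + \left(\sum_{k=1}^{p} a_k \sin(\omega k)\right)^{2}}}
```

The two sums are the real and imaginary parts that the example code accumulates in `rs` and `is`.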

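Lab 1
Compute the autocorrelation of one frame.
Compute the LPC from the autocorrelation using Durbin's recursion algorithm.
Compute the magnitude response of the VTF from the LPC.
Display the VTF magnitude response in sync with the waveform of the input speech signal.
For the input signal, use a microphone or play the distributed (clean) speech signal.
The frame length is up to you, but 256 samples with a frame shift of 128 samples is recommended.
Use the profile function of CCS to examine the CPU load and check whether real-time processing is possible at each sampling frequency (8 kHz, 32 kHz, …).
If a real-time failure occurs, devise a way to reduce the amount of computation.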
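
The first step of the lab, the autocorrelation of one frame, might look like the sketch below. The function name `autocorr` and its signature are hypothetical, and no analysis window is applied, which is a simplification.

```c
#include <stddef.h>

/* Short-time autocorrelation of one frame (sketch):
 *   r[lag] = sum_{n=lag}^{len-1} x[n] * x[n-lag],  lag = 0..maxlag */
void autocorr(const float *x, size_t len, float *r, size_t maxlag)
{
    for (size_t lag = 0; lag <= maxlag; lag++) {
        float acc = 0.0f;
        for (size_t n = lag; n < len; n++)
            acc += x[n] * x[n - lag];
        r[lag] = acc;
    }
}
```

For LPC analysis only lags 0 through the model order are needed, so `maxlag` can be kept small to save cycles.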

Flow chart
Filled ping or pong buffer -> build frame -> compute autocorrelation -> Durbin's recursion algorithm -> compute vocal tract transfer function -> plot the two signals in sync and compare -> move to the next frame
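
The frame-building step can be sketched as follows, assuming each ping or pong buffer delivers exactly one frame shift (128 samples) and the analysis frame is 256 samples, as recommended in the lab text. The function name `build_frame` and the assumption about the buffer size are not from the slides.

```c
#include <string.h>

#define FRAME_LEN   256   /* recommended frame length */
#define FRAME_SHIFT 128   /* recommended frame shift; assumed ping/pong size */

/* Slide the analysis frame forward by FRAME_SHIFT samples:
 * keep the newest FRAME_LEN - FRAME_SHIFT old samples at the front
 * and append the block that has just been filled. */
void build_frame(float frame[FRAME_LEN], const float block[FRAME_SHIFT])
{
    memmove(frame, frame + FRAME_SHIFT,
            (FRAME_LEN - FRAME_SHIFT) * sizeof(float));
    memcpy(frame + (FRAME_LEN - FRAME_SHIFT), block,
           FRAME_SHIFT * sizeof(float));
}
```

Calling this once per filled buffer gives 50 % overlapping frames without copying from both buffers at once.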

Example of results

Computational cost of computing the VTF
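
A rough operation count, as a sketch only. The values N = 512 for the FFT length and p = 10 for the LPC order are illustrative assumptions, not figures from the slides.

```latex
\begin{aligned}
\text{direct evaluation:}\quad & \tfrac{N}{2}\, p = 256 \times 10 = 2560
  \ \text{calls each to cosine and sine} \\
\text{radix-2 FFT:}\quad & \tfrac{N}{2} \log_2 N = 256 \times 9 = 2304
  \ \text{butterflies}
\end{aligned}
```

The counts look similar, but a butterfly is only a few multiplies and adds with a precomputed twiddle factor, whereas every cosine or sine call is itself a costly library routine, so the FFT route is expected to be considerably cheaper in practice.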

Computing the VTF using the FFT

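LPC-to-VTF without FFT
Example code:

    /* Direct evaluation of |H| = 1/|A| at FFTLEN/2 frequencies.
       This grid spans 0..pi inclusive; FFT bin i lies at pi*i/(FFTLEN/2),
       so the two methods sample slightly different frequencies. */
    for (i = 0; i < FFTLEN/2; i++) {
        factor = 3.14159265f * ((float)i / (FFTLEN/2 - 1));
        rs = 0.0f;
        is = 0.0f;
        for (j = 0; j < LPCORD; j++) {
            rs += lpc[j] * cosf(factor * (j + 1));   /* real part */
            is += lpc[j] * sinf(factor * (j + 1));   /* imaginary part */
        }
        rs += 1.0f;                                  /* leading 1 of A(z) */
        H[i] = 1.0f / sqrtf(rs*rs + is*is);
    }

LPC-to-VTF with FFT
Example code:

    /* Interleaved complex input: x[0] = 0, x[k] = a_k, rest zero-padded.
       Element 0 is not written by the loop, so clear it explicitly. */
    tfftb[0] = 0;
    tfftb[1] = 0;
    for (i = 1; i < FFTLEN; i++) {
        tfftb[i<<1]     = (i - 1 < LPCORD) ? lpc[i-1] : 0;
        tfftb[(i<<1)+1] = 0;
    }
    DSPF_sp_cfftr2_dit(tfftb, ffttw, FFTLEN);    /* output is bit-reversed */
    DSPF_sp_bitrev_cplx(tfftb, brv, FFTLEN);     /* restore natural order */
    for (i = 0; i < FFTLEN/2; i++) {
        rs = tfftb[i<<1] + 1;                    /* leading 1 of A(z) */
        is = tfftb[(i<<1)+1];
        H[i] = 1 / sqrtf(rs*rs + is*is);
    }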

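Lab 2
Compute the magnitude response of the VTF using the FFT function provided in DSPLIB.
Check whether the result is the same as in Lab 1.
When profiling, check how much the computation is reduced compared with Lab 1.
Also check whether the sampling frequencies that were not feasible in Lab 1 are now feasible.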

Part II Pitch estimation

What is “pitch”?
The interval between pulses of the excitation signal
Fundamental frequency (F0) = 1/pitch
Determines how high or low the voice sounds
Narrow interval: high voice (female)
Wide interval: low voice (male)
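
When the pitch period is measured in samples, the relation F0 = 1/pitch becomes F0 = fs / lag. A minimal sketch, with the hypothetical function name `pitch_lag_to_f0`:

```c
#include <stddef.h>

/* Convert a pitch period in samples to a fundamental frequency in Hz.
 * Returns 0 for lag == 0, used here to mean "no pitch". */
float pitch_lag_to_f0(size_t lag, float fs_hz)
{
    return (lag > 0) ? fs_hz / (float)lag : 0.0f;
}
```

For example, at a sampling frequency of 8 kHz a period of 80 samples corresponds to 100 Hz and a period of 40 samples to 200 Hz.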

Pitch estimation using the autocorrelation function
The distance between peaks is close to the period of the signal.

Autocorrelation for voiced/unvoiced speech signals

Usefulness of Autocorrelation
Pitch: time lag of the first peak = pitch period
Voiced/unvoiced: relative intensity of the first peak. If sufficiently high -> voiced frame; otherwise -> unvoiced/silence frame
LPC computation: necessary parameters for LP analysis

Considerations in pitch estimation
Pitch doubling problem: if the period is T, then 2T, 3T, ... are also periods, so the autocorrelation peaks also appear periodically.

Plot of the autocorrelation peaks
Pitch doubling Pitch halving

Considerations in pitch estimation
Use a median filter: take the median of N values.
Example input: 100, 110, 120, 122, 230, 123, 121, 119, …
Example output: 110, 120, 122, 123, 123, …
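
A 3-tap median filter that reproduces the example above can be sketched as follows. The function names `median3` and `median3_filter` are hypothetical, and the filter simply emits n - 2 values for n inputs rather than padding the edges.

```c
#include <stddef.h>

/* Median of three values. */
float median3(float a, float b, float c)
{
    if (a > b) { float t = a; a = b; b = t; }   /* now a <= b */
    if (b > c) b = c;                           /* b = min(b, c) */
    return (a > b) ? a : b;                     /* max(a, min(b, c)) */
}

/* Sliding 3-tap median: out[i] = median(in[i], in[i+1], in[i+2]).
 * Writes n - 2 outputs for n >= 3 inputs. */
void median3_filter(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i + 2 < n; i++)
        out[i] = median3(in[i], in[i + 1], in[i + 2]);
}
```

The isolated value 230 in the example input, a typical pitch-doubling error, never appears in the output because it is never the middle value of any three neighbours.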

Flow chart
Filled ping or pong buffer -> build frame -> compute autocorrelation -> Rmax/R(0) > 0.64?
No: unvoiced frame
Yes: the lag i at which Rmax occurs = pitch -> 3-tap median filtering of the pitch values
-> move to the next frame
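
The decision step of this flow chart might be sketched as below. The array `r` is taken to hold the autocorrelation of the current frame; the function name `estimate_pitch` and the search range `minlag..maxlag` are assumptions, since the slides do not say which lags are searched.

```c
#include <stddef.h>

#define VOICING_THRESHOLD 0.64f   /* threshold given in the flow chart */

/* Pick the lag of the largest autocorrelation value in [minlag, maxlag]
 * and accept it as the pitch only if Rmax / R(0) exceeds the threshold.
 * Returns the pitch period in samples, or 0 for an unvoiced/silent frame.
 * r must hold at least maxlag + 1 values. */
size_t estimate_pitch(const float *r, size_t minlag, size_t maxlag)
{
    if (r[0] <= 0.0f || minlag == 0 || minlag > maxlag)
        return 0;

    size_t best = minlag;
    for (size_t lag = minlag + 1; lag <= maxlag; lag++)
        if (r[lag] > r[best])
            best = lag;

    return (r[best] / r[0] > VOICING_THRESHOLD) ? best : 0;
}
```

Starting the search at a lag above zero keeps the large values near R(0) from being mistaken for the pitch peak; the right bounds depend on the sampling frequency and the expected range of voices.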

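Lab 3
Using the autocorrelation function written in Lab 1, write a program that estimates the pitch.
Write a program that decides whether the current input speech is voiced or unvoiced. Recommended: classify the frame as voiced when R(t)/R(0) > 0.64.
Display the estimated pitch value in sync with the input waveform.
For the input signal, use a microphone or play the distributed clean speech signal continuously in a player.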
