Linear Predictive Pitch Synchronization: Voiced speech detection, analysis and synthesis. Jim Bryan, Florida Institute of Technology, ECE5525 Final Project.

Presentation transcript:

Linear Predictive Pitch Synchronization: Voiced speech detection, analysis and synthesis. Jim Bryan, Florida Institute of Technology, ECE5525 Final Project, December 7, 2010

Pitch synchronous windowing is a critical part of many speech processing algorithms. Homomorphic filtering, for example, is based on the principle that the pitch frequency may be “liftered” from the vocal tract response via simple subtraction in the cepstral domain. Linear prediction based signal reconstruction is also simpler with pitch synchronous windowing, and the covariance method needs the pitch synchronous, glottal-closed portion of the speech.
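As an added illustration of that liftering idea (not code from the project), the sketch below computes a real cepstrum and separates the low-quefrency vocal tract envelope from the high-quefrency pitch component by zeroing and subtracting; numpy is assumed and the cutoff quefrency is a hypothetical parameter.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + 1e-12)).real

def lifter_split(x, cutoff):
    """Split a frame's cepstrum into a low-quefrency part (vocal tract
    envelope) and a high-quefrency part (pitch/excitation) by zeroing
    and subtracting: the 'simple subtraction' mentioned above."""
    c = real_cepstrum(x)
    low = c.copy()
    low[cutoff:-cutoff] = 0.0   # keep only quefrencies below the cutoff
    high = c - low              # what remains carries the pitch peak
    return low, high
```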

Window selection for overlap and add reconstruction: Bartlett (simple triangle); Hann (raised cosine type); Hamming (raised cosine type).
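Which of these sums back to a flat gain can be checked directly. The sketch below (an added illustration, not the project's code) overlap-adds shifted copies of each candidate window at 50 percent overlap, the hop used later in the project, and prints the ripple of the summed gain; the 256-sample frame length is an arbitrary choice.

```python
import numpy as np
from scipy.signal import get_window

def ola_gain_ripple(window_name, frame_len=256, hop=128, n_frames=16):
    """Overlap-add a train of identical windows and report the
    min/max of the summed gain away from the edges."""
    w = get_window(window_name, frame_len, fftbins=False)
    total = np.zeros(hop * n_frames + frame_len)
    for k in range(n_frames):
        total[k * hop : k * hop + frame_len] += w
    steady = total[frame_len:-frame_len]   # ignore the ramp-up/ramp-down edges
    return steady.min(), steady.max()

for name in ["bartlett", "hann", "hamming", "blackmanharris"]:
    lo, hi = ola_gain_ripple(name)
    print(f"{name:14s} OLA gain: {lo:.3f} .. {hi:.3f}")
```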

Bartlett window overlap and add response

Hann overlap and add response

Hamming window overlap and add response

Blackman-Harris overlap and add response

Window selection based on “spectral leakage” and frequency resolution

Hann Window

Hamming window

Blackman-Harris
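To attach rough numbers to the leakage comparison illustrated above, this added sketch estimates each window's peak sidelobe level relative to its mainlobe from a zero-padded FFT; the 256-sample window length is arbitrary.

```python
import numpy as np
from scipy.signal import get_window

def peak_sidelobe_db(window_name, frame_len=256, nfft=8192):
    """Peak sidelobe level in dB relative to the mainlobe peak,
    a rough measure of spectral leakage."""
    w = get_window(window_name, frame_len)
    spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(w, nfft)) / np.sum(w) + 1e-12)
    # The first index where the magnitude starts rising marks the first null.
    first_null = int(np.argmax(np.diff(spectrum_db) > 0))
    return float(spectrum_db[first_null:].max())

for name in ["hann", "hamming", "blackmanharris"]:
    print(f"{name:14s} peak sidelobe ~ {peak_sidelobe_db(name):.1f} dB")
```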

Window overlap and add: frame rate versus frame length considerations

Linear prediction wide-search pitch period estimation: Fit a single 12th-order all-pole model, assuming the voiced speech is contained within the sample window. Use inverse filtering to get the glottal pulses. Take the autocorrelation of the residual to determine the pitch period.
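A minimal sketch of this wide-search estimate, assuming numpy/scipy and a voiced segment x sampled at fs, is shown below; the 60 to 400 Hz search range and the helper names lpc_autocorr and pitch_from_residual are illustrative choices rather than the project's code.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_autocorr(x, order=12):
    """All-pole (LPC) coefficients via the autocorrelation method."""
    xw = x * np.hamming(len(x))
    r = np.correlate(xw, xw, mode="full")[len(xw) - 1 : len(xw) + order]
    r[0] += 1e-9                                  # guard against an all-zero frame
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])   # Yule-Walker normal equations
    return np.concatenate(([1.0], -a))            # inverse filter A(z) = 1 - sum a_k z^-k

def pitch_from_residual(x, fs, order=12, fmin=60.0, fmax=400.0):
    """Fit a single 12th-order model, inverse filter to get the glottal
    excitation, and read the pitch period off the residual autocorrelation."""
    a = lpc_autocorr(x, order)
    residual = lfilter(a, [1.0], x)               # LP residual (glottal pulses)
    r = np.correlate(residual, residual, mode="full")[len(residual) - 1 :]
    lo, hi = int(fs / fmax), int(fs / fmin)
    period = lo + int(np.argmax(r[lo:hi]))        # lag of the autocorrelation peak
    return fs / period, period
```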

Male speaker, “Moon”

Female speaker “Moon”

Male voice

Female voice

Male residual

Female residual

Autocorrelation of Male

Autocorrelation of female

Synthesized male speech, single model

Male single model

Female single model

Pitch synchronous processing (a code sketch of this loop follows the list):
1. Segment the speech waveform so that the frame length is 3 pitch periods; make sure the window length is even.
2. Set the Hamming window length to the frame length and the frame rate to 1/2 the frame length.
3. Generate a 12-pole LP model for each frame.
4. Inverse filter each frame and save the AR model coefficients and the residual in a matrix, where each row is a residual.
5. Take the autocorrelation of the residual of the frame and find the autocorrelation peak.
6. Determine the pitch period for each frame from the autocorrelation of its residual.
7. If the frame does not have a valid pitch period, determine whether it is fricative or plosive: if the variance of the autocorrelation is low, the frame is fricative; otherwise it is plosive.
8. Save the pitch period for each frame in a vector, along with the autocorrelation peak and the fricative or plosive status.
9. Reconstruct the frame by filtering the residual with the AR coefficients, or synthesize the waveform by estimating the glottal pulse train and adding impulsive fricative noise or a single impulse for plosive frames.
10. Overlap and add the segments to reconstruct the signal.
11. Compare to the original speech using the sum of squared errors (SSE).
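The loop below condenses these steps into a runnable sketch. It reuses lpc_autocorr from the earlier pitch estimation sketch, reconstructs each frame by re-filtering its residual, and overlap-adds Hamming-windowed frames at a half-frame hop. The 480-sample frame length (roughly 3 pitch periods of a low-pitched voice at 16 kHz), the 0.3 voicing threshold, and the omission of the fricative/plosive resynthesis branch are simplifying assumptions, not the project's exact choices.

```python
import numpy as np
from scipy.signal import lfilter

def pitch_sync_analysis_synthesis(x, fs, frame_len=480, order=12,
                                  voiced_thresh=0.3):
    """Frame-by-frame LP analysis (hop = frame_len // 2), pitch decision
    from the residual autocorrelation, and overlap-add reconstruction."""
    hop = frame_len // 2
    window = np.hamming(frame_len)
    y = np.zeros(len(x))
    pitch_periods = []
    for start in range(0, len(x) - frame_len, hop):
        frame = x[start : start + frame_len]
        a = lpc_autocorr(frame, order)             # 12-pole model (earlier sketch)
        residual = lfilter(a, [1.0], frame)        # inverse filter -> excitation estimate
        r = np.correlate(residual, residual, mode="full")[frame_len - 1 :]
        lo, hi = int(fs / 400), int(fs / 60)
        peak = r[lo:hi].max() / (r[0] + 1e-12)     # normalized autocorrelation peak
        # Voiced frames get a pitch period; others are marked 0 (fricative/plosive).
        pitch_periods.append(lo + int(np.argmax(r[lo:hi])) if peak > voiced_thresh else 0)
        frame_hat = lfilter([1.0], a, residual)    # re-filtering the residual recovers the frame
        y[start : start + frame_len] += window * frame_hat
    return y / 1.08, pitch_periods                 # ~constant Hamming OLA gain at 50% overlap
```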

Overlap and add Reconstruction male

Overlap and add female

Overlap and add reconstruction male

Overlap and add reconstruction female

Reconstructed Male

Reconstructed female

Conclusions: Many speech processing applications use a combination of windowing and overlap-and-add for signal reconstruction. Pitch synchronous windowing is necessary for accurate results in speech processing; homomorphic deconvolution requires it. A single set of coefficients for a single voiced sound appears to be a reasonable approach. The pitch period estimate is extracted from the residual of the inverse-filtered voiced sound through the autocorrelation function. Pitch synchronous windowing is a good foundation for many types of speech processing applications.