Covariation and weighting of harmonically decomposed streams for ASR
Outline: Introduction, Pitch-scaled harmonic filter, Recognition experiments, Results, Conclusion
[Figure: production of /z/, with the periodic and aperiodic components labelled]

INTRODUCTION
Motivation and aims
Most speech sounds are either voiced or unvoiced, and the two classes have very different properties:
– voiced: quasi-periodic signal from phonation
– unvoiced: aperiodic signal from turbulence noise
Do these properties allow humans to recognize speech in noise? Perhaps we can use this information to help ASR, by computing separate features for the two parts. Are the two contributions complementary?

INTRODUCTION
Voiced and unvoiced parts of a speech signal
[Figure: production of /z/, showing the periodic and aperiodic contributions]

METHOD
Pitch-scaled harmonic filter
[Block diagram: speech waveform s(n) → pitch extraction (raw f0) → pitch optimisation (optimised pitch f0, window size N_opt) → PSHF, with time shifting and re-splicing → periodic waveform v̂(n) and aperiodic waveform û(n)]
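To give a concrete feel for the decomposition, here is a minimal single-frame sketch of a pitch-scaled harmonic filter in Python (NumPy only). The key idea follows the PSHF design: the analysis window spans exactly b pitch periods, so the harmonics of f0 fall on every b-th DFT bin; keeping those bins yields the periodic estimate and the residual gives the aperiodic estimate. The function and variable names, the rectangular window and the toy signal are illustrative assumptions, not the authors' optimised implementation.

```python
import numpy as np

def pshf_frame(frame, b=4):
    """Split one pitch-scaled frame (length N = b * T0 samples, i.e.
    exactly b pitch periods) into periodic and aperiodic estimates.

    Illustrative sketch only: with the window pitch-scaled, harmonics of
    f0 land on every b-th DFT bin, so keeping those bins recovers the
    periodic part and the residual gives the aperiodic part."""
    N = len(frame)
    spectrum = np.fft.rfft(frame)
    harmonic = np.zeros_like(spectrum)
    harmonic[::b] = spectrum[::b]        # bins at multiples of b = harmonics of f0
    harmonic[0] = 0.0                    # exclude the DC term
    v_hat = np.fft.irfft(harmonic, n=N)  # periodic estimate, v^(n)
    u_hat = frame - v_hat                # aperiodic estimate, u^(n)
    return v_hat, u_hat

# Toy usage: fs = 8 kHz, f0 = 125 Hz (so T0 = 64 samples), b = 4 periods
fs, T0, b = 8000, 64, 4
f0 = fs / T0
t = np.arange(b * T0) / fs
frame = (np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(4 * np.pi * f0 * t)
         + 0.1 * np.random.randn(b * T0))
v_hat, u_hat = pshf_frame(frame, b)      # v_hat keeps the harmonic content
```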

METHOD
Decomposition example (waveforms)
[Figure: original, periodic and aperiodic waveforms]

METHOD
Decomposition example (spectrograms)
[Figure: original, periodic and aperiodic spectrograms]

METHOD
Decomposition example (MFCC spectrograms)
[Figure: MFCC representations of the original, periodic and aperiodic signals]
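Each decomposed stream is then parameterised with MFCCs. As a rough illustration of that step (the window, hop and FFT sizes below are assumed, typical ASR values, not the exact front end used in these experiments), a 13-coefficient MFCC matrix per stream could be computed as:

```python
import librosa

def mfcc_features(signal, fs=8000, n_mfcc=13):
    # 25 ms windows with a 10 ms hop are common ASR defaults (assumed here)
    return librosa.feature.mfcc(y=signal, sr=fs, n_mfcc=n_mfcc, n_fft=512,
                                win_length=int(0.025 * fs),
                                hop_length=int(0.010 * fs))

# One matrix per PSHF output stream (hypothetical full-length signals):
# mfcc_p = mfcc_features(v_hat_signal)   # periodic stream
# mfcc_a = mfcc_features(u_hat_signal)   # aperiodic stream
```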

METHOD
Speech database: Aurora 2.0
Derived from the TIdigits database of connected English digit strings (male and female speakers), filtered with G.712 at 8 kHz.
[Table: composition of the TRAIN and TEST sets]

METHOD
Description of the experiments
Baseline experiment [base]:
– standard parameterisation of the original waveforms (i.e., MFCC, +Δ, +ΔΔ); a sketch of the Δ computation follows below
PCA experiments [pca26, pca78, pca13, pca39]:
– decorrelation of the feature vectors, and reduction of the number of coefficients
Split experiments [split, split1]:
– adjustment of the stream weights (periodic vs. aperiodic)
Caveat: pitch values were derived from the clean speech files, for the entire database!
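The +Δ, +ΔΔ coefficients appended to the static MFCCs are standard regression features. A minimal sketch of the usual HTK-style computation, with an assumed (typical) regression half-window of 2 frames:

```python
import numpy as np

def deltas(feats, width=2):
    """HTK-style regression (delta) coefficients along the time axis.

    feats: (n_coeffs, n_frames) array; width: regression half-window
    (width=2 is a common default, assumed here)."""
    n = feats.shape[1]
    padded = np.pad(feats, ((0, 0), (width, width)), mode='edge')
    num = sum(k * (padded[:, width + k:width + k + n]
                   - padded[:, width - k:width - k + n])
              for k in range(1, width + 1))
    denom = 2 * sum(k * k for k in range(1, width + 1))
    return num / denom

# Baseline 39-dimensional vector per frame: 13 MFCCs + Δ + ΔΔ, e.g.:
# feats39 = np.vstack([mfcc, deltas(mfcc), deltas(deltas(mfcc))])
```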

METHOD
Parameterisations
BASE: waveform → MFCC → +Δ, +Δ² → features
PCA26 / PCA78 / PCA13 / PCA39: PSHF → MFCC → +Δ, +Δ² → cat → PCA (retaining 26, 78, 13 or 39 components respectively)
SPLIT, SPLIT1: PSHF → MFCC → +Δ, +Δ² → cat → two weighted streams
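A minimal sketch of the PCA step on the concatenated features (NumPy only; the component counts follow the experiment names, while the function and variable names are illustrative):

```python
import numpy as np

def pca_transform(feats, n_keep):
    """Decorrelate concatenated feature vectors and optionally reduce
    their dimensionality. feats: (n_frames, n_dims), n_keep <= n_dims."""
    centred = feats - feats.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort descending by variance
    basis = eigvecs[:, order[:n_keep]]
    return centred @ basis

# e.g. from two hypothetical 39-dim streams (frames as columns):
# feats78 = np.hstack([stream_p.T, stream_a.T])   # (n_frames, 78)
# pca78 = pca_transform(feats78, 78)              # decorrelation only
# pca13 = pca_transform(feats78, 13)              # decorrelate and reduce
```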

RESULTS
Full-sized PCA results
[Figure]

RESULTS
Variance of principal components
[Figure: PCA26 and PCA39, clean and multi-condition training]

RESULTS
PCA26 experiment results
[Figure: CLEAN and MULTI training conditions]

RESULTS
Summary of best PCA results
[Figure]

RESULTS
Split experiment results
[Figure]

RESULTS
Sample Split results
[Figure]
Note: for Split, the same stream-weight values were used in training as in testing.
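For reference, stream weights of this kind enter a multi-stream HMM recogniser through the output density below, shown in the conventional HTK-style form (this is the standard formulation, not notation taken from these slides): each state j scores the periodic and aperiodic feature vectors independently, and the exponents γ_p, γ_a set the relative influence of the two streams.

```latex
b_j(\mathbf{o}_t) \;=\;
  \left[ b_j^{(p)}\!\left(\mathbf{o}_t^{(p)}\right) \right]^{\gamma_p}
  \left[ b_j^{(a)}\!\left(\mathbf{o}_t^{(a)}\right) \right]^{\gamma_a}
```

Raising γ_p relative to γ_a biases recognition towards the periodic stream; varying this balance is what the Split experiments explore.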

RESULTS
Split1 experiment results
[Figure]

RESULTS
Summary of PCA & Split results
[Figure]

CONCLUSION
Conclusions
The PSHF module split Aurora's speech waveforms into two synchronous streams (periodic and aperiodic):
– large improvements over the single-stream Baseline
– Split was better than all PCA combinations:
  – PCA26/13 better than PCA78/39, with PCA13 the best PCA scheme
  – Split1 marginally better than Split
Periodic speech segments give robustness to noise.
Further work:
– Modelling: how best to combine the streams?
– LVCSR: evaluate the front end on TIMIT (phone recognition)
– Robust pitch tracking

COLUMBO PROJECT: Harmonic decomposition applied to ASR
Philip J.B. Jackson¹, David M. Moreno², Javier Hernando², Martin J. Russell