Advanced Topics in Speech Processing (IT60116) K Sreenivasa Rao School of Information Technology IIT Kharagpur.

Slides:



Advertisements
Similar presentations
Vineel Pratap Girish Govind Abhilash Veeragouni. Human listeners are capable of extracting information from the acoustic signal beyond just the linguistic.
Advertisements

Speaker Recognition Sharat.S.Chikkerur Center for Unified Biometrics and Sensors
Speech in Multimedia Hao Jiang Computer Science Department Boston College Oct. 9, 2007.
G.S.MOZE COLLEGE OF ENGINNERING BALEWADI,PUNE -45.
HCI : Speech /Speaker Recognition System
Voice source characterisation Gerrit Bloothooft UiL-OTS Utrecht University.
December 2006 Cairo University Faculty of Computers and Information HMM Based Speech Synthesis Presented by Ossama Abdel-Hamid Mohamed.
Feature Vector Selection and Use With Hidden Markov Models to Identify Frequency-Modulated Bioacoustic Signals Amidst Noise T. Scott Brandes IEEE Transactions.
Natural Language Processing - Speech Processing -
Speech Translation on a PDA By: Santan Challa Instructor Dr. Christel Kemke.
Toward Semantic Indexing and Retrieval Using Hierarchical Audio Models Wei-Ta Chu, Wen-Huang Cheng, Jane Yung-Jen Hsu and Ja-LingWu Multimedia Systems,
LYU0103 Speech Recognition Techniques for Digital Video Library Supervisor : Prof Michael R. Lyu Students: Gao Zheng Hong Lei Mo.
Spoken Language Technologies: A review of application areas and research issues Analysis and synthesis of F0 contours Agnieszka Wagner Department of Phonetics,
4/25/2001ECE566 Philip Felber1 Speech Recognition A report of an Isolated Word experiment. By Philip Felber Illinois Institute of Technology April 25,
EE2F1 Speech & Audio Technology Sept. 26, 2002 SLIDE 1 THE UNIVERSITY OF BIRMINGHAM ELECTRONIC, ELECTRICAL & COMPUTER ENGINEERING Digital Systems & Vision.
COMP 4060 Natural Language Processing Speech Processing.
Pitch Prediction for Glottal Spectrum Estimation with Applications in Speaker Recognition Nengheng Zheng Supervised under Professor P.C. Ching Nov. 26,
Why is ASR Hard? Natural speech is continuous
A PRESENTATION BY SHAMALEE DESHPANDE
Statistical Natural Language Processing. What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical.
Track: Speech Technology Kishore Prahallad Assistant Professor, IIIT-Hyderabad 1Winter School, 2010, IIIT-H.
Audio Processing for Ubiquitous Computing Uichin Lee KAIST KSE.
Introduction to Automatic Speech Recognition
Eng. Shady Yehia El-Mashad
Speech Signal Processing
Artificial Intelligence 2004 Speech & Natural Language Processing Natural Language Processing written text as input sentences (well-formed) Speech.
Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling V. Karjigi , P. Rao Dept. of Electrical Engineering,
Midterm Review Spoken Language Processing Prof. Andrew Rosenberg.
1 Speech Perception 3/30/00. 2 Speech Perception How do we perceive speech? –Multifaceted process –Not fully understood –Models & theories attempt to.
International Conference on Intelligent and Advanced Systems 2007 Chee-Ming Ting Sh-Hussain Salleh Tian-Swee Tan A. K. Ariff. Jain-De,Lee.
By: Meghal Bhatt.  Sphinx4 is a state of the art speaker independent, continuous speech recognition system written entirely in java programming language.
MUMT611: Music Information Acquisition, Preservation, and Retrieval Presentation on Timbre Similarity Alexandre Savard March 2006.
Speaker independent Digit Recognition System Suma Swamy Research Scholar Anna University, Chennai 10/22/2015 9:10 PM 1.
Overview of Part I, CMSC5707 Advanced Topics in Artificial Intelligence KH Wong (6 weeks) Audio signal processing – Signals in time & frequency domains.
LML Speech Recognition Speech Recognition Introduction I E.M. Bakker.
ICASSP Speech Discrimination Based on Multiscale Spectro–Temporal Modulations Nima Mesgarani, Shihab Shamma, University of Maryland Malcolm Slaney.
Speech Signal Processing I By Edmilson Morais And Prof. Greg. Dogil Second Lecture Stuttgart, October 25, 2001.
Jun-Won Suh Intelligent Electronic Systems Human and Systems Engineering Department of Electrical and Computer Engineering Speaker Verification System.
Artificial Intelligence 2004 Speech & Natural Language Processing Natural Language Processing written text as input sentences (well-formed) Speech.
Feature Vector Selection and Use With Hidden Markov Models to Identify Frequency-Modulated Bioacoustic Signals Amidst Noise T. Scott Brandes IEEE Transactions.
Overview ► Recall ► What are sound features? ► Feature detection and extraction ► Features in Sphinx III.
Speech Signal Processing I
Artificial Intelligence 2004 Speech & Natural Language Processing Speech Recognition acoustic signal as input conversion into written words Natural.
Audio processing methods on marine mammal vocalizations Xanadu Halkias Laboratory for the Recognition and Organization of Speech and Audio
AMSP : Advanced Methods for Speech Processing An expression of Interest to set up a Network of Excellence in FP6 Prepared by members of COST-277 and colleagues.
1 Speech Processing. 2 Speech Processing: Text:  Spoken language processing Huang, Acero, Hon, Prentice Hall, 2000  Discrete time processing of speech.
Performance Comparison of Speaker and Emotion Recognition
EEL 6586: AUTOMATIC SPEECH PROCESSING Speech Features Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 27,
Introduction Part I Speech Representation, Models and Analysis Part II Speech Recognition Part III Speech Synthesis Part IV Speech Coding Part V Frontier.
ARTIFICIAL INTELLIGENCE FOR SPEECH RECOGNITION. Introduction What is Speech Recognition?  also known as automatic speech recognition or computer speech.
BY KALP SHAH Sentence Recognizer. Sphinx4 Sphinx4 is the best and versatile recognition system. Sphinx4 is a speech recognition system which is written.
Lecture 1 Phonetics – the study of speech sounds
IIT Bombay 17 th National Conference on Communications, Jan. 2011, Bangalore, India Sp Pr. 1, P3 1/21 Detection of Burst Onset Landmarks in Speech.
Message Source Linguistic Channel Articulatory Channel Acoustic Channel Observable: MessageWordsSounds Features Bayesian formulation for speech recognition:
RESEARCH MOTHODOLOGY SZRZ6014 Dr. Farzana Kabir Ahmad Taqiyah Khadijah Ghazali (814537) SENTIMENT ANALYSIS FOR VOICE OF THE CUSTOMER.
By: Nicole Cappella. Why I chose Speech Recognition  Always interested me  Dr. Phil Show Manti Teo Girlfriend Hoax  Three separate voice analysts proved.
Course Name: Speech Recognition Course Number: Instructor: Hossein Sameti Department of Computer Engineering Room 706 Phone:
Digital Signal Processing Rahil Mahdian LSV Lab, Saarland University, Germany.
Speech Recognition through Neural Networks By Mohammad Usman Afzal Mohammad Waseem.
ARTIFICIAL NEURAL NETWORKS
Vocoders.
Artificial Intelligence for Speech Recognition
Digital Speech Processing
Digital Speech Processing
HUMAN LANGUAGE TECHNOLOGY: From Bits to Blogs
Kocaeli University Introduction to Engineering Applications
Voice source characterisation
Digital Systems: Hardware Organization and Design
Music Signal Processing
The Application of Hidden Markov Models in Speech Recognition
Presentation transcript:

Advanced Topics in Speech Processing (IT60116) K Sreenivasa Rao School of Information Technology IIT Kharagpur

Objectives of the course Illustrating usefulness of signal processing tools for the analysis and processing of speech. Highlighting efficient utilization of communication resources by exploiting the speech production and perception properties. Explaining various issues involved in the processing of speech for human-computer interaction. Describing existing speech processing systems like speech recognition, speaker recognition and text-to- speech synthesis. Giving exposure to the research issues involved in the speech processing area.

Need for speech processing Speech : Natural mode of communication among human beings Message, Speaker information and Language information Language constraints : Legal sound units Legal sequence of legal sound units Why speech processing ? Man Machine Man M STT TTT TTS M M/C

Need for human-machine interface Automatic dictation Voice response systems Voice based person authentication system Forensic investigation application Language identification Information (data) retrieval (voice-based) Speech-to-speech conversion applications Speech enhancement Speech coding

Speech tasks required for M/C interface Speech Recognition : Speech-to-Text Speech Synthesis (TTS) : Text-to-Speech Speaker Recognition : Speech – Speaker identity Language Identification : Speech – Language identity Speech Enhancement : Noisy speech – Clean speech Speech coding : Speech – Encoder – Channel – Decoder – Speech

Features for various speech tasks Features to characterize sound units Features to characterize a speaker Features to characterize the language Features to represent the articulator movements Features to characterize speech, nonspeech, noise and reverberation Features for coding and reproduction of speech

Scope of the Course Introduction (1) Acoustic Phonetics (3) Speech signal processing methods (4) Speech signal analysis approaches (8) Modeling techniques for developing speech systems (12) Speech systems (12)

Scope of the Course (Cont..) Introduction Acoustic phonetics –Classification of sound units –Production aspects –Time and frequency domain realizations –Excitation and vocal tract system characteristics Processing of speech signals –Spectral domain representation –Source-system representation –DFT and its properties –Pole-zero realizations

Scope of the Course (Cont..) Speech analysis methods –Filterbank analysis –Time-domain features of speech signal –Linear Prediction (LP) analysis of speech –Cepstrum analysis –Sinusoidal analysis of speech –Harmonic plus Noise model (HNM) analysis of speech –Group-delay analysis of speech

Scope of the Course (Cont..) Models used for developing speech systems –Introduction to statistical pattern recognition Classification strategies Probability density estimation techniques –Vector quantization (VQ) –Gaussian Mixture Model (GMM) –Hidden Markov Model (HMM) –Neural Networks (NN) –Support Vector Machines (SVM)

Scope of the Course (Cont..) Speech systems –Speech coding –Speech recognition –Speaker recognition –Speech synthesis –Language recognition –Speech enhancement

Assignments 1.Familiarity with speech recording, playback and editing software 2.Effect of sampling and quantization 3.Recording and analysis of speech sounds 4.Time domain analysis of speech 5.Spectral analysis of speech using STFT 6.Spectral analysis using different windows 7.Sinusoidal analysis/synthesis of speech 8.Linear predication analysis/synthesis of speech 9.Cepstral analysis of speech 10.Estimation of pitch and formants from speech 11.Synthesis of vowels 12.Development of prototype speech recognition system

Assignments (Cont..) 13.Voiced/unvoiced (speech/nonspeech) detection (Energy, ZCR, ACF, AMDF) 14.Vowel recognition using Filter-bank approach 15.Vowel recognition using ZCR (realization of filter-bank using ZCR) 16.Vowel recognition using: vector quantization GMM HMM Multivariate Gaussian distribution K-Nearest Neighbour ANN SVM

Assignments (Cont..) 17.Prototype speech recognizer 18.Speaker recognition 19.Language Identification 20.Text-to-Speech synthesis 21.Speech Enhancement (Noise subtraction)

Text books 1.L. R. Rabiner and R. W. Schafer, “Digital processing of speech signals”, Pearson Education, LPE, New Delhi, L. R. Rabiner and B. H. Juang, “Fundamentals of speech recognition”, Pearson Education, LPE, New Delhi, D O’Shaughnessy, “Speech communication: Human and Machine”, Second Edition, IEEE Press, NY, USA, J. R. Deller, Jr., J.H.L. Hansen and J.G. Praokis, “Discrete-time procesing of speech signals” IEEE Press, NY, USA, T. F. Quateri, “Discrete-time speech signal processing: Principles and practice”, Pearson Education, LPE, New Delhi, B. Gold and N. Morgan, Speech and Audio Signal Processing, Wiley Student Edition, Singapore, J. Benesty, M. M. Sondhi and Y. Huang, “Springer Handbook on Speech Processing”, Springer publishers, X. Huang, A. Acero and H. W. Hon, “Spoken Language Processing”, Printice-Hall, Inc., 2001

References 1.IEEE Trans. Audio, Speech and Language Processing 2.Speech Communication (Elsivier) 3.Computer Speech and Language (Elsivier) 4.Journal of acoustical society of America (JASA) 5.IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP) 6.Int. Conf Speech Processing (INTERSPEECH)

Course Evaluation Details Quiz-I: 5% End of January Mid-Sem: 25% End of February Quiz-II: 5% End of March End-Sem: 50% End of April Assignment:15%