Speech in Multimedia Hao Jiang Computer Science Department Boston College Oct. 9, 2007

Outline
• Introduction
• Topics in speech processing
  – Speech coding
  – Speech recognition
  – Speech synthesis
  – Speaker verification/recognition
• Conclusion

Introduction
• Speech is our basic communication tool.
• People have long hoped to communicate with machines using speech (think of C-3PO and R2-D2).

Speech Production Model
[Figures: anatomical structure of the vocal apparatus and the corresponding mechanical model]

Characteristics of Digital Speech
[Figures: a speech waveform and its spectrogram]
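The spectrogram shown on this slide is just the magnitude of a short-time Fourier transform computed frame by frame. The sketch below is a minimal illustration of that idea in Python/NumPy; the function name and parameter values are illustrative, not from the presentation.

```python
import numpy as np

def spectrogram(x, frame_len=256, hop=128):
    """Short-time Fourier magnitudes of a 1-D speech signal x.

    Each column is the spectrum of one windowed frame; plotting the result
    on a log scale (time on x, frequency on y) gives the familiar spectrogram.
    """
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len, hop):
        frame = x[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames).T  # shape: (frame_len // 2 + 1, num_frames)
```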

Voiced and Unvoiced Speech
[Figure: a speech waveform labeled with silence, unvoiced, and voiced segments]

Short-time Parameters
• Short-time power
[Figure: waveform and its short-time power envelope]

• Zero-crossing rate
• Pitch period
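As a rough sketch of how these three short-time parameters might be computed per frame (names, frame sizes, and the 50–400 Hz pitch search range are my own illustrative assumptions, not values from the slides):

```python
import numpy as np

def short_time_params(x, frame_len=200, hop=80, fs=8000):
    """Yield per-frame power, zero-crossing rate, and a crude pitch estimate.

    High power with low ZCR suggests voiced speech; low power with high ZCR
    suggests unvoiced speech; very low power suggests silence.
    """
    for start in range(0, len(x) - frame_len, hop):
        frame = x[start:start + frame_len]
        power = np.mean(frame ** 2)                          # short-time power
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2   # zero-crossing rate
        # Crude pitch period: autocorrelation peak in the 50-400 Hz lag range
        # (only meaningful for voiced frames)
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        lo, hi = fs // 400, fs // 50
        pitch_period = lo + np.argmax(ac[lo:hi])             # in samples
        yield power, zcr, pitch_period
```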

Speech Coding
• As with images, speech can be compressed so that it is smaller and easier to store and transmit.
• General compression methods such as DPCM can also be used (a sketch follows below).
• More compression can be achieved by exploiting the speech production model.
• There are two classes of speech coders:
  – Waveform coders
  – Vocoders
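A minimal sketch of first-order DPCM, the kind of general-purpose waveform coder the slide refers to (the step size and function names are illustrative assumptions):

```python
import numpy as np

def dpcm_encode(x, step=0.01):
    """Quantize the difference between each sample and the previously
    reconstructed sample, so the decoder can track the encoder exactly."""
    codes, recon_prev = [], 0.0
    for sample in x:
        diff = sample - recon_prev
        code = int(round(diff / step))   # uniform quantizer of the prediction error
        codes.append(code)
        recon_prev += code * step        # reconstruction used as the next prediction
    return codes

def dpcm_decode(codes, step=0.01):
    """Rebuild the waveform by accumulating the dequantized differences."""
    recon, prev = [], 0.0
    for code in codes:
        prev += code * step
        recon.append(prev)
    return np.array(recon)
```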

LPC Speech Coder
[Block diagram: input speech is buffered into frames; speech analysis extracts the pitch, the voiced/unvoiced decision, the vocal tract (LPC) parameters, and the energy; these are quantized and packed into the code stream frame by frame (frame n, frame n+1, …)]

LPC and the Vocal Tract
• Mathematically, speech can be modeled by the following generation model:
  x(n) = Σ_{p=1}^{k} a_p · x(n−p) + e(n)
• {a_1, a_2, …, a_k} are called the Linear Prediction Coefficients (LPC); they model the shape of the vocal tract.
• e(n) is the excitation that generates the speech.
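One common way to estimate the LPC coefficients for a frame is to solve the autocorrelation normal equations. The sketch below does this with a direct linear solve for clarity; the Levinson–Durbin recursion is the usual efficient alternative. Function names are illustrative assumptions.

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """Estimate LPC coefficients {a_1..a_order} for one (windowed) speech frame
    so that x(n) ≈ sum_p a_p * x(n-p)."""
    n = len(frame)
    # Autocorrelation values r[0..order]
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])   # a[0] corresponds to a_1, etc.
```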

Decoding and Speech Synthesis
[Block diagram: an impulse-train/glottal-pulse generator (voiced) or a random-noise generator (unvoiced), selected by the U/V flag and driven at the pitch period, is scaled by the gain and passed through the vocal tract model and the radiation model to produce speech]
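A minimal sketch of the decoder's synthesis step, assuming the LPC coefficients `a` come from a routine like `lpc_coefficients` above (the radiation filter is omitted and names are illustrative):

```python
import numpy as np

def lpc_synthesize(a, gain, pitch_period, voiced, num_samples):
    """Drive the all-pole vocal tract filter 1 / (1 - sum_p a_p z^-p) with an
    impulse train (voiced) or white noise (unvoiced), as in the decoder diagram."""
    if voiced:
        excitation = np.zeros(num_samples)
        excitation[::pitch_period] = 1.0        # impulse train at the pitch period
    else:
        excitation = np.random.randn(num_samples)
    excitation *= gain
    out = np.zeros(num_samples)
    p = len(a)
    for n in range(num_samples):
        past = sum(a[k] * out[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        out[n] = excitation[n] + past            # x(n) = e(n) + sum_p a_p x(n-p)
    return out
```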

An Example of Synthesizing Speech
[Figure: glottal pulses, overlapped in a blending region, are passed through the vocal tract filter with gain control and then through the radiation filter]

LPC10 (FS1015)
• LPC10 was the U.S. DoD speech coding standard for voice communication at 2.4 kbps.
• LPC10 operates on speech sampled at 8 kHz, using 22.5 ms frames and 10 LPC coefficients.
[Audio examples: original speech vs. LPC-decoded speech]

Mixed Excitation LP (MELP)
• For real speech, the excitation is usually not a pure pulse train or pure noise but a mixture of both.
• The newer 2.4 kbps standard, MELP, addresses this problem.
[Block diagram: bandpass-filtered pulses (weight w) and bandpass-filtered noise (weight 1−w) are summed, scaled by the gain, and passed through the vocal tract and radiation models to produce speech]
[Audio examples: original speech vs. MELP-decoded speech]
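A toy sketch of the mixed-excitation idea: real MELP blends pulses and noise separately in several frequency bands through bandpass filters, whereas this single-band blend only illustrates the weighting (the 0.1 noise scale and names are my own assumptions):

```python
import numpy as np

def mixed_excitation(pitch_period, w, num_samples):
    """Weighted blend of a pulse train and noise, the core idea behind MELP."""
    pulses = np.zeros(num_samples)
    pulses[::pitch_period] = 1.0
    noise = np.random.randn(num_samples) * 0.1
    return w * pulses + (1.0 - w) * noise
```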

Hybrid Speech Codecs
• At higher bit rates, hybrid speech codecs have an advantage over pure vocoders.
• FS1016: CELP (Code-Excited Linear Prediction) at 4.8 kbps.
• G.723.1: a dual-rate codec (5.3 kbps and 6.3 kbps) for multimedia communication over the Internet.
• G.729: a CELP-based codec at 8 kbps.
[Block diagram: analysis by synthesis — model parameters are generated, speech is synthesized, and a "perceptual" comparison against the input speech drives the choice of code]
[Audio examples at 5.3 kbps, 6.3 kbps, and 8 kbps]
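The analysis-by-synthesis loop in the diagram can be sketched as a search over an excitation codebook: synthesize each candidate, compare it with the target frame, and keep the closest one. Real CELP coders use a perceptually weighted error and also optimize gains; the plain squared error below is a simplification, and all names are illustrative.

```python
import numpy as np

def analysis_by_synthesis(target, codebook, synthesize):
    """Pick the excitation codebook entry whose synthesized output best matches
    the target frame (stand-in for the "perceptual" comparison block)."""
    best_index, best_error = None, np.inf
    for i, excitation in enumerate(codebook):
        candidate = synthesize(excitation)           # excitation through the LPC filter
        error = np.sum((target - candidate) ** 2)    # simplified error measure
        if error < best_error:
            best_index, best_error = i, error
    return best_index
```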

Speech Recognition
• Speech recognition is the foundation of human–computer interaction using speech.
• Speech recognition varies with context:
  – Speaker-dependent or speaker-independent.
  – Discrete words or continuous speech.
  – Small vocabulary or large vocabulary.
  – Quiet environment or noisy environment.
[Block diagram: speech passes through a parameter analyzer, then a comparison and decision algorithm that uses reference patterns and a language model to output words]

How Does Speech Recognition Work?
• Words: "grey whales"
• Phonemes: g r ey w ey l z
• Each phoneme has different characteristics (for example, its power distribution over frequency).

Speech Recognition
• Frame-by-frame phoneme labels: g g r ey ey ey ey w ey ey l l z
• How do we "match" the word when there are timing and other variations?

Hidden Markov Model
[Figure: a three-state HMM (S1, S2, S3) with transition probabilities such as P12 and an output alphabet {a, b, c, …}]

Dynamic Programming in Decoding
[Figure: trellis of states vs. time]
• Dynamic programming finds the state path most likely to have generated the observed sequence of "features" extracted from each speech frame.
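The dynamic program referred to here is the Viterbi algorithm. A compact sketch in log probabilities, assuming the per-frame emission scores have already been evaluated (argument names are illustrative):

```python
import numpy as np

def viterbi(log_A, log_B, log_pi):
    """Most probable HMM state path through a trellis of states vs. time.

    log_A[i, j] : log transition probability from state i to state j
    log_B[i, t] : log probability of the frame-t observation under state i
    log_pi[i]   : log initial probability of state i
    """
    num_states, T = log_B.shape
    delta = np.full((num_states, T), -np.inf)   # best path score ending in (state, t)
    back = np.zeros((num_states, T), dtype=int)  # backpointers
    delta[:, 0] = log_pi + log_B[:, 0]
    for t in range(1, T):
        for j in range(num_states):
            scores = delta[:, t - 1] + log_A[:, j]
            back[j, t] = np.argmax(scores)
            delta[j, t] = scores[back[j, t]] + log_B[j, t]
    # Trace the best path backwards from the best final state
    path = [int(np.argmax(delta[:, T - 1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[path[-1], t])
    return path[::-1]
```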

HMM for a Unigram Language Model
[Figure: a start state s0 branches with probabilities p1, p2, p3, … into the word HMMs: HMM1 (word1), HMM2 (word2), …, HMMn (wordn)]

Speech Synthesis
• Speech synthesis generates (arbitrary) speech with desired properties (pitch, speed, loudness, articulation mode, etc.).
• Speech synthesis is widely used in text-to-speech systems and in telephone services.
• The easiest and most commonly used speech synthesis method is waveform concatenation.
[Audio example: increasing the pitch without changing the speed]
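A minimal sketch of waveform concatenation: pre-recorded units are joined with a short crossfade (a "blending region", as on the earlier synthesis slide) so the seams are not audible. It assumes every unit is longer than the blend length; names and the blend length are illustrative.

```python
import numpy as np

def concatenate_units(units, blend=80):
    """Join a list of 1-D waveform units, crossfading over `blend` samples."""
    out = units[0].astype(float)
    fade_in = np.linspace(0.0, 1.0, blend)
    fade_out = 1.0 - fade_in
    for unit in units[1:]:
        unit = unit.astype(float)
        # Overlap-add the blending region, then append the rest of the unit
        out[-blend:] = out[-blend:] * fade_out + unit[:blend] * fade_in
        out = np.concatenate([out, unit[blend:]])
    return out
```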

Speaker Recognition
• Identifying or verifying the identity of a speaker is an application where computers can outperform humans.
• Vocal tract parameters can be used as features for speaker recognition.
[Figure: LPC covariance features for speaker one vs. speaker two]
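As a toy illustration of using vocal tract parameters as speaker features, the sketch below compares the average LPC vectors of two speakers (reusing `lpc_coefficients` from the LPC sketch above). Real systems model the full feature distribution, e.g. with covariance statistics or Gaussian mixture models, rather than a single mean; everything here is an illustrative assumption.

```python
import numpy as np

def speaker_distance(frames_a, frames_b, order=10):
    """Distance between two speakers' average LPC feature vectors."""
    feats_a = np.array([lpc_coefficients(f, order) for f in frames_a])
    feats_b = np.array([lpc_coefficients(f, order) for f in frames_b])
    return np.linalg.norm(feats_a.mean(axis=0) - feats_b.mean(axis=0))
```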

Applications
• Speech recognition: call routing, directory assistance, operator services, document input.
• Speaker recognition: personalized services, fraud control.
• Text-to-speech synthesis: speech interfaces, document correction, voice commands.
• Speech coding: wireless telephony, voice over Internet.