# Waveform Speech Coding Algorithms: An Overview

June 20th, 2012, Adel Zaalouk

Outline
- Introduction
- Concepts: Quantization, PCM, DPCM, ADPCM
- Standards & Applications: G711, G726
- Performance Comparison & Examples
- Summary & Conclusion

Technical Presentation  Page 2

Introduction Motivation What is Speech Coding ?
Speech coding is the procedure of representing a digitized speech signal as efficiently as possible while maintaining a reasonable level of speech quality. Why would we want to do that? To answer this, let's have a look at the structure of the coding system.

[Figure: structure of the coding system, with one block labeled "Our Guy"]

Introduction Motivation Filtering & Sampling (1)
Sampling is the process of transforming a continuous-time signal into a discrete-time signal.

Introduction Motivation Filtering & Sampling (4)
Most of the speech content lies between 300 and 3400 Hz. According to the Nyquist theorem, Fs >= 2 fm (to avoid aliasing), so a sampling rate of 8 kHz is selected (8 >= 2 * 3.4). For good quality, 16 bits are used to represent each sample.

Input rate: bit rate = 8 kHz * 16 bits = 128 kbps.

The input rate can be even higher. Skype, for example, uses a 16 kHz sampling frequency, resulting in a quoted input rate of 192 kbit/s (this figure corresponds to 12 bits per sample; at 16 bits per sample the rate would be 256 kbit/s). This is a waste of bandwidth that could instead be used by other services and applications, which is the motivation for source coding (speech coding in this context). [1]
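The input-rate arithmetic on this slide can be reproduced with a short sketch. The helper name `bit_rate_kbps` and the variable names are made up for illustration; the numbers are the ones quoted on the slide, plus the 16-bit and 12-bit variants of the 16 kHz case.

```python
def bit_rate_kbps(sampling_rate_hz: int, bits_per_sample: int) -> float:
    """Raw (uncoded) bit rate of a mono PCM stream in kbit/s."""
    return sampling_rate_hz * bits_per_sample / 1000


# Telephone speech extends up to about 3.4 kHz, so Nyquist requires
# a sampling rate of at least 2 * 3.4 kHz = 6.8 kHz; 8 kHz satisfies this.
max_speech_freq_hz = 3400
nyquist_rate_hz = 2 * max_speech_freq_hz

narrowband = bit_rate_kbps(8000, 16)          # 8 kHz, 16 bits per sample
wideband_16_bit = bit_rate_kbps(16000, 16)    # 16 kHz, 16 bits per sample
wideband_12_bit = bit_rate_kbps(16000, 12)    # 16 kHz, 12 bits per sample

print(nyquist_rate_hz, narrowband, wideband_16_bit, wideband_12_bit)
```

The gap between these raw rates and the 64 kbps or lower rates of the coders discussed later is the bandwidth that speech coding recovers.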

Introduction Motivation Desirable Properties of a Speech Coder
- Low bit rate: a lower bit rate needs less transmission bandwidth, leaving room for other services and applications.
- High speech quality: speech quality is the rival of "low bit rate". The decoded speech quality must be acceptable for the target application.
- Low coding delay: speech coding introduces extra delay, which can affect applications that have real-time requirements.

[1]

Introduction Speech Coding Categories
What are the different categories of speech coding? Speech coding is divided into three categories:
- Waveform codecs (PCM, DM, APCM, DPCM, ADPCM)
- Vocoders (LPC, homomorphic, etc.)
- Hybrid codecs (CELP, SELP, RELP, APC, SBC, etc.)

[2]

Concepts Quantization What Is Quantization ?
Quantization is the process of transforming the sample amplitude of a message into a discrete amplitude taken from a finite set of possible amplitudes.
- There are L quantization levels.
- The peak-to-peak range is Vpp = Vp - (-Vp) = 2 Vp volts.
- The step size, called the quantization interval, is denoted q volts. [3]

Each sampled value is approximated by a quantized pulse. The approximation results in an error no larger than q/2 in the positive direction or -q/2 in the negative direction.
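A minimal sketch of such a uniform quantizer, to make the q/2 error bound concrete. It assumes the step size is q = Vpp / L and that each sample is mapped to the midpoint of its interval; the function name `uniform_quantize` and the parameter names are illustrative, not taken from the slides.

```python
import math


def uniform_quantize(sample: float, v_peak: float, levels: int) -> float:
    """Map a sample in [-v_peak, v_peak] to the midpoint of its quantization interval."""
    q = 2 * v_peak / levels                      # assumed step size: q = Vpp / L
    index = math.floor((sample + v_peak) / q)    # which interval the sample falls in
    index = min(max(index, 0), levels - 1)       # clamp samples at or beyond the peaks
    return -v_peak + (index + 0.5) * q


v_peak, levels = 1.0, 8
q = 2 * v_peak / levels
samples = [-1.0 + i * 0.01 for i in range(201)]
worst_error = max(abs(uniform_quantize(s, v_peak, levels) - s) for s in samples)
print(q, worst_error)
```

For inputs inside the peak range the error never exceeds half a step, which is the bound stated above.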

Concepts Quantization Understanding Quantization
To understand quantization a bit more, let's have a look at the following example, which uses the same quantities: L quantization levels, a peak-to-peak range of Vpp = Vp - (-Vp) = 2 Vp volts, and a step size (quantization interval) of q volts.

Concepts Quantization Classification Of Quantization Process
The quantization process is classified as follows:
- Uniform quantization: the representation levels are equally (uniformly) spaced. There are two types: mid-tread and mid-rise.
- Non-uniform quantization: the representation levels have variable spacing from one another.

In the mid-rise type, the values of Bi (the decision boundaries) are multiples of delta. In the mid-tread type, the values of Q (the output levels) are multiples of delta. The main differences between the two are:
- Mid-rise: it cannot represent a zero output level. During a very low or zero input signal interval (silent regions in speech), the output of the coder must be + or - DeltaMin, where DeltaMin is the minimum step size in the coder.
- Mid-tread: a mid-tread quantizer has an odd number of levels, so it does not make efficient use of the 2^B levels available with B bits. [4]

But why do we need such a classification?
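The difference between the two uniform types can be seen in a few lines. This is a sketch under the usual textbook definitions (unbounded quantizers with step `delta`); the function names are illustrative.

```python
import math


def midrise(x: float, delta: float) -> float:
    """Mid-rise: decision boundaries at multiples of delta, outputs at odd multiples of delta/2."""
    return delta * (math.floor(x / delta) + 0.5)


def midtread(x: float, delta: float) -> float:
    """Mid-tread: outputs at multiples of delta, so zero is a valid output level."""
    return delta * math.floor(x / delta + 0.5)


for x in (-0.3, -0.01, 0.0, 0.2, 0.7):
    print(x, midrise(x, 1.0), midtread(x, 1.0))
```

A silent input (values near zero) toggles between +delta/2 and -delta/2 in the mid-rise case, while the mid-tread quantizer holds it at zero.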

Concepts Quantization Human Speech – Excursion & Recap (1)
Speech can be broken into two different categories:
- Voiced (zzzzz)
- Unvoiced (sssss)

Naturally occurring speech signals are composed of a combination of these categories. Take the word "Goat" for example: [4]

"Goat" contains two voiced signals followed by a partial closure of the vocal tract and then an unvoiced signal. These occur at , , and , respectively.

Concepts Quantization - why do we need such classification ?! (1)
Human Speech - Excursion & Recap (2)

It should be noted that:
- The peak-to-peak amplitude of voiced signals is approximately ten times that of unvoiced signals.
- Unvoiced signals contain more information, and thus have higher entropy, than voiced signals.
- The telephone system must provide higher resolution for lower-amplitude signals.

Statistics of speech signals:
- 50 percent of the time, the voltage characterizing detected speech energy is less than ¼ of the rms value.
- Large amplitude values are relatively rare; only 15 percent of the time does the voltage exceed the rms value.

[Figure: probability of occurrence versus amplitude of speech signals] [3] [6]

Concepts Quantization - why do we need such classification ?! (2)
Quantization Noise

The quantization process is lossy (erroneous). The error, defined as the difference between the input signal M and the output signal V, is called the quantization noise E. Consider a simple example:

M = (3.117, 4.56, 2.31, 7.82, 1)
V = (3, 3, 2, 7, 2)
E = M - V = (0.117, 1.56, 0.31, 0.82, -1)

How do we calculate the noise power? Consider an input m of continuous amplitude in the range (-M_max, M_max) and assume a uniform quantizer. How do we get the quantization noise power?
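One standard way to answer this question is sketched below. It rests on assumptions that are not stated on the slide: the quantizer has L levels, its step size is written Δ, and a single error value e is uniformly distributed over one step.

$$\Delta = \frac{2 M_{\max}}{L}, \qquad e \sim \mathcal{U}\!\left(-\frac{\Delta}{2},\ \frac{\Delta}{2}\right)$$

$$\sigma_e^2 = \int_{-\Delta/2}^{\Delta/2} e^2 \,\frac{1}{\Delta}\, de = \frac{\Delta^2}{12} = \frac{M_{\max}^2}{3 L^2}$$

Under these assumptions, doubling the number of levels (one extra bit per sample) cuts the noise power by a factor of four, which is about 6 dB.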

Concepts Quantization - why do we need such classification ?! - (3)
Comparison: Uniform vs. Non-Uniform Usage
- Speech signals do not require high quantization resolution at high amplitudes: low amplitudes dominate (the 50% case), while high amplitudes are rare (the 15% case).
- Is it therefore wasteful to use a uniform quantizer?
- The goal is to increase the SQNR: more levels for low amplitudes, fewer levels for high ones. [3]

A good idea is to use a non-uniform quantizer. It can provide fine quantization levels for weak signals (50%) and coarse levels for strong signals (15%). The SQNR grows with the number of levels, and this matters most for the weak-signal part.

Concepts Quantization More About Non-Uniform Quantizers (Companding)
Non-uniform quantizer = use more levels where you need them. The human ear follows a logarithmic process in which high-amplitude sounds do not require the same resolution as low-amplitude sounds.

One way to achieve non-uniform quantization is to use what is called "companding": companding = "compression + expanding".

[Figure: compressor function, uniform quantization, expander function (the inverse of the compressor). Example from the lecture of Prof. S. N. Merchant.]

It should be noted that A-law and µ-law are used to compress 13-bit or 14-bit signed linear PCM samples to logarithmic 8-bit samples.
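The compressor, uniform quantizer, expander chain can be sketched as follows. This is an illustration under assumptions: it uses the continuous µ-law curve with µ = 255 as the compressor, inputs normalized to [-1, 1], and an 8-bit uniform quantizer in the middle. All function names are hypothetical.

```python
import math

MU = 255  # assumed compression parameter


def compress(x: float) -> float:
    """Continuous mu-law compressor for x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y: float) -> float:
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_uniform(y: float, bits: int) -> float:
    """Uniform quantizer on [-1, 1] with 2**bits levels, output at interval midpoints."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(int((y + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step


def compand(x: float, bits: int = 8) -> float:
    """Compress, quantize uniformly, then expand."""
    return expand(quantize_uniform(compress(x), bits))


weak = 0.01  # a weak (low-amplitude) sample
uniform_error = abs(quantize_uniform(weak, 8) - weak)
companded_error = abs(compand(weak, 8) - weak)
print(uniform_error, companded_error)
```

With the same 8 bits, the companded path reproduces the weak sample more accurately than plain uniform quantization, at the price of coarser steps near full scale.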

Concepts Quantization What is the purpose of a Compander ?
The purpose of a compander is to equalize the histogram of speech signals so that the reconstruction levels tend to be equally used. [6]

There are two well-known companding techniques, each following its own encoding law:
- A-law companding
- µ-law companding

Concepts Quantization A-Law Encoding µ-Law Encoding [3]
Theoretical form of the two encoding laws [3]
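For orientation, the textbook forms of the two laws for an input x normalized to the range [-1, 1] are commonly written as below. The symbols F_µ and F_A are chosen here for illustration and may differ from the notation used in [3].

$$F_\mu(x) = \operatorname{sgn}(x)\,\frac{\ln\left(1 + \mu |x|\right)}{\ln\left(1 + \mu\right)}, \qquad -1 \le x \le 1$$

$$F_A(x) = \operatorname{sgn}(x)\begin{cases} \dfrac{A\,|x|}{1 + \ln A}, & |x| < \dfrac{1}{A} \\[2ex] \dfrac{1 + \ln\left(A\,|x|\right)}{1 + \ln A}, & \dfrac{1}{A} \le |x| \le 1 \end{cases}$$

Both curves are steep near zero and flat near full scale, which is what gives weak signals finer resolution after uniform quantization.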

Concepts Quantization Companding Approximation
Logarithmic functions are slow to compute, so why not approximate them? The curve is approximated with 8 segments (chords), selected by 3 bits.
- P is the sign bit of the output.
- The S bits are the segment code.
- The Q bits are the quantization code.

[3]

Concepts Quantization Companding Approximation – Algorithm Encoding
Encoding
1. Add a bias of 33 to the absolute value of the input sample.
2. Determine the bit position of the most significant 1 bit among bits 5 to 12 of the biased value.
3. Subtract 5 from that position; the result is the segment code.
4. Set the 4-bit quantization code to the 4 bits that follow the most significant 1 bit found in step 2.

Decoding
1. Multiply the quantization code by 2 and add the bias of 33 to the result.
2. Multiply the result by 2 raised to the power of the segment code.
3. Subtract the bias from the result.
4. Use the P bit to determine the sign of the result.

[3]
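The encoding and decoding steps above can be sketched as follows. This is a simplified illustration of the listed steps only, not a complete G.711 implementation. The function names, the 8-bit packing order (P, then segment, then quantization code) and the clipping of the magnitude so that the biased value fits in 13 bits are assumptions.

```python
BIAS = 33
MAX_BIASED = 8191  # assumed limit: largest biased magnitude that fits in bits 0 to 12


def mulaw_encode(sample: int) -> int:
    """Encode a signed linear sample into an 8-bit code word P SSS QQQQ."""
    sign = 1 if sample < 0 else 0
    biased = min(abs(sample) + BIAS, MAX_BIASED)    # step 1: add the bias
    position = biased.bit_length() - 1              # step 2: most significant 1 bit (5..12)
    segment = position - 5                          # step 3: segment code
    quant = (biased >> (position - 4)) & 0x0F       # step 4: the 4 bits after that position
    return (sign << 7) | (segment << 4) | quant


def mulaw_decode(code: int) -> int:
    """Decode an 8-bit code word back to a signed linear sample."""
    sign = (code >> 7) & 1
    segment = (code >> 4) & 0x07
    quant = code & 0x0F
    magnitude = ((quant * 2 + BIAS) << segment) - BIAS  # steps 1 to 3
    return -magnitude if sign else magnitude            # step 4: apply the sign


code = mulaw_encode(-656)
decoded = mulaw_decode(code)
print(format(code, "08b"), decoded)
```

Because the bias guarantees a biased value of at least 33, the most significant 1 bit is always at position 5 or higher, so the segment code is never negative.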

Concepts Quantization µ-Law Encoding - Example
Example input: -656. Code word layout: P S2 S1 S0 Q3 Q2 Q1 Q0.
- The sample is negative, so bit P becomes 1.
- Add 33 to the absolute value to bias high input values (due to wrapping). The result is 656 + 33 = 689 = 0001010110001 in binary.
- The most significant 1 bit among positions 5 to 12 is at position 9.
- Subtracting 5 from the position yields 4, so the segment code is 100.
- Finally, the 4 bits after that position (0101) are inserted as the quantization code.

Resulting code word: 1 100 0101.

Concepts Quantization µ-Law Decoding - Example
Example: decoding the code word 1 100 0101 produced for the input -656.
- The quantization code is 0101 = 5, so 5 * 2 + 33 = 43.
- The segment code is 100 = 4, so 43 * 2^4 = 688.
- Subtract the bias: 688 - 33 = 655.
- P is 1, so the final result is -655.
- The quantization noise is 1 (very small).

Concepts Quantization µ-Law Encoding
- Approximately linear for small input values and logarithmic for large input values
- The value of µ used in practice is 255
- Used for speech signals
- Used for PCM telephone systems in the US, Canada and Japan

A-Law Encoding
- Linear segments for low-level inputs and a logarithmic segment for high-level inputs
- The value of A used in practice is 87.6
- Used for PCM telephone systems in Europe

Concepts Pulse Code Modulation (PCM) PCM Description
- Sampling results in PAM
- PCM uniformly quantizes PAM
- The results of PCM are PCM words
- Each PCM word is l = log2(L) bits

[3]

Concepts Differential Pulse Code Modulation (DPCM) DPCM Description
Signals that are sampled at a high rate (much greater than the Nyquist rate) are highly correlated, so the difference between successive samples will not be large. Instead of quantizing each sample, why not quantize the difference? This results in a quantizer that needs far fewer bits. [7]

This is the simple first-order form, in which the prediction uses only the previous sample. More than one past sample can be used in the prediction (N-th order).

Problems with this approach?

Concepts Differential Pulse Code Modulation (DPCM) DPCM Example
[7] What is a predictor? It is clear from the table that the error adds up, producing an output signal that is completely different from the original one.

Concepts Differential Pulse Code Modulation (DPCM) DPCM Prediction
Previously, the input to the predictor in the encoder was different from the one in the decoder. The difference between the two predictors led to the reconstruction error e(n) = x[n] - x'[n].

To solve this problem completely, the same predictor that is used in the decoder is also used in the encoder. The reconstruction error at the decoder output is then the same as the quantization error at the encoder, and quantization errors no longer accumulate.

At the decoder we do not have x(k), so both sides use x'(k), the past reconstructed samples.
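The difference between the two arrangements can be demonstrated with a small sketch. It assumes a first-order predictor (the prediction is simply the previous sample), a mid-tread uniform quantizer, and a slowly rising ramp as input. All names are illustrative.

```python
import math


def quantize(value: float, step: float) -> float:
    """Mid-tread uniform quantizer used for the prediction residual."""
    return step * math.floor(value / step + 0.5)


def dpcm_open_loop(samples, step):
    """Encoder predicts from the ORIGINAL previous sample; decoder sums the residuals."""
    residuals, previous = [], 0.0
    for s in samples:
        residuals.append(quantize(s - previous, step))
        previous = s                      # the decoder never sees this value
    decoded, reconstructed = [], 0.0
    for r in residuals:
        reconstructed += r
        decoded.append(reconstructed)
    return decoded


def dpcm_closed_loop(samples, step):
    """Encoder predicts from the RECONSTRUCTED previous sample, exactly like the decoder."""
    decoded, reconstructed = [], 0.0
    for s in samples:
        residual = quantize(s - reconstructed, step)
        reconstructed += residual         # the same state the decoder will hold
        decoded.append(reconstructed)
    return decoded


def max_error(original, decoded):
    return max(abs(a - b) for a, b in zip(original, decoded))


ramp = [0.4 * (n + 1) for n in range(10)]
open_error = max_error(ramp, dpcm_open_loop(ramp, 1.0))
closed_error = max_error(ramp, dpcm_closed_loop(ramp, 1.0))
print(open_error, closed_error)
```

In the open-loop case each 0.4 increment is rounded away and the losses add up, so the output drifts further from the input with every sample. In the closed-loop case the error stays within half a quantizer step.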

Standards, Examples & Applications
G711 Description
- A waveform codec that was released in 1972
- Its formal name is Pulse Code Modulation (PCM), since it uses PCM in its encoding
- G711 achieves a 64 kbps bit rate (8 kHz sampling frequency x 8 bits per sample)
- G711 defines two main compression algorithms:
  - µ-law (used in North America & Japan)
  - A-law (used in Europe and the rest of the world)
- A-law and µ-law take 13-bit and 14-bit signed linear PCM samples, respectively, as input and compress them to 8-bit samples

Applications
- Public Switched Telephone Network (PSTN)
- WiFi phones (VoWLAN)
- Wideband IP telephony
- Audio & video conferencing
- H.320 & H.323 specifications

Standards, Examples & Applications
G726 Description
- G726 converts a 64 kbps A-law or µ-law PCM channel to and from a 40, 32, 24 or 16 kbps channel.
- The conversion is applied to raw PCM using the ADPCM encoding technique.
- Different rates are achieved by adapting the number of quantization levels, that is, the number of bits per sample:
  - 2 bits and 16 kbps
  - 3 bits and 24 kbps
  - 4 bits and 32 kbps
  - 5 bits and 40 kbps
- Includes G721 and G723 [12]

Performance Comparison
[1]

Summary & Conclusion
Summary
- We talked about quantization concepts in all their flavors.
- We discussed the category of waveform coding (PCM, DPCM and ADPCM).
- We presented the ITU standards (G711 and G726) and mentioned some examples and applications.
- Finally, we compared the most prominent speech codecs.

Conclusion
- Speech coding is an important concept that is required to use the existing bandwidth efficiently.
- There are many important metrics to keep in mind when doing speech coding, and a good speech coder must balance them. The most important ones are data rate, speech quality and delay.
- Waveform codecs achieve the best speech quality as well as low delays.
- Vocoders achieve a low data rate, but at the cost of delay and speech quality.
- Hybrid coders achieve acceptable speech quality, acceptable delay and an acceptable data rate.

References
- Wai C. Chu: Speech Coding Algorithms: Foundation & Evolution of Standardized Coders
- Speech Coding:
- Sklar: Digital Communication Fundamentals And Applications
- A-Law and mu-Law Companding Implementations Using the TMS320C54x
- Michael Langer: Data Compression - Introduction to lossy compression
- Signal Quantization and Compression Overview
- Wajih Abu-Al-Saud: Ch. VI Sampling & Pulse Code Mod. Lecture 25
- Yuli You: Audio Coding: Theory And Applications
- Tarmo Anttalainen: Introduction to telecommunication Networks Engineering
- Wikipedia G711:
- David Salomon: Data Communication the Complete Reference
- ITU CCIT Recommendation G.726 ADPCM

Questions & Discussion
Thank you!!
