Speech coding

What's the need for speech coding?
- Necessary in order to represent human speech in a digital form
- Applications: mobile/telephone communication, voice over IP
- Coding efficiency (high quality with fewer bits) is a must

Components of a speech coding system
- The sampling process is not depicted in this diagram (it would appear on the left side)
- The "Binary encoding" component is the "Discrete source encoder" from the Digital Communication Principles lecture

Examples of coding techniques
- ZIP: no transformation or quantization; apply VLC (LZW) directly to the stream of symbols in a file; lossless coding
- PCM for speech: no transformation; quantize the speech samples directly; apply fixed-length binary coding
- ADPCM for speech: apply prediction to the original samples, adapting the predictor from one speech frame to the next; quantize the prediction error; code the error symbols with fixed-length binary coding
- JPEG for images: apply the discrete cosine transform to blocks of image pixels; quantize the transformed coefficients; code the quantized coefficients using variable-length coding (run-length + Huffman coding)

Binary encoding

Binary encoding
- Binary encoding: represent a finite set of symbols using binary codewords
- Fixed-length coding: N levels represented by ceil(log2(N)) bits
- Variable-length coding (VLC): more frequently appearing symbols are given shorter codewords (Huffman, arithmetic, LZW = zip)
- The minimum average number of bits required to represent a source is bounded below by its entropy

Entropy bound on bitrate (Shannon theory)
- Consider a source with a finite number of symbols
- Symbol s_n has probability (relative frequency) P(s_n) = p_n
- If symbol s_n is given a codeword with l_n bits, the average bitrate (bits/symbol) is R = sum_n p_n * l_n
- The average bitrate is bounded below by the entropy of the source: R >= H = -sum_n p_n * log2(p_n)
- For this reason, variable-length coding is also known as entropy coding
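
As an illustration of these two formulas, the short Python sketch below (not part of the original slides; the source probabilities and codeword lengths are made up) computes the average bitrate R and the entropy bound H for a small symbol set.

```python
import math

# Hypothetical source: symbol probabilities and assigned codeword lengths
probs = [0.5, 0.25, 0.15, 0.10]      # p_n for each symbol s_n
lengths = [1, 2, 3, 3]               # l_n bits for the codeword of s_n

# Average bitrate R = sum_n p_n * l_n (bits/symbol)
avg_bitrate = sum(p * l for p, l in zip(probs, lengths))

# Entropy bound H = -sum_n p_n * log2(p_n)
entropy = -sum(p * math.log2(p) for p in probs)

print(f"average bitrate = {avg_bitrate:.3f} bits/symbol")
print(f"entropy bound   = {entropy:.3f} bits/symbol")
assert avg_bitrate >= entropy        # Shannon bound: R >= H
```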

Huffman encoding example

Huffman encoding example (2)
- Huffman encode the sequence of symbols {3,2,2,0,1,1,2,3,2,2} using the codes from the previous slide
- Code table:
    Symbol   Codeword
    0        000
    1        001
    2        1
    3        01
- Coded sequence: {01,1,1,000,001,001,1,01,1,1}
- Average bitrate: 18 bits / 10 symbols = 1.8 bits/symbol
- Fixed-length coding rate: 2 bits/symbol
- The saving is more obvious for a longer sequence of symbols
- Decoding: table lookup

Huffman encoding algorithm
- Step 1: Arrange the symbol probabilities in decreasing order and consider them as leaf nodes of a tree
- Step 2: While there is more than one node:
  - Find the two nodes with the smallest probabilities and assign the one with the lower probability a "0" and the other one a "1" (or the other way round, but be consistent)
  - Merge the two nodes to form a new node whose probability is the sum of the two merged nodes, then repeat Step 2
- Step 3: For each symbol, determine its codeword by tracing the assigned bits from the corresponding leaf node to the top of the tree. The bit at the leaf node is the last bit of the codeword
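
The same procedure can be sketched in a few lines of Python using a priority queue. This is not from the slides; it is a minimal illustration run on the example sequence {3,2,2,0,1,1,2,3,2,2}. Note that the individual codewords (and even their lengths) can differ from the slide's table depending on how ties are broken, but the average bitrate is the same 1.8 bits/symbol.

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a Huffman code table from a {symbol: frequency} mapping."""
    # Heap entries: (total frequency, tie-breaker, {symbol: partial codeword})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f0, _, codes0 = heapq.heappop(heap)   # node with the lowest probability
        f1, _, codes1 = heapq.heappop(heap)   # node with the next lowest probability
        # Prepend "0" to codewords in the first node and "1" to those in the second
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (f0 + f1, counter, merged))
        counter += 1
    return heap[0][2]

seq = [3, 2, 2, 0, 1, 1, 2, 3, 2, 2]
table = huffman_code(Counter(seq))
bits = "".join(table[s] for s in seq)
print(table)
print(len(bits) / len(seq), "bits/symbol")    # 1.8 bits/symbol, as in the slide example
```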

More on Huffman encoding
- Huffman coding comes close to the entropy bound: its average codeword length R satisfies H <= R < H + 1
- One can code one symbol at a time (scalar coding) or a group of symbols at a time (vector coding)
- If the probability distribution is known and accurate, Huffman coding is very good (off from the entropy by at most 1 bit per symbol)

Transformation

Waveform-based coders
- Non-predictive coding (uniform or non-uniform quantisation): samples are encoded independently; PCM
- Predictive coding: samples are encoded as the difference from other samples; LPC or Differential PCM (DPCM)

PCM (Pulse Code Modulation)
- In PCM each sample of the signal is quantized to one of 2^B amplitude levels, where B is the number of bits used to represent each sample
- The bitrate of the encoded signal is B*F bps, where F is the sampling frequency
- The quantized waveform is modeled as x_q(n) = x(n) + q(n), where q(n) is the quantization noise
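
A minimal sketch of B-bit uniform PCM (not from the slides; it assumes samples normalised to the range [-1, 1) and uses a placeholder tone instead of real speech):

```python
import numpy as np

def pcm_quantize(x, n_bits):
    """Uniform B-bit PCM: map samples in [-1, 1) to 2**n_bits levels and back."""
    levels = 2 ** n_bits
    delta = 2.0 / levels                          # quantization step
    codes = np.clip(np.floor((x + 1.0) / delta), 0, levels - 1).astype(int)
    x_hat = -1.0 + (codes + 0.5) * delta          # reconstruct at level mid-points
    return codes, x_hat

# Example: 8 kHz sampling, 8 bits/sample -> 64 kbps
fs, n_bits = 8000, 8
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 440 * t)             # placeholder "speech" signal
codes, x_hat = pcm_quantize(x, n_bits)
print("bitrate:", n_bits * fs, "bps")             # B * F
```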

Predictive coding (LPC or DPCM)
- Observation: adjacent samples are often similar
- Predictive coding: predict the current sample from previous samples; quantize and code the prediction error instead of the original sample
- If the prediction is accurate most of the time, the prediction error is concentrated near zero and can be coded with fewer bits than the original signal
- Usually a linear predictor is used (linear predictive coding): x_pred(n) = sum_{k=1..P} a_k * x(n-k)

Predictive encoder diagram

Predictive decoder diagram
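
Since the encoder and decoder block diagrams are not reproduced in this transcript, here is a minimal DPCM sketch (illustrative, not from the slides). It uses the simplest first-order predictor, a_1 = 1 (predict each sample by the previous reconstructed sample), and a uniform quantiser of the prediction error; the encoder tracks the decoder's reconstruction so the two stay in step.

```python
import numpy as np

def dpcm_encode(x, delta):
    """DPCM encoder: quantize the prediction error e(n) = x(n) - x_rec(n-1)."""
    codes, x_rec_prev = [], 0.0
    for sample in x:
        err = sample - x_rec_prev                 # prediction error
        q = int(np.round(err / delta))            # quantized error index
        codes.append(q)
        x_rec_prev = x_rec_prev + q * delta       # mirror the decoder's state
    return codes

def dpcm_decode(codes, delta):
    """DPCM decoder: x_rec(n) = x_rec(n-1) + q(n) * delta."""
    x_rec, prev = [], 0.0
    for q in codes:
        prev = prev + q * delta
        x_rec.append(prev)
    return np.array(x_rec)

x = 0.5 * np.sin(2 * np.pi * np.arange(200) / 50)  # slowly varying test signal
codes = dpcm_encode(x, delta=0.01)
x_hat = dpcm_decode(codes, delta=0.01)
print("max reconstruction error:", np.max(np.abs(x - x_hat)))  # stays around delta/2
```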

Quantization

Uniform quantisation
- Each sample of speech x(t) is represented by a binary number x[n]
- Each binary number represents a quantisation level
- With uniform quantisation there is a constant voltage difference Δ between levels
[Figure: a staircase of 3-bit quantisation levels 000-111 on the voltage axis, with samples of x(t) taken at times n = 1...8 spaced T apart]

Quantisation error
- e[n] is the quantisation error
- If samples are rounded, uniform quantisation produces |e[n]| <= Δ/2, unless overflow occurs, in which case the magnitude of e[n] may be >> Δ/2
- Overflow is best avoided

Noise due to uniform quantisation error
- Samples e[n] are 'random' within ±Δ/2
- If x[n] is converted back to analogue form, these error samples are heard as a 'white noise' sound added to x(t)
- Noise is an unwanted signal; white noise is spread evenly across all frequencies
- It sounds like a waterfall or the sea, not like a car or house alarm, or a car revving its engine
- Samples e[n] have uniform probability density between ±Δ/2
- It may be shown that the mean square value of e[n] is Δ²/12
- This becomes the power of the analogue quantisation noise (power in Watts if applied to a 1 Ohm speaker - loudness!)
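
A quick numerical check of the Δ²/12 result (an illustrative sketch, not from the slides): quantise a random signal by rounding, measure the error power, and compare it with the theoretical value.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.05                               # quantisation step
x = rng.uniform(-1, 1, 1_000_000)          # test signal well within range (no overflow)
x_q = delta * np.round(x / delta)          # rounding (uniform) quantiser
e = x_q - x                                # quantisation error, within ±delta/2

print("measured error power  :", np.mean(e ** 2))
print("theoretical delta^2/12:", delta ** 2 / 12)
```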

Signal-to-quantisation noise ratio (SQNR)
- Measures how seriously the signal is degraded by quantisation noise
- With uniform quantisation, the quantisation-noise power is Δ²/12, independent of signal power
- Therefore, SQNR will depend on the signal power
- If we amplify the signal as much as possible without overflow, then for sinusoidal waveforms with an n-bit uniform quantiser: SQNR ≈ 6n + 1.8 dB
- Approximately true for speech also
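
The 6n + 1.8 dB rule of thumb can likewise be checked numerically (an illustrative sketch, not part of the slides): quantise a full-scale sinusoid with an n-bit uniform quantiser and compute the SQNR.

```python
import numpy as np

def sqnr_db(n_bits, n_samples=1_000_000):
    """SQNR of a full-scale sinusoid under n-bit uniform quantisation."""
    delta = 2.0 / 2 ** n_bits                      # step size for range [-1, 1]
    t = np.arange(n_samples)
    x = np.sin(2 * np.pi * t / 1000.003)           # full-scale sinusoid (non-integer period)
    e = delta * np.round(x / delta) - x            # quantisation error
    return 10 * np.log10(np.mean(x ** 2) / np.mean(e ** 2))

for n in (8, 10, 12):
    print(n, "bits:", round(sqnr_db(n), 2), "dB   rule of thumb:", 6 * n + 1.8, "dB")
```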

Variation of input levels
- For telephone users with loud voices and quiet voices, the quantisation noise will have the same power, Δ²/12
- Δ may be too large for quiet voices, OK for slightly louder ones, and too small (risking overflow) for much louder voices
[Figure: the same quantisation levels (000-111, volts) shown against a quiet voice (Δ too big), a medium voice (OK), and a loud voice (Δ too small)]

Companding for 'narrow-band' speech
- 'Narrow-band' speech is what we hear over telephones: normally band-limited from 300 Hz to about 3500 Hz and sampled at 8 kHz
- 8 bits per sample is not sufficient for good 'narrow-band' speech encoding with uniform quantisation
- The problem lies with setting a suitable quantisation step-size Δ
- One solution is to use instantaneous companding: the step-size is adjusted according to the amplitude of the sample
- For larger amplitudes, larger step-sizes are used, as illustrated next
- 'Instantaneous' because the step-size changes from sample to sample

Non-uniform quantisation used for companding
[Figure: x(t) against t with non-uniformly spaced quantisation levels for x[n]; codes far from zero (e.g. 0111/-111) are spaced more widely apart than those near zero (e.g. 0001/-001)]

Implementation of companding
- Digitise x(t) accurately with uniform quantisation to give x[n]
- Apply the compressor formula to x[n] to give y[n]
- Uniformly quantise y[n] using fewer bits
- Store or transmit the compressed result
- Passing it through the expander reverses the effect of the compressor
- As y[n] was quantised, we don't get x[n] back exactly
[Block diagram: x(t) → uniform quantise (many bits) → x[n] → compressor → y[n] → uniform quantise (fewer bits) → transmit or store → expander → x'[n]]

Effect of compressor
- Increases the smaller amplitudes of x[n] and reduces the larger ones
- When the uniform quantiser is applied, the fixed Δ appears smaller in proportion to the smaller amplitudes of x[n], and larger in proportion to the larger amplitudes
- The effect is non-uniform quantisation, as illustrated before
- Famous compressor formulas: A-law and Mu-law (G.711); these use 8 bits per sample
- The expander is often implemented by a look-up table
- You have only 4 bits per sample - that makes the task hard! There is no unique solution
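
As a concrete sketch of the compress → quantise → expand chain (not from the slides), the code below uses the standard Mu-law compressor y = sign(x)·ln(1 + μ|x|)/ln(1 + μ) with μ = 255, a 4-bit uniform quantiser, and the matching expander, and compares the result with plain 4-bit uniform quantisation on a quiet signal. The test signal and bit allocation are illustrative choices, not the G.711 codec itself.

```python
import numpy as np

MU = 255.0

def compress(x):
    """Mu-law compressor: y = sign(x) * ln(1 + mu|x|) / ln(1 + mu), for |x| <= 1."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def expand(y):
    """Mu-law expander (inverse of the compressor)."""
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

def quantize(y, n_bits):
    """Uniform quantiser with step 2 / 2**n_bits (rounding; edge handling glossed over)."""
    delta = 2.0 / 2 ** n_bits
    return delta * np.round(y / delta)

def snr_db(x, x_hat):
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))

x = 0.05 * np.sin(2 * np.pi * np.arange(8000) / 50)   # quiet 'voice-like' tone
x_companded = expand(quantize(compress(x), 4))        # compress -> 4-bit quantise -> expand
x_uniform = quantize(x, 4)                            # plain 4-bit uniform quantisation

print("4-bit uniform   SNR:", round(snr_db(x, x_uniform), 1), "dB")
print("4-bit companded SNR:", round(snr_db(x, x_companded), 1), "dB")
```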

Speech coding characteristics
- Speech coders are lossy coders, i.e. the decoded signal is different from the original
- The goal in speech coding is to minimize the distortion at a given bit rate, or to minimize the bit rate needed to reach a given distortion
- Metrics in speech coding:
  - Objective measure of distortion: SNR (signal-to-noise ratio); SNR does not correlate well with perceived speech quality
  - Subjective measure: MOS (mean opinion score)
    5: excellent
    4: good
    3: fair
    2: poor
    1: bad