CS 294-9 :: Fall 2003 Audio Coding Ketan Mayer-Patel.

Overview of Today
– Sampling techniques: PCM (linear, μ-law)
– Generic coding techniques: DPCM, ADPCM
– Psychoacoustic coding: MPEG-1
– Speech-specific techniques: vocoding

Audio Signals
Analog audio is essentially a voltage that varies continuously with time. Unlike video, which is 3D (two spatial dimensions plus time), audio is a 1D signal.
– It can be captured without having to discretize any higher dimensions.
Digitizing audio boils down to sampling the signal at regular intervals and quantizing each sample's level to one of a finite set of values.
Digital audio parameters:
– bits per sample
– sampling rate
– number of channels

Sampling
– Pulse Amplitude Modulation (PAM): each sample's amplitude is represented by one analog value.
– Sampling theory (Nyquist): if the input signal has maximum frequency (bandwidth) f, the sampling frequency must be greater than 2f.
– With an ideal low-pass filter to interpolate between the samples, the input signal can then be fully reconstructed.
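A minimal Python sketch of what goes wrong when the Nyquist condition is violated. A tone above fs/2 produces exactly the same samples as a lower-frequency "alias", so no filter can tell them apart afterwards. The helper names `alias_frequency` and `sampled_cosine` are illustrative and do not come from any library.

```python
import math

def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency in [0, fs/2] at which a real tone f_hz appears after sampling at fs_hz."""
    f = f_hz % fs_hz            # sampling cannot distinguish f from f + k*fs
    return min(f, fs_hz - f)    # a real cosine at f also matches one at fs - f

def sampled_cosine(f_hz: float, fs_hz: float, n: int) -> list[float]:
    """The first n samples of cos(2*pi*f*t) taken at rate fs_hz."""
    return [math.cos(2 * math.pi * f_hz * k / fs_hz) for k in range(n)]

# Telephone-style rate: 8 kHz sampling, so the Nyquist frequency is 4 kHz.
print(alias_frequency(5000, 8000))  # 3000: a 5 kHz tone masquerades as 3 kHz
same = all(math.isclose(a, b, abs_tol=1e-9)
           for a, b in zip(sampled_cosine(5000, 8000, 16), sampled_cosine(3000, 8000, 16)))
print(same)  # True: the two sample sequences are identical
```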

PCM
– Pulse Code Modulation (PCM): each sample's amplitude is represented by an integer code word.
– Each bit of resolution adds about 6 dB of dynamic range.
– The number of bits required depends on how much noise can be tolerated.
– Quantization error ("noise"): with n bits, SNR ≈ 6n dB (more precisely, about 6.02n + 1.76 dB for a full-scale sine wave).
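The 6 dB-per-bit rule can be checked numerically. This sketch uses NumPy. `quantize` and `snr_db` are hypothetical helpers. It quantizes a near-full-scale sine with a uniform mid-rise quantizer and measures the resulting SNR, which should land close to the 6.02n + 1.76 dB textbook value.

```python
import numpy as np

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform mid-rise quantizer for signals normalized to [-1, 1)."""
    step = 2.0 / 2 ** bits
    q = np.floor(x / step) * step + step / 2          # centre of the chosen level
    return np.clip(q, -1 + step / 2, 1 - step / 2)    # stay within the 2**bits levels

def snr_db(x: np.ndarray, xq: np.ndarray) -> float:
    """Signal power over quantization-noise power, in dB."""
    noise = x - xq
    return float(10 * np.log10(np.sum(x ** 2) / np.sum(noise ** 2)))

# Near-full-scale sine; a non-integer period keeps the error pattern from repeating.
t = np.arange(200_000)
x = 0.999 * np.sin(2 * np.pi * 0.0123456 * t)
for bits in (8, 12, 16):
    print(bits, "bits:", round(snr_db(x, quantize(x, bits)), 1), "dB")
```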

Linear PCM
– Uses evenly spaced quantization levels.
– Typically 16 bits per sample, which provides a large dynamic range; quantization noise is difficult for humans to perceive.
– Compact Discs: 16-bit linear sampling, 44.1 kHz sampling rate, 2 channels.
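Multiplying the three digital-audio parameters gives the raw, uncompressed bit rate. A small sketch follows; the function name is illustrative.

```python
def pcm_bitrate(bits_per_sample: int, sample_rate_hz: int, channels: int) -> int:
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels

cd = pcm_bitrate(16, 44_100, 2)
print(cd)                        # 1411200 bit/s, about 1.4 Mb/s
print(cd * 60 // 8)              # 10584000 bytes per minute of CD audio
print(pcm_bitrate(8, 8_000, 1))  # 64000 bit/s for 8-bit, 8 kHz mono speech
```

The last line is the 64 kb/s rate used by G.711 telephone speech, which is why a 1.4 Mb/s CD stream needs real compression to fit in similar channels.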

Non-linear Sampling
– If we try to use 8 bits per sample, dynamic range is reduced significantly and quantization noise becomes audible.
– In particular, there are not enough levels for the lower amplitudes.
– Solution: place quantization levels more densely at lower amplitudes and less densely at higher amplitudes, roughly like a log scale.

Non-linear Sampling Illustrated
[figure: non-linear quantizer, output vs. input]

μ-law and A-law
– Non-linear sampling is called "companding" (compressing + expanding).
– 8 companded bits provide a dynamic range equivalent to 12 linear bits.
– μ-law and A-law are companding standards defined in ITU-T G.711.
– The difference between them is the exact shape of the piecewise-linear companding function.

μ-Law Companding
$$f(x) = 127 \cdot \operatorname{sgn}(x) \cdot \frac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}, \qquad x \text{ normalized to } [-1, 1],\ \mu = 255 \text{ in G.711}$$
– Provides 14-bit quality (dynamic range) with an 8-bit encoding.
– Used in North American and Japanese ISDN voice service.
– The encoding is simple to compute.
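Below is a direct transcription of the continuous μ-law formula. It assumes μ = 255 and an integer code in [−127, 127]. G.711 itself uses a piecewise-linear approximation with a different bit layout, so these codes are not G.711 bytes.

```python
import math

MU = 255.0  # assumption: the mu value used by G.711 mu-law

def mulaw_compress(x: float) -> int:
    """Map x in [-1, 1] to an integer code in [-127, 127] using f(x) above."""
    y = math.log1p(MU * abs(x)) / math.log1p(MU)
    return round(127 * math.copysign(y, x))

def mulaw_expand(code: int) -> float:
    """Invert mulaw_compress, up to the rounding done by the encoder."""
    y = abs(code) / 127
    return math.copysign(math.expm1(y * math.log1p(MU)) / MU, code)

small = 0.001  # a quiet input, 0.1% of full scale
print(abs(mulaw_expand(mulaw_compress(small)) - small))  # companded round-trip error
print((2 / 256) / 2)  # worst-case error of 8-bit linear PCM, about 0.0039
```

For quiet inputs the companded round-trip error is more than an order of magnitude below the linear 8-bit worst case. This is the benefit of spending more code values on low amplitudes.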

μ-Law Encoding
Sender: high-resolution PCM (12, 14, or 16 bits) → table lookup → 8-bit μ-law code. Receiver: inverse table lookup → 14-bit decoded PCM.
[table: input amplitude, step size, segment, quantization, code value]

μ-Law Decoding
Same chain as for encoding: sender table lookup → 8-bit μ-law code → receiver inverse table lookup → 14-bit PCM.
[table: multiplier, μ-law encoding, decoded amplitude]
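The table-lookup encoder and decoder can be sketched with the segment/step structure of G.711 μ-law: 1 sign bit, a 3-bit segment number, and a 4-bit step within the segment. This follows the bias-and-shift formulation found in widely circulated public-domain G.711 reference code. The constants `BIAS` and `CLIP` come from that formulation rather than from the slides. The input is assumed to be 16-bit signed PCM. Check the results against the ITU-T G.711 tables before real use.

```python
BIAS = 0x84    # 132: offset so every segment starts at a power of two
CLIP = 32635   # largest magnitude that still fits after adding BIAS

def ulaw_encode(sample: int) -> int:
    """16-bit signed linear PCM -> 8-bit mu-law code (sign, 3-bit segment, 4-bit step)."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    segment = magnitude.bit_length() - 8           # 0..7: which power-of-two range
    step = (magnitude >> (segment + 3)) & 0x0F      # 16 evenly spaced steps per segment
    return ~(sign | (segment << 4) | step) & 0xFF   # the code bits are transmitted inverted

def ulaw_decode(code: int) -> int:
    """8-bit mu-law code -> linear PCM value at the centre of the coded interval."""
    code = ~code & 0xFF
    segment = (code >> 4) & 0x07
    step = code & 0x0F
    magnitude = (((step << 3) + BIAS) << segment) - BIAS
    return -magnitude if code & 0x80 else magnitude

# Decoding is a pure table lookup: precompute all 256 outputs once.
DECODE_TABLE = [ulaw_decode(c) for c in range(256)]

print(hex(ulaw_encode(0)), ulaw_decode(ulaw_encode(1000)))  # 0xff 988
```

With only 256 codes, a receiver can decode by indexing `DECODE_TABLE`. An encoder could likewise precompute a 65,536-entry table indexed by the input sample.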

Difference Encoding
– Differential PCM (DPCM): exploit temporal redundancy between samples.
– The difference between two x-bit samples can usually be represented with significantly fewer than x bits.
– Transmit the difference rather than the sample.
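In its simplest form, lossless DPCM uses the previous sample as the prediction. Encoding is then just differencing, and decoding is a running sum. Below is an illustrative sketch; the sample values are made up.

```python
from itertools import accumulate

def dpcm_encode(samples: list[int]) -> list[int]:
    """Send each sample as its difference from the previous one (the first from 0)."""
    return [x - prev for prev, x in zip([0] + samples[:-1], samples)]

def dpcm_decode(diffs: list[int]) -> list[int]:
    """Rebuild the samples by running-summing the differences."""
    return list(accumulate(diffs))

samples = [1000, 1003, 1007, 1010, 1008, 1001]  # slowly varying 16-bit PCM
diffs = dpcm_encode(samples)
print(diffs)                          # [1000, 3, 4, 3, -2, -7]
print(dpcm_decode(diffs) == samples)  # True
```

After the first sample, every difference here fits in 4 signed bits instead of 16.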

"Slope Overload"
– Slope overload problem: differences in high-frequency signals near the Nyquist frequency cannot be represented with a smaller number of bits.
– The resulting error causes severe distortion at the higher frequencies.
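Slope overload appears as soon as the transmitted difference is limited to a few bits. In this sketch the encoder clamps each difference to ±`max_diff`, an arbitrary illustrative limit of 200. Like a real DPCM encoder, it predicts from its own reconstruction, so the encoder and decoder stay in step.

```python
import math

def dpcm_limited(samples: list[int], max_diff: int) -> list[int]:
    """Closed-loop DPCM whose transmitted difference is clamped to +/- max_diff."""
    recon, prev = [], 0
    for x in samples:
        d = max(-max_diff, min(max_diff, x - prev))  # only this much fits in the bits sent
        prev += d                                     # the decoder tracks the same value
        recon.append(prev)
    return recon

def worst_error(freq_hz: float, fs_hz: int = 8000, amp: int = 1000, max_diff: int = 200) -> int:
    """Largest reconstruction error for a sampled sine of the given frequency."""
    xs = [round(amp * math.sin(2 * math.pi * freq_hz * n / fs_hz)) for n in range(400)]
    return max(abs(x - y) for x, y in zip(xs, dpcm_limited(xs, max_diff)))

print(worst_error(100))   # 0: a 100 Hz tone never changes by more than 200 per sample
print(worst_error(3000))  # large: a 3 kHz tone, near the 4 kHz Nyquist limit, overloads
```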

Adaptive DPCM (ADPCM)
– Use a larger step size to encode differences between high-frequency samples and a smaller step size for differences between low-frequency samples.
– Use previous sample values to estimate changes in the signal in the near future.

ADPCM
[diagram: the predicted sample is subtracted from the y-bit PCM sample; the difference quantizer turns the result into an x-bit ADPCM "difference"; a dequantizer and a step-size adjuster feed the reconstructed value back to the predictor, which forms predicted PCM sample n+1]
To ensure the differences are always small:
– Adaptively change the step size (quanta).
– (Adaptively) attempt to predict the next sample value.

IMA's Proposed ADPCM
[diagram: as above, but the predictor is a register holding PCM sample n−1; 16-bit PCM samples become 4-bit ADPCM differences]
– The predictor is not adaptive: it simply uses the previous (reconstructed) sample value.
– Quantization step sizes are logarithmically spaced and adapt to the size of recent differences, so larger steps are used for high-frequency (fast-changing) signals.

IMA Difference Quantization
The difference between the 16-bit PCM sample and PCM sample n−1 (held in the register) is coded as a 4-bit ADPCM difference in step-size units: a sign bit plus a 3-bit magnitude, chosen as follows.

Quantization step-size multiples                 Quantizer output
|difference| < ¼ step_size                       0
¼ step_size ≤ |difference| < ½ step_size         1
½ step_size ≤ |difference| < ¾ step_size         2
¾ step_size ≤ |difference| < step_size           3
step_size ≤ |difference| < 1¼ step_size          4
1¼ step_size ≤ |difference| < 1½ step_size       5
1½ step_size ≤ |difference| < 1¾ step_size       6
1¾ step_size ≤ |difference|                      7

IMA Step-size Table
The table has 89 entries (index 0 to 88). The step sizes grow roughly geometrically, by about 10% per index, from 7 up to 32767.

Adaptive Step-size Selection
– The quantizer output (in step-size units) drives a step-size table index adjustment lookup.
– The adjustment is added to the previous index (held in a register), and the result is range-limited to 0–88.
– The new index selects the new step size from the step-size table; both the difference quantizer and the dequantizer use it for the next sample.

Adaptive Step-size Selection: Index Adjustment

Difference quantizer input                       Quantizer output   Index adjustment   Step-size adjustment
|difference| < step_size                         0–3                −1                 × 0.91
step_size ≤ |difference| < 1¼ step_size          4                  +2                 × 1.21
1¼ step_size ≤ |difference| < 1½ step_size       5                  +4                 × 1.46
1½ step_size ≤ |difference| < 1¾ step_size       6                  +6                 × 1.77
1¾ step_size ≤ |difference|                      7                  +8                 × 2.14

IMA ADPCM Example
[worked-example table and encoder/decoder diagram; columns: input X_n, predicted value X_{n−1}, difference, quantizer output, step size, step-size table index, index adjustment, step-size multiplier, reconstituted difference]
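Putting the quantization and step-size tables together gives a compact IMA ADPCM encoder/decoder sketch. The step-size table (89 entries, index 0 to 88) and the index-adjustment table below are the commonly published IMA values. The bit-serial quantizer mirrors the usual reference implementation. Two simplifying assumptions apply: the predictor and index start at 0, and block headers and nibble packing are omitted.

```python
import math

# IMA ADPCM step sizes: 89 entries, each about 1.1x the previous.
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230,
    253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876, 963,
    1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066, 2272, 2499, 2749, 3024, 3327,
    3660, 4026, 4428, 4871, 5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442,
    11487, 12635, 13899, 15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794,
    32767,
]
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]  # indexed by the 3 magnitude bits

def _clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, v))

def _quantize(diff: int, step: int) -> int:
    """Difference quantizer: sign bit (8) plus 3 magnitude bits in quarter steps."""
    code = 8 if diff < 0 else 0
    diff = abs(diff)
    if diff >= step:
        code |= 4
        diff -= step
    if diff >= step >> 1:
        code |= 2
        diff -= step >> 1
    if diff >= step >> 2:
        code |= 1
    return code

def _reconstructed_diff(code: int, step: int) -> int:
    """Dequantizer shared by both ends; step/8 lands mid-way in each quarter-step bin."""
    diff = step >> 3
    if code & 4:
        diff += step
    if code & 2:
        diff += step >> 1
    if code & 1:
        diff += step >> 2
    return -diff if code & 8 else diff

def ima_encode(samples: list[int]) -> list[int]:
    predicted, index, codes = 0, 0, []
    for x in samples:
        step = STEP_TABLE[index]
        code = _quantize(x - predicted, step)
        # Closed loop: track exactly what the decoder will reconstruct.
        predicted = _clamp(predicted + _reconstructed_diff(code, step), -32768, 32767)
        index = _clamp(index + INDEX_ADJUST[code & 7], 0, 88)
        codes.append(code)
    return codes

def ima_decode(codes: list[int]) -> list[int]:
    predicted, index, out = 0, 0, []
    for code in codes:
        step = STEP_TABLE[index]
        predicted = _clamp(predicted + _reconstructed_diff(code, step), -32768, 32767)
        index = _clamp(index + INDEX_ADJUST[code & 7], 0, 88)
        out.append(predicted)
    return out

# 440 Hz tone sampled at 8 kHz, coded at 4 bits per sample instead of 16.
tone = [round(10000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(800)]
codes = ima_encode(tone)
decoded = ima_decode(codes)
```

The encoder updates `predicted` and `index` from the code it just produced, exactly as the decoder will. As a result, the two stay in lockstep unless codes are lost or corrupted.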

Networking Considerations
– The IMA codec is reasonably robust to errors.
– An interval with a low-level signal will correct any step-size error: small differences keep lowering the step-size index, so both encoder and decoder bottom out at index 0 and resynchronize.

Psychoacoustic Properties
– Human perception of sound is a function of frequency and signal strength (MPEG exploits this relationship).
[figure: sound level (dB) vs. frequency (kHz), showing the audible and inaudible regions]
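One common closed-form approximation of the boundary between the inaudible and audible regions (the threshold in quiet) is Terhardt's formula, which many psychoacoustic-model descriptions use. It is chosen here for illustration and is not necessarily the curve drawn on the slide.

```python
import math

def absolute_threshold_db(f_hz: float) -> float:
    """Terhardt's approximation of the threshold in quiet, in dB SPL (about 20 Hz - 20 kHz)."""
    f = f_hz / 1000.0  # the formula is written in kHz
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

for f in (50, 100, 1000, 3500, 10000, 15000):
    print(f, "Hz:", round(absolute_threshold_db(f), 1), "dB")
```

The threshold dips below 0 dB near 3 to 4 kHz, where hearing is most sensitive, and rises steeply at both ends of the range.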

Auditory Masking
[figure: sound level (dB) vs. frequency (kHz); a loud masking tone raises the audibility threshold around it, so a quieter masked tone at a nearby frequency becomes inaudible]
– The presence of tones at certain frequencies makes us unable to perceive tones at other "nearby" frequencies.
– "Nearby" means within about 100 Hz at low frequencies and within about 4 kHz at high frequencies.
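The "nearby" range grows with frequency. A standard approximation of the critical bandwidth (Zwicker and Terhardt) gives 100 Hz near 0 Hz and roughly 4 kHz around 15 kHz, in line with the figures above. `may_mask` is a deliberately crude, hypothetical rule of thumb. It ignores signal level and the asymmetry of real masking.

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Zwicker & Terhardt's approximation of the critical bandwidth around f_hz."""
    return 25 + 75 * (1 + 1.4 * (f_hz / 1000) ** 2) ** 0.69

def may_mask(masker_hz: float, probe_hz: float) -> bool:
    """Hypothetical rule: call two tones 'nearby' if closer than one critical bandwidth."""
    return abs(masker_hz - probe_hz) < critical_bandwidth_hz(masker_hz)

print(round(critical_bandwidth_hz(100)))    # about 100 Hz at low frequencies
print(round(critical_bandwidth_hz(15000)))  # about 4 kHz at high frequencies
print(may_mask(200, 250), may_mask(200, 400), may_mask(12000, 13500))
```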

MPEG Encoder Block Diagram
PCM audio samples (32, 44.1, or 48 kHz) → mapping → quantizer and coding → frame packing → encoded bitstream. A psychoacoustic model, also fed by the PCM input, controls the quantizer and coding; ancillary data is inserted during frame packing.

Subband Filter
– Transforms the signal from the time domain to the frequency domain.
– 32 PCM samples yield 32 subband samples (one per subband).
– Each subband corresponds to a frequency band; the bands are evenly spaced from 0 to the Nyquist frequency.
– The filter actually works on a window of 512 samples that is shifted by 32 samples at a time.
– The subband coefficients are analyzed with the psychoacoustic model, quantized, and coded.
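Because the 32 subbands split the range from 0 to fs/2 evenly, their width depends only on the sampling rate. A short sketch follows; `subband_edges` is an illustrative name.

```python
def subband_edges(fs_hz: float, n_subbands: int = 32) -> list[tuple[float, float]]:
    """Equal-width bands from 0 Hz up to the Nyquist frequency fs/2."""
    width = fs_hz / 2 / n_subbands
    return [(i * width, (i + 1) * width) for i in range(n_subbands)]

for fs in (32_000, 44_100, 48_000):
    low, high = subband_edges(fs)[0]
    print(fs, "Hz sampling:", high - low, "Hz per subband")
```

At 44.1 kHz each subband is about 689 Hz wide. At low frequencies, where critical bands are around 100 Hz wide, a single subband therefore spans several critical bands.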

Layer I
– 384 samples per frame (12 from each of the 32 subbands).
– Iterative bit allocation process:
  – For each subband, determine the mask-to-noise ratio (MNR).
  – Increase the number of quantization bits for the subband with the smallest MNR.
  – Iterate until all bits are used.
– Fixed allocation of bits among subbands for a particular frame.
– Up to 448 kb/s.
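The iterative allocation loop can be sketched greedily, with several simplifying assumptions:
– The SNR achieved with b bits is approximated as 6.02·b dB; Layer I actually uses tabulated SNR values.
– The budget counts single-bit increments rather than the real per-frame bit cost.
– The per-subband signal-to-mask ratios (SMR) are assumed to come from the psychoacoustic model.

```python
def allocate_bits(smr_db: list[float], budget: int, max_bits: int = 15) -> list[int]:
    """Greedy allocation: repeatedly give one more bit per sample to the subband
    with the smallest mask-to-noise ratio, MNR = SNR(bits) - SMR.
    Assumption: SNR(bits) ~ 6.02 dB per bit instead of the standard's tables."""
    bits = [0] * len(smr_db)

    def mnr(i: int) -> float:
        return 6.02 * bits[i] - smr_db[i]

    for _ in range(budget):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break  # every subband is already at the maximum resolution
        worst = min(candidates, key=mnr)
        bits[worst] += 1
    return bits

print(allocate_bits([20.0, 0.0, 10.0], budget=5))  # [3, 0, 2]
```

The subband with SMR = 0 dB never gets bits in this run. Its noise is already masked, so spending bits there would be wasted.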

Layer II
– 1152 samples per frame.
– Iterative bit allocation.
– Subband allocation is dynamic.
– Up to 384 kb/s.

Layer III (MP3)
– 1152 samples per frame.
– Up to 320 kb/s.
– Each subband is further analyzed using an MDCT to create 576 frequency lines.
– 4 different windowing schemes are used, depending on whether the samples contain the "attack" of new frequencies.
– Many bit-allocation options for quantizing the frequency coefficients.
– The quantized coefficients are Huffman coded.
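The MDCT step can be illustrated with a generic MDCT that uses a sine window. The sine window satisfies the Princen-Bradley condition, so overlapping blocks cancel each other's time-domain aliasing. With N = 18, each 36-sample block of one subband yields 18 coefficients, and 32 × 18 = 576 frequency lines. This sketch omits Layer III's window switching, alias-reduction butterflies, and scaling conventions, so it is not bit-compatible with the standard.

```python
import numpy as np

def mdct_matrix(N: int) -> np.ndarray:
    """N x 2N basis: X[k] = sum_n x[n] * cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))."""
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def sine_window(N: int) -> np.ndarray:
    """2N-point sine window; w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley condition)."""
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))

def mdct_roundtrip(x: np.ndarray, N: int = 18) -> tuple[np.ndarray, np.ndarray]:
    """MDCT-analyze x in 50%-overlapped 2N-sample blocks, then rebuild it by overlap-add."""
    if len(x) % N:
        raise ValueError("len(x) must be a multiple of N")
    C, w = mdct_matrix(N), sine_window(N)
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])  # every sample lies in 2 blocks
    out = np.zeros_like(padded)
    coeffs = []
    for start in range(0, len(padded) - 2 * N + 1, N):
        X = C @ (w * padded[start:start + 2 * N])             # 2N windowed samples -> N lines
        coeffs.append(X)
        out[start:start + 2 * N] += w * (2.0 / N) * (C.T @ X)  # windowed IMDCT, overlap-add
    return np.array(coeffs), out[N:N + len(x)]

rng = np.random.default_rng(0)
x = rng.standard_normal(18 * 10)
coeffs, y = mdct_roundtrip(x)
print(coeffs.shape, np.allclose(x, y))  # (11, 18) True
```

Each block's inverse transform alone is corrupted by aliasing. Only the overlap-add of neighbouring blocks restores the input, which is why the MDCT is critically sampled (N new coefficients per N new samples) yet still perfectly invertible.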

Vocoding
Concept: develop a mathematical model of the vocal cords and throat.
– Derive/compute the model parameters for a short interval and transmit them to the decoder.
– Use the parameters to synthesize speech at the decoder.
So what is a good model?
– A "buzzer" in a "tube"!
– The buzzer is characterized by its intensity and pitch.
– The tube is characterized by its formants.

Vocoding: Basic Concepts
[figure: speech spectrum, amplitude vs. frequency (kHz)]
– Formants: the resonant peaks (frequency maxima) in the spectrum of the speech signal.
– Vocoders group and code portions of the signal by amplitude.

Linear Predictive Coding (LPC)
[figure: waveform of the spoken phrase "yadda yadda yadda"]
– A sample is represented as a linear combination of the p previous samples plus a scaled excitation x(n):
$$y(n) = \sum_{k=1}^{p} a_k \, y(n-k) + G \, x(n)$$
"Buzzer" and "tube" model. Vocoding principles:
– voice = formants + buzz (pitch and intensity)
– voice − estimated formants = "residue"
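The predictor coefficients a_k are usually estimated frame by frame by minimizing the prediction error. Below is a sketch of one standard choice: the autocorrelation method with the Levinson-Durbin recursion. The test signal is a synthetic second-order "tube" driven by white noise, and `lpc_coefficients` is an illustrative name.

```python
import numpy as np

def lpc_coefficients(y: np.ndarray, p: int) -> tuple[np.ndarray, float]:
    """Estimate a_1..a_p for y(n) ~ sum_k a_k y(n-k) (autocorrelation method,
    Levinson-Durbin recursion). Returns (a, final prediction-error power)."""
    r = np.array([np.dot(y[: len(y) - k], y[k:]) for k in range(p + 1)])
    a = np.zeros(p + 1)  # a[0] is unused so that a[k] matches a_k
    err = r[0]
    for i in range(1, p + 1):
        k_i = (r[i] - np.dot(a[1:i], r[i - 1 : 0 : -1])) / err  # reflection coefficient
        prev = a.copy()
        a[i] = k_i
        a[1:i] = prev[1:i] - k_i * prev[i - 1 : 0 : -1]
        err *= 1 - k_i ** 2
    return a[1:], float(err)

# Synthetic tube: y(n) = 1.3 y(n-1) - 0.6 y(n-2) + x(n), with white-noise excitation.
rng = np.random.default_rng(1)
x = rng.standard_normal(20_000)
y = np.zeros_like(x)
for n in range(len(x)):
    y[n] = x[n] + (1.3 * y[n - 1] if n >= 1 else 0.0) - (0.6 * y[n - 2] if n >= 2 else 0.0)
a, _ = lpc_coefficients(y, 2)
print(np.round(a, 2))  # close to the true coefficients [1.3, -0.6]
```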

LPC
– The decoder artificially generates speech via formant synthesis: a mathematical simulation of the vocal tract as a series of bandpass filters.
– The encoder codes and transmits the filter coefficients, pitch period, gain factor, and nature of the excitation.
Standards:
– Regular Pulse Excited Linear Predictive Coder (RPE-LPC): digital cellular standard GSM 06.10 (13 kbps)
– Code Excited Linear Predictive Coder (CELP): US Federal Standard 1016 (4.8 kbps)
– Linear Predictive Coder (LPC): US Federal Standard 1015 (2.4 kbps)
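On the decoder side, the buzzer-and-tube model amounts to feeding an excitation through the all-pole filter y(n) = Σ a_k y(n−k) + G·x(n). `lpc_synthesize` is a hypothetical helper. A voiced frame uses an impulse train at the pitch period, and an unvoiced frame uses white noise. Real standards such as FS 1015 and FS 1016 use more elaborate excitation and parameter coding.

```python
import numpy as np

def lpc_synthesize(a, gain, n_samples, pitch_period=None, seed=0):
    """Hypothetical source-filter synthesizer: y(n) = sum_k a_k y(n-k) + G x(n).
    Voiced (pitch_period given): x is an impulse train (the 'buzzer').
    Unvoiced (pitch_period None): x is white noise."""
    if pitch_period is not None:
        x = np.zeros(n_samples)
        x[::pitch_period] = 1.0
    else:
        x = np.random.default_rng(seed).standard_normal(n_samples)
    y = np.zeros(n_samples)
    for n in range(n_samples):
        acc = gain * x[n]
        for k, a_k in enumerate(a, start=1):  # the all-pole 'tube'
            if n - k >= 0:
                acc += a_k * y[n - k]
        y[n] = acc
    return y

print(lpc_synthesize([0.5], gain=2.0, n_samples=8, pitch_period=4))
```

Each pitch pulse rings through the filter and decays, and the next pulse restarts it. That repeating pattern is what the ear hears as pitch.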

Networking Concerns
– Audio bandwidth is actually quite small, but human sensitivity to loss and noise is quite high.
– Networking concerns: loss concealment and jitter control, especially for telephony applications.
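For jitter control, a receiver needs an estimate of how much packet transit times vary. One standardized estimator is the RTP interarrival jitter from RFC 3550 (section 6.4.1). It smooths successive transit-time differences with a gain of 1/16. How large a playout buffer to build from this estimate is an application choice not covered here, and `interarrival_jitter` is an illustrative name.

```python
def interarrival_jitter(send_times: list[float], arrival_times: list[float]) -> float:
    """RFC 3550-style running jitter: J += (|D| - J) / 16, where D is the change in
    transit time (arrival - send) between consecutive packets. Same units as the inputs."""
    transit = [a - s for s, a in zip(send_times, arrival_times)]
    jitter = 0.0
    for prev, cur in zip(transit, transit[1:]):
        jitter += (abs(cur - prev) - jitter) / 16
    return jitter

print(interarrival_jitter([0, 20, 40], [10, 30, 50]))  # 0.0: constant delay, no jitter
print(interarrival_jitter([0, 20, 40], [10, 30, 66]))  # 1.0: one packet arrived 16 units late
```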