COMP 249 :: Spring 2005 Slide: 1 Audio Coding Ketan Mayer-Patel.

Presentation transcript:

COMP 249 :: Spring 2005 Slide: 1 Audio Coding Ketan Mayer-Patel

COMP 249 :: Spring 2005 Slide: 2 Overview of Today Sampling Techniques: PCM –Linear –μ-law Generic Coding Techniques: DPCM, ADPCM Psychoacoustic Coding: MPEG-1 Speech Specific Techniques: Vocoding

COMP 249 :: Spring 2005 Slide: 3 Audio Signals Analog audio is basically voltage as a continuous function of time. Unlike video which is 3D, audio is a 1D signal. –Can capture without having to discretize the higher dimensions. Audio sampling basically boils down to quantizing signal level to a set of values. Digital audio parameters: –bits per sample –sampling rate –number of channels.

COMP 249 :: Spring 2005 Slide: 4 Sampling Pulse Amplitude Modulation (PAM) –Each sample’s amplitude is represented by 1 analog value Sampling theory (Nyquist) –If input signal has maximum frequency (bandwidth) f, sampling frequency must be at least 2f –With a low-pass filter to interpolate between samples, the input signal can be fully reconstructed
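
A small sketch of why the sampling frequency must be at least 2f: a tone above half the sampling rate produces exactly the same samples as a lower tone, so the two cannot be told apart afterwards. The function names and the 8 kHz rate are illustrative choices, not part of the slides.

```python
import math

def alias_frequency(f, fs):
    """Apparent frequency (Hz) of a tone at f Hz after sampling at fs Hz."""
    f_mod = f % fs
    return min(f_mod, fs - f_mod)

def sample_tone(f, fs, n):
    """First n samples of a cosine at f Hz, sampled at fs Hz."""
    return [math.cos(2 * math.pi * f * k / fs) for k in range(n)]

fs = 8000                          # illustrative sampling rate
print(alias_frequency(3000, fs))   # 3000: below fs/2, preserved
print(alias_frequency(5000, fs))   # 3000: above fs/2, folds back onto 3000 Hz

# The sampled sequences of the two tones are numerically identical.
a = sample_tone(3000, fs, 16)
b = sample_tone(5000, fs, 16)
print(max(abs(x - y) for x, y in zip(a, b)) < 1e-9)  # True
```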

COMP 249 :: Spring 2005 Slide: 5 PCM Pulse Code Modulation (PCM) Each sample’s amplitude represented by an integer code-word Each bit of resolution adds 6 dB of dynamic range Number of bits required depends on the amount of noise that is tolerated Quantization error (“noise”) n = SNR –
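
The "6 dB per bit" rule follows from each extra bit doubling the number of quantization levels, and 20·log10(2) is about 6.02 dB. A minimal check, with a function name chosen here for illustration:

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB, for linear PCM."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 12, 16):
    print(bits, round(dynamic_range_db(bits), 1))
# 8 48.2
# 12 72.2
# 16 96.3
```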

COMP 249 :: Spring 2005 Slide: 6 Linear PCM Uses evenly spaced quantization levels. Typically 16 bits per sample. Provides a large dynamic range. Difficult for humans to perceive quantization noise. Compact Discs –16-bit linear sampling –44.1 kHz sampling rate –2 channels
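
The three digital audio parameters (bits per sample, sampling rate, channels) multiply to give the raw bit rate. A worked example for the Compact Disc figures on this slide; the function name is an illustrative choice:

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels

cd = pcm_bit_rate(16, 44_100, 2)
print(cd)             # 1411200 bits per second, about 1.4 Mb/s
print(cd // 8 * 60)   # 10584000 bytes per minute of audio
```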

COMP 249 :: Spring 2005 Slide: 7 Non-linear Sampling If we try to use 8 bits per sample, dynamic range is reduced significantly and quantization noise can be heard. In particular, we end up with not enough levels for the lower amplitudes. Solution is to sample more densely in the lower amplitudes and less densely for the higher amplitudes. Sort of like a log scale.

COMP 249 :: Spring 2005 Slide: 8 Non-linear Sampling Illustrated [Figure: Output versus Input]

COMP 249 :: Spring 2005 Slide: 9 μ-law and A-law Non-linear sampling is called “companding”. 8 bits companded provides dynamic range equivalent to 12 bits. μ-law and A-law are companding standards defined in G.711. The difference is in the exact shape of the piece-wise linear companding function.

COMP 249 :: Spring 2005 Slide: 10 μ-Law companding $f(x) = 127 \cdot \operatorname{sign}(x) \cdot \frac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}$ (x normalized to [-1, 1]) Provides 14-bit quality (dynamic range) with an 8-bit encoding Used in North American & Japanese ISDN voice service Simple to compute encoding
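
A minimal sketch of the continuous companding formula on this slide and its inverse. It assumes μ = 255 (the slide does not state the value) and ignores the piece-wise linear, table-driven form that G.711 actually specifies; the function names are illustrative.

```python
import math

MU = 255  # assumption: the slide does not give a value for mu

def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to a companded value in [-127, 127]."""
    return 127 * math.copysign(1, x) * math.log(1 + mu * abs(x)) / math.log(1 + mu)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress: map [-127, 127] back to [-1, 1]."""
    return math.copysign(1, y) * ((1 + mu) ** (abs(y) / 127) - 1) / mu

# Small amplitudes receive a disproportionate share of the code range.
for x in (0.01, 0.1, 0.5, 1.0):
    print(x, round(mu_law_compress(x), 1))
```

Rounding the compressed value to an integer gives the lossy 8-bit code; without rounding, `mu_law_expand(mu_law_compress(x))` returns x up to floating-point error.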

COMP 249 :: Spring 2005 Slide: 11 μ-Law Encoding [Figure: Sender: high-resolution PCM encoding (12, 14, 16 bits), table lookup, 8-bit μ-Law encoding. Receiver: inverse table lookup, 14-bit decoding. Table columns: Input Amplitude, Step Size, Segment, Quantization, Code Value]

COMP 249 :: Spring 2005 Slide: 12 μ-Law Decoding [Figure: Sender: high-resolution PCM encoding (12, 14, 16 bits), table lookup, 8-bit μ-Law encoding. Receiver: inverse table lookup, 14-bit decoding. Table columns: μ-Law Encoding, Multiplier, Decode Amplitude]

COMP 249 :: Spring 2005 Slide: 13 Difference Encoding Differential-PCM (DPCM) –Exploit temporal redundancy in samples –Difference between 2 x-bit samples can be represented with significantly fewer than x bits –Transmit the difference (rather than the sample)
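
A minimal sketch of the idea, assuming the predictor is simply the previous sample and that differences are sent without further quantization. The sample values are made up for illustration.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous one."""
    prev = 0
    diffs = []
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    prev = 0
    out = []
    for d in diffs:
        prev += d
        out.append(prev)
    return out

samples = [1000, 1012, 1020, 1025, 1021, 1010]   # slowly varying signal
diffs = dpcm_encode(samples)
print(diffs)                          # [1000, 12, 8, 5, -4, -11]
print(dpcm_decode(diffs) == samples)  # True
```

After the first value, every difference fits in a few bits even though the samples themselves need more than ten.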

COMP 249 :: Spring 2005 Slide: 14 Slope Overload Problem Differences in high-frequency signals near the Nyquist frequency cannot be represented with a smaller number of bits! –Error introduced leads to severe distortion in the higher frequencies
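
Slope overload can be reproduced with a DPCM coder whose differences are clamped to a fixed number of bits. This sketch assumes 4-bit signed differences (range -8 to 7) and a previous-sample predictor; both are illustrative choices rather than anything specified on the slide.

```python
def dpcm_clamped(samples, bits=4):
    """Reconstruction produced when each difference must fit in `bits` signed bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    recon = 0          # what the decoder has reconstructed so far
    out = []
    for s in samples:
        d = max(lo, min(hi, s - recon))   # clamp the difference to the code range
        recon += d
        out.append(recon)
    return out

slow = [0, 3, 6, 9, 12]          # small sample-to-sample changes
fast = [0, 40, -40, 40, -40]     # swings near the Nyquist frequency
print(dpcm_clamped(slow))   # [0, 3, 6, 9, 12]   tracked exactly
print(dpcm_clamped(fast))   # [0, 7, -1, 6, -2]  cannot keep up
```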

COMP 249 :: Spring 2005 Slide: 15 Adaptive DPCM (ADPCM) Use a larger step-size to encode differences between high-frequency samples & a smaller step-size for differences between low-frequency samples Use previous sample values to estimate changes in the signal in the near future

COMP 249 :: Spring 2005 Slide: 16 ADPCM To ensure differences are always small... –Adaptively change the step-size (quanta) –(Adaptively) attempt to predict next sample value [Figure: y-bit PCM sample in; Predictor, Difference Quantizer, Step-Size Adjuster, Dequantizer; x-bit ADPCM “difference” out; Predicted PCM Sample n+1]

COMP 249 :: Spring 2005 Slide: 17 IMA’s proposed ADPCM Predictor is not adaptive and simply uses the last sample value Quantization step-size increases logarithmically with signal frequency [Figure: 16-bit PCM sample in; Register holding PCM Sample n–1, Difference Quantizer, Step-Size Adjuster, Dequantizer; 4-bit ADPCM difference out]

COMP 249 :: Spring 2005 Slide: 18 IMA Difference Quantization [Figure: 16-bit PCM sample in; Register holding PCM sample n–1, Difference Quantizer, Step-Size Adjuster, Dequantizer; 4-bit ADPCM difference (in step-size units) out. Table: ranges of the difference in Quantization Step-Size Multiples versus Quantizer Output]

COMP 249 :: Spring 2005 Slide: 19 IMA Step-size Table [Table columns: Index, Step Size]

COMP 249 :: Spring 2005 Slide: 20 Adaptive Step-size Selection [Figure: inside the Step-Size Adjuster, the Quantizer Output drives a Step-Size Table Index Adjustment Lookup; the Index Adjustment is added to the Previous Index (held in a Register), passed through a Range Limit (0 to 88), and used for a Step-Size Table Lookup that yields the New Step-Size. Surrounding codec: 16-bit PCM Sample, PCM Sample n–1, Difference Quantizer, Dequantizer, 4-bit ADPCM difference (in step-size units)]

COMP 249 :: Spring 2005 Slide: 21 Adaptive Step-size Selection [Figure: Quantizer Output, Step-Size Table Index Adjustment Lookup, Index Adjustment added to Previous Index (Register), Range Limit (0 to 88), Step-Size Table Lookup, New Step-Size. Table: ranges of the difference relative to step_size from the Difference Quantizer, Quantizer Output, Step-Size Table Index Adjustment, and Quantization Step-Size Adjustment: X 0.91 X 1.21 X 1.46 X 1.77 X]
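
The adaptation loop on the last few slides can be sketched as a toy codec. This is not the IMA reference implementation: the step-size table is a stand-in that grows by a factor of 1.1 per index (consistent with the 0.91, 1.21, 1.46, 1.77 multipliers on this slide), the index adjustments are assumed values matching those multipliers, and the quantizer uses floating point instead of the integer arithmetic a real codec would use.

```python
import math

# Stand-in step-size table (assumption): 89 entries for indices 0 to 88,
# each about 1.1x the previous one. The real IMA table holds specific integers.
STEP_TABLE = [7 * 1.1 ** i for i in range(89)]

# Assumed index adjustment for each 3-bit magnitude of the quantizer output:
# small differences shrink the step (1.1**-1 ~ 0.91), large ones grow it.
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]

def next_index(index, code):
    """Adjust the table index from the quantizer output, limited to 0..88."""
    return max(0, min(88, index + INDEX_ADJUST[code & 7]))

def quantize(diff, step):
    """4-bit code: sign bit plus magnitude in quarter-step units (0 to 7)."""
    magnitude = min(7, int(abs(diff) * 4 / step))
    return magnitude | (8 if diff < 0 else 0)

def dequantize(code, step):
    """Reconstructed difference at the centre of the quantizer cell."""
    delta = ((code & 7) + 0.5) * step / 4
    return -delta if code & 8 else delta

def _advance(predicted, index, code):
    """State update shared by encoder and decoder so that they stay in step."""
    predicted += dequantize(code, STEP_TABLE[index])
    predicted = max(-32768.0, min(32767.0, predicted))   # 16-bit range limit
    return predicted, next_index(index, code)

def adpcm_encode(samples):
    predicted, index, codes = 0.0, 0, []
    for s in samples:
        code = quantize(s - predicted, STEP_TABLE[index])
        predicted, index = _advance(predicted, index, code)
        codes.append(code)
    return codes

def adpcm_decode(codes):
    predicted, index, out = 0.0, 0, []
    for code in codes:
        predicted, index = _advance(predicted, index, code)
        out.append(predicted)
    return out

tone = [10000 * math.sin(2 * math.pi * n / 100) for n in range(200)]
decoded = adpcm_decode(adpcm_encode(tone))
print(max(abs(a - b) for a, b in zip(tone[100:], decoded[100:])))
```

The encoder runs the same state update as the decoder, so both see identical predicted values and step sizes as long as no codes are lost.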

COMP 249 :: Spring 2005 Slide: 22 IMA ADPCM Example [Table columns: Input, Difference, Step Size, Quantizer output, Index Adjustment, Step-Size table index, Step-size multiplier, Reconstituted difference, Predicted value (Decode). Figure: codec diagram with input X n, Register holding X n–1, Difference Quantizer, Step-Size Adjuster, Dequantizer]

COMP 249 :: Spring 2005 Slide: 23 Networking Considerations The IMA codec is reasonably robust to errors An interval with a low-level signal will correct any step-size error [Figure: Step-Size Adjuster, Dequantizer, and Register holding PCM sample n–1. Table: ranges of the difference relative to step_size, Quantizer Output, Step-Size Table Index Adjustment]
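
Why a quiet interval repairs a step-size error: small quantizer outputs push the table index down by one each sample, and the range limit pins it at 0, so sender and receiver end up at the same index regardless of where each started. The index adjustments below are assumed values consistent with the multipliers on the earlier slide, and the starting indices are made up.

```python
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]   # assumed adjustments per magnitude

def next_index(index, code):
    """Adjust the step-size table index, limited to the range 0..88."""
    return max(0, min(88, index + INDEX_ADJUST[code & 7]))

def steps_until_resync(sender_index, receiver_index, codes):
    """Number of codes processed before both indices agree, or None if never."""
    for n, code in enumerate(codes, start=1):
        sender_index = next_index(sender_index, code)
        receiver_index = next_index(receiver_index, code)
        if sender_index == receiver_index:
            return n
    return None

quiet = [0, 1, 0, 2] * 20   # 80 low-level codes (magnitudes below 4)
print(steps_until_resync(40, 60, quiet))   # 60
```

This only realigns the step size. Any offset in the predicted sample value caused by the error is a separate matter that this sketch does not model.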

COMP 249 :: Spring 2005 Slide: 24 Psychoacoustic Properties Human perception of sound is a function of frequency and signal strength –(MPEG exploits this relationship.) [Figure: Sound Level (dB) versus Frequency (kHz), with Audible and Inaudible regions]

COMP 249 :: Spring 2005 Slide: 25 Auditory Masking The presence of tones at certain frequencies makes us unable to perceive tones at other “nearby” frequencies –Humans cannot distinguish between tones within 100 Hz at low frequencies and 4 kHz at high frequencies [Figure: Sound Level (dB) versus Frequency (kHz), with Audible and Inaudible regions, a masking tone, and a masked tone]

COMP 249 :: Spring 2005 Slide: 26 MPEG Encoder Block Diagram [Figure: blocks Mapping, Quantizer, Coding, Frame Packing, and Psychoacoustic Model; input PCM Audio Samples (32, 44.1, 48 kHz); Ancillary Data; output Encoded Bitstream]

COMP 249 :: Spring 2005 Slide: 27 Subband Filter Transforms signal from time domain to frequency domain. –32 PCM samples yield 32 subband samples. Each subband corresponds to a frequency band, evenly spaced from 0 to the Nyquist frequency. –Filter actually works on a window of 512 samples that is shifted over 32 samples at a time. Subband coefficients are analyzed with the psychoacoustic model, quantized, and coded.
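
Because the 32 subbands are evenly spaced from 0 to the Nyquist frequency, each one covers (sampling rate / 2) / 32 Hz. A quick calculation for the three input rates listed on the encoder block diagram; the function name is an illustrative choice.

```python
def subband_edges(sample_rate_hz, subbands=32):
    """(low, high) frequency edges in Hz of each evenly spaced subband."""
    width = (sample_rate_hz / 2) / subbands
    return [(i * width, (i + 1) * width) for i in range(subbands)]

for rate in (32_000, 44_100, 48_000):
    low, high = subband_edges(rate)[0]
    print(rate, high - low)
# 32000 500.0
# 44100 689.0625
# 48000 750.0
```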

COMP 249 :: Spring 2005 Slide: 28 Layer I 384 samples per frame. Iterative bit allocation process: –For each subband, determine the signal-to-mask ratio (SMR). –Increase number of quantization bits for subband with largest SMR. –Iterate until all bits used. Up to 448 kb/s 19 ms theoretical minimum delay
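
The iterative allocation can be sketched as a greedy loop. This is a simplification under stated assumptions: each added bit is taken to lower a subband's remaining SMR by about 6 dB (the per-bit figure from the PCM slide), bits are counted one at a time per subband rather than per frame, and the SMR values are made up. A real encoder also accounts for scale factors and allocation tables, which are omitted here.

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15, db_per_bit=6.0):
    """Greedy allocation: repeatedly give one bit to the neediest subband."""
    bits = [0] * len(smr_db)
    remaining = list(smr_db)   # SMR not yet covered by quantizer resolution
    for _ in range(total_bits):
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits_per_band]
        if not candidates:
            break
        band = max(candidates, key=lambda i: remaining[i])
        bits[band] += 1
        remaining[band] -= db_per_bit   # assumed gain of one extra bit
    return bits

smr = [20.0, 5.0, -3.0, 12.0]     # hypothetical SMR per subband, in dB
print(allocate_bits(smr, 6))      # [3, 1, 0, 2]
```

The subband whose signal is already below the masking threshold (negative SMR) receives no bits, which is where the compression comes from.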

COMP 249 :: Spring 2005 Slide: 29 Layer II 1152 samples per frame. Iterative bit allocation. Analysis/synthesis a bit more complicated. –More efficient Up to 384 kb/s 34 ms theoretical minimum delay

COMP 249 :: Spring 2005 Slide: 30 Layer III 1152 samples per frame –Up to 320 kb/s Each subband further analyzed using MDCT to create 576 frequency lines. Lots of bit allocation options for quantizing frequency coefficients. Quantized coefficients are Huffman coded. Encoded frame sizes are variable. 59 ms theoretical minimum delay.

COMP 249 :: Spring 2005 Slide: 31 Vo-coding Concept: Develop a mathematical model of the vocal cords & throat –Derive/compute model parameters for a short interval and transmit to the decoder –Use the parameters to synthesize speech at the decoder So what is a good model? –A “buzzer” in a “tube”! –The buzzer is characterized by its intensity & pitch –The tube is characterized by its formants

COMP 249 :: Spring 2005 Slide: 32 Vocoding - Basic Concepts Formant –Resonance frequencies of the vocal tract. –Shapes and filters the sound of vocal cords. [Figure: Amplitude versus Frequency (kHz)]

COMP 249 :: Spring 2005 Slide: 33 Linear Predictive Coding (LPC) A sample is represented as a linear combination of p previous samples: $y(n) = \sum_{k=1}^{p} a_k \, y(n-k) + G \cdot x(n)$ “Buzzer” and “Tube” Model [Figure: speech input “yadda yadda yadda”] Vocoding principles: –voice = formants + buzz pitch & intensity –voice – estimated formants = “residue”
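
A minimal sketch of the decoder side of the LPC equation on this slide: the "buzzer" is a pulse train x(n) at the pitch period, and the "tube" is the recursive filter defined by the coefficients a_k. The coefficient values, gain, and pitch period below are made up for illustration; a real coder estimates them from each speech frame.

```python
def lpc_synthesize(coeffs, excitation, gain=1.0):
    """y(n) = sum_{k=1..p} a_k * y(n-k) + G * x(n), with y(n) = 0 for n < 0."""
    p = len(coeffs)
    y = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += coeffs[k - 1] * y[n - k]
        y.append(acc)
    return y

def pulse_train(length, pitch_period):
    """Voiced excitation: one unit pulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]

a = [1.3, -0.8]   # hypothetical coefficients: one resonance, poles inside the unit circle
frame = lpc_synthesize(a, pulse_train(160, 40), gain=0.5)
print([round(v, 3) for v in frame[:3]])   # [0.5, 0.65, 0.445]
```

Only the coefficients, gain, pitch period, and voiced/unvoiced decision need to be transmitted per frame, which is why the bit rates quoted for LPC-based coders are so low.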

COMP 249 :: Spring 2005 Slide: 34 LPC Decoder artificially generates speech via formant synthesis –A mathematical simulation of the vocal tract as a series of bandpass filters –Encoder codes & transmits filter coefficients, pitch period, gain factor, & nature of excitation Standards: –Regular Pulse Excited Linear Predictive Coder (RPE-LPC) Digital cellular standard GSM 06.10 (13 kbps) –Code Excited Linear Predictive Coder (CELP) US Federal Standard 1016 (4.8 kbps) –Linear Predictive Coder (LPC) US Federal Standard 1015 (2.4 kbps)

COMP 249 :: Spring 2005 Slide: 35 Networking Concerns Audio bandwidth is actually quite small. But human sensitivity to loss and noise is quite high. Networking concerns: –Loss concealment –Jitter control Especially for telephony applications.