
1 Digital Audio Compression
CIS 465 Spring 2013

2 Speech Compression
Compression of voice data: we have previously mentioned several methods that are used to compress voice data:
- mu-law and A-law companding (a sketch follows below)
- ADPCM and delta modulation
These are examples of methods which work in the time domain (as opposed to the frequency domain). Often they are not even considered compression methods.
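
As an illustration, a minimal sketch of mu-law companding, using the standard curve F(x) = sgn(x) ln(1 + mu|x|) / ln(1 + mu) with mu = 255 (the North American telephony value); this shows only the companding curve, not a full G.711 codec:

```python
import numpy as np

def mu_law_compress(x, mu=255.0):
    """Compand a signal in [-1, 1]: F(x) = sgn(x) * ln(1 + mu*|x|) / ln(1 + mu)."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=255.0):
    """Invert the companding curve."""
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

# Quantizing the companded signal to 8 bits gives small amplitudes finer
# resolution than a uniform 8-bit quantizer would, which suits speech.
```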

3 Speech Compression
Although the previous techniques are generally applied to speech data, they are not designed specifically for such data. Vocoders, instead, are:
- They cannot be used with other analog signals
- They model speech so that the salient features can be captured in as few bits as possible
Linear Predictive Coders model the speech waveform in time (a sketch follows below); channel vocoders and formant vocoders are other examples.
In electronic music, vocoders allow a voice to modulate a musical source (e.g. via a synthesizer).
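
A minimal sketch of the prediction step in LPC: fit coefficients that predict each sample from the previous `order` samples by solving the autocorrelation normal equations. Real coders use the Levinson-Durbin recursion and transmit quantized coefficients plus an excitation model; this shows only the coefficient fit:

```python
import numpy as np

def lpc_coeffs(frame, order=10):
    """Linear-prediction coefficients a[0..order-1] such that
    s[n] is approximated by sum_k a[k] * s[n-1-k].
    Solves the autocorrelation normal equations R a = r directly;
    Levinson-Durbin is the usual efficient route."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])
```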

4 General Audio Compression
If we want to compress general audio (not just speech), different techniques are needed. In particular, music compression is a more general form of audio compression.
We make use of psychoacoustic modeling, which enables perceptual encoding based upon an analysis of how the ear and brain perceive sound. Perceptual encoding exploits audio elements that the human ear cannot hear well.

5 Psychoacoustics
If you have been listening to very loud music, you may have trouble afterwards hearing soft sounds that you could normally hear: temporal masking.
A loud sound at one frequency (a lead guitar) may drown out a sound at another frequency (the singer): frequency masking.

6 Equal-Loudness Relations
If we play two pure tones (sinusoidal sound waves) with the same amplitude but different frequencies, one may sound louder than the other. The ear does not hear low or high frequencies as well as mid-range (speech) frequencies.
This can be shown with equal-loudness curves, which plot perceived loudness against true loudness and frequency.

7 Equal-Loudness Relations
(figure: equal-loudness curves)

8 Threshold of Hearing
The threshold of human hearing for pure tones is plotted below: at loudness below the curve, we do not hear the tone.
(figure: threshold-of-hearing curve)
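
The curve has a widely quoted closed-form approximation (Terhardt's formula, standard in the perceptual-coding literature); a sketch that evaluates it:

```python
import numpy as np

def threshold_of_hearing_db(f_hz):
    """Approximate absolute threshold of hearing (dB SPL) for a pure
    tone at f_hz, per Terhardt's formula as commonly quoted in the
    perceptual audio coding literature."""
    f = np.asarray(f_hz, dtype=float) / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# The minimum falls near 3-4 kHz, where the ear is most sensitive.
```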

9 Threshold of Hearing
A loud sound can mask other sounds at nearby frequencies, as shown below.
(figure: masking threshold raised around a loud tone)

10 Frequency Masking
We can determine how a pure tone at a particular frequency affects our ability to hear tones at nearby frequencies. Then, if a signal can be decomposed into frequencies, for those frequencies that are only partially masked, only the audible part will be used to set the quantization noise thresholds.
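
A toy illustration of the idea (the offset and slopes here are placeholders, not values from any standard): a loud tone raises the audibility threshold in nearby bands, and masking spreads further toward higher frequencies than toward lower ones:

```python
import numpy as np

def masked_threshold_db(tone_band, tone_level_db, n_bands=25):
    """Toy masking curve: threshold raised around a loud tone, falling
    off ~25 dB/band toward lower bands and ~10 dB/band toward higher
    bands. Offset and slopes are illustrative only."""
    dist = np.arange(n_bands) - tone_band
    slope = np.where(dist < 0, 25.0, 10.0)  # dB of falloff per band
    return tone_level_db - 15.0 - slope * np.abs(dist)
```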

11 Critical Bands
Human hearing range divides into critical bands:
- The human auditory system cannot resolve sounds better than within about one critical band when other sounds are present
- Critical bandwidth represents the ear's resolving power for simultaneous tones
- At lower frequencies the bands are narrower than at higher frequencies
- Each band corresponds to the section of the inner ear which responds to a particular range of frequencies

12 Critical Bands
(figure: table of critical bands and their bandwidths)

13 Critical Bands
Generally, the audio frequency range for hearing (20 Hz to 20 kHz) can be partitioned into about 24 critical bands (25 are typically used for coding applications). The previous slide does not show several of the highest-frequency critical bands; the critical band at the highest audible frequency is over 4000 Hz wide.
The ear is not very discriminating within a critical band. The Bark scale numbers these bands; a conversion sketch follows below.
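
For reference, the commonly quoted Zwicker approximation mapping frequency to critical-band number (Bark):

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker's approximation of critical-band number (Bark) for a
    frequency in Hz; bands ~1-24 cover roughly 20 Hz to 15.5 kHz."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(7.6e-4 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

# e.g. hz_to_bark(1000) is about 8.5: 1 kHz falls in the 9th critical band.
```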

14 Temporal Masking
A loud tone causes the hearing receptors in the inner ear to become saturated, and they require time to recover. This leads to the temporal masking effect:
- After the loud tone we cannot immediately hear another tone (post-masking); the length of the masking depends on the duration of the masking tone
- A masking tone can also block sounds played just before it (pre-masking), over a shorter time

15 Temporal Masking MPEG audio compression takes advantage of both temporal and frequency masking to transmit masked frequency components using fewer bits

16 MPEG Audio Compression
MPEG (Moving Picture Experts Group) is a family of standards for compression of both audio and video data:
- MPEG-1 (1991): CD-quality audio
- MPEG-2 (1994): multi-channel surround sound
- MPEG-4 (1998): also includes MIDI, speech, etc.
- MPEG-7 (2003): not compression, but searching
- MPEG-21 (2004): not compression, but digital rights management

17 MPEG Audio Compression
MPEG-1 defined three downward-compatible layers of audio compression:
- Each layer uses a more complex psychoacoustic model and hence achieves better compression
- Increased complexity leads to increased delay
- Compatibility is achieved by shared file header information
- Layer 1: used for the Digital Compact Cassette
- Layer 2: proposed for digital audio broadcasting
- Layer 3: music (MPEG-1 Layer 3 == mp3)

18 MPEG Audio Compression
MPEG audio compression relies on quantization, masking, and critical bands:
- The encoder uses a bank of 32 filters to decompose the signal into sub-bands
- The sub-bands are of uniform width (not exactly aligned to critical bands) and overlap
- A Fourier transform is used for the psychoacoustic model
- Layer 3 adds a DCT to the sub-band filtering, so that Layers 1 and 2 work in the temporal domain and Layer 3 in the frequency domain
A naive sub-band analysis sketch follows below.
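
For intuition only, a deliberately naive sub-band analysis (a plain FIR bandpass bank with decimation; the standard instead specifies an efficient polyphase implementation of a single 512-tap prototype filter, which this does not reproduce):

```python
import numpy as np
from scipy.signal import firwin, lfilter

def analyze_subbands(x, fs, n_bands=32, numtaps=129):
    """Split x into n_bands uniform-width sub-bands, each critically
    decimated by n_bands (so X input samples yield X/32 per band)."""
    width = (fs / 2) / n_bands
    subbands = []
    for k in range(n_bands):
        lo, hi = k * width, (k + 1) * width
        if k == 0:
            h = firwin(numtaps, hi, fs=fs)                    # lowest band: lowpass
        elif k == n_bands - 1:
            h = firwin(numtaps, lo, fs=fs, pass_zero=False)   # top band: highpass
        else:
            h = firwin(numtaps, [lo, hi], fs=fs, pass_zero=False)
        subbands.append(lfilter(h, 1.0, x)[::n_bands])
    return subbands
```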

19 MPEG Audio Compression
- The PCM input is filtered into 32 bands
- The PCM input is also FFT-transformed for the psychoacoustic model
- A window of samples (384, 576, or 1152) is coded at a time

20 MPEG Audio Compression
Since the sub-bands overlap, aliasing may occur. This is overcome by the use of a quadrature mirror filter bank: the attenuation slopes of adjacent bands are mirror images (see the sketch below), so the aliasing leaked by one band can be cancelled by its neighbor.
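
A two-channel illustration of the mirror relationship (the 32-band MPEG bank is a pseudo-QMF generalization of this; the half-band prototype here is arbitrary):

```python
import numpy as np
from scipy.signal import firwin

# QMF pair: modulating the lowpass prototype by (-1)^n yields a highpass
# whose magnitude response mirrors the lowpass about fs/4.
h0 = firwin(64, 0.5)                      # prototype half-band lowpass
h1 = h0 * (-1.0) ** np.arange(len(h0))    # mirror filter: h1[n] = (-1)^n h0[n]
```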

21 MPEG Audio Algorithm
The PCM audio data is assembled into frames:
- Header: begins with a sync code of twelve 1-bits
- SBS format: describes how many sub-band samples (SBS) are in the frame
- The SBS themselves (384 in Layer 1, 1152 in Layers 2 and 3)
- Ancillary data: e.g. multilingual data or surround-sound data

22 MPEG Audio Algorithm
The sampling rate determines the frequency range:
- That range is divided up into 32 overlapping bands
- The frames are sent through a corresponding 32-filter filter bank
- If X is the number of samples per frame, each filter produces X/32 samples (e.g. Layer 1's 384-sample frame yields 384/32 = 12 samples per band)
- These are still samples in the temporal domain

23 MPEG Audio Algorithm
The Fourier transform is performed on a window of samples surrounding the samples in the frame (either 1024 or 2 x 1024 samples). This feeds into the psychoacoustic model (along with the sub-band samples) to:
- Analyze tonal and non-tonal elements in each band
- Determine spreading functions (how much each band affects another)

24 MPEG Audio Algorithm
Find the masking threshold and signal-to-mask ratio (SMR) for each band. The scaling factor for each band is the maximum amplitude of the samples in that band.
The bit-allocation algorithm takes the SMRs and scaling factors and determines how many bits can be allocated (the quantization granularity) for each band; a greedy sketch follows below.
In MP3, the bits can be moved from band to band as needed to ensure a minimum amount of compression while achieving higher quality.
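
A minimal greedy sketch of the allocation idea; the "~6 dB of SNR per bit" rule of thumb and the loop structure are illustrative, not the standard's table-driven procedure:

```python
def allocate_bits(smr_db, total_bits, max_bits_per_band=15):
    """Repeatedly grant one bit to the band with the worst (lowest)
    mask-to-noise ratio, MNR = SNR - SMR, assuming each quantizer bit
    buys about 6 dB of SNR."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        candidates = [k for k in range(len(bits)) if bits[k] < max_bits_per_band]
        if not candidates:
            break
        k = min(candidates, key=lambda k: 6.02 * bits[k] - smr_db[k])
        bits[k] += 1
    return bits
```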

25 MPEG Audio Algorithm
- Layer 1 has 12 samples encoded per band per frame
- Layer 2 has 3 groups of 12 (36 samples) per band per frame
- Layer 3 has non-equal frequency bands
- Layer 3 also performs a Modified DCT on the filtered data, so we are in the frequency (not time) domain
- Layer 3 does non-uniform quantization followed by Huffman coding
All of these modifications make for better (if more complex) performance in MP3.

26 Stereo Encoding
MPEG codes stereo data in several different ways (joint stereo, intensity stereo, etc.); we are not discussing these.

27 MPEG File Format
MPEG audio files have no single file-level header, so you can start playing or processing anywhere in the file. They consist of a sequence of frames, each frame having its own header followed by audio data; a sync-scanning sketch follows below.
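
A sketch of locating frame boundaries by their sync code (a real parser would also validate the version, layer, and bitrate fields in the rest of the 32-bit header; this only finds candidates):

```python
def find_frame_syncs(data: bytes):
    """Return offsets where the 12-bit frame sync appears: a 0xFF byte
    followed by a byte whose top four bits are all set."""
    return [i for i in range(len(data) - 1)
            if data[i] == 0xFF and (data[i + 1] & 0xF0) == 0xF0]
```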

28 MPEG File Format
(figure: frame header and audio data layout)

29 MPEG File Format
ID3 is a metadata container most often used in conjunction with the MP3 audio file format. It allows information such as the title, artist, album, track number, year, genre, and other information about the file to be stored in the file itself, in the last 128 bytes of the file (the ID3v1 layout).
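
A sketch of reading that 128-byte ID3v1 block (ID3v2, the newer variant, instead sits at the start of the file with a different layout; the filename in the usage line is hypothetical):

```python
def read_id3v1(path):
    """Return the ID3v1 tag from the last 128 bytes of an MP3 file,
    or None if the 'TAG' marker is absent."""
    with open(path, "rb") as f:
        f.seek(-128, 2)            # 128 bytes back from the end of the file
        block = f.read(128)
    if block[:3] != b"TAG":
        return None
    def text(b):                   # fixed-width, NUL/space-padded fields
        return b.split(b"\x00")[0].decode("latin-1").strip()
    return {"title": text(block[3:33]),     "artist": text(block[33:63]),
            "album": text(block[63:93]),    "year": text(block[93:97]),
            "comment": text(block[97:127]), "genre": block[127]}

# tag = read_id3v1("song.mp3")   # hypothetical file
```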

30 Bit Rates
Audio (or video) compression schemes can be characterized as either constant bit rate (CBR) or variable bit rate (VBR):
- In general, higher compression can be achieved with VBR (at the cost of added complexity to encode/decode)
- MPEG-1 Layers 1 and 2 are CBR only
- MP3 is either VBR or CBR
- Average bit rate (ABR) is a compromise between the two

31 MPEG-2 AAC
MPEG-2 (which is used for encoding DVDs) has an audio component as well. The MPEG-2 AAC (Advanced Audio Coding) standard was aimed at transparent sound reproduction for theatres:
- 320 kbps for five channels (left, right, center, left-surround, and right-surround)
- 5.1-channel systems include a low-frequency enhancement channel (the "woofer")
- AAC can also deliver high-quality stereo sound at bit rates below 128 kbps

32 MPEG-2 AAC
AAC is the default audio format for, e.g., YouTube, iPod (iTunes), PS3, and the Nintendo DSi. Compared to MP3 it offers:
- More sampling frequencies
- More channels
- A more efficient, simpler filterbank (pure MDCT)
- Arbitrary bit rates and variable frame lengths
- Other improvements

33 MPEG-4 Audio
MPEG-4 audio integrates a number of audio components into one standard:
- Speech compression
- Text-to-speech
- MIDI
- MPEG-4 AAC (similar to MPEG-2 AAC)
- Alternative coders (perceptual coders and structured coders)

