Psycho-acoustics and MP3 audio encoding

Psycho-acoustics and MP3 audio encoding. Physics of Music PHY103. Notes: would be good to add examples of sizes of different files; pitch examples are repetitive.

MP3 MPEG is the Moving Picture Experts Group, set up by ISO (the International Organization for Standardization). Every few years it issues a standard: MPEG-1 (1992), MPEG-2 (1994), and so on. MP3 stands for MPEG audio layer III. Longer history: the age of photo and video compression started in part with audio compression experiments in the late ’80s.

Auditory Coding Time-frequency decomposition: divide the signal into pieces and obtain the spectrum of each piece. Use a psycho-acoustic masking model to determine what information to keep. Store the information in the most compact way possible: minimize the bitrate and maximize the audible content. System of synchronization.

Encoding and Decoding Encoding: an audio signal (from a recording) is coded into an mp3 file containing carefully stored spectral information. Decoding: the mp3 file is turned back into an audio signal that can be output to your speakers. Streaming: decoding can be done in real time even if you don’t have the entire file.

Lossy vs Lossless compression Compression: store in a very compact format, more compact than the original audio file. Lossless compression means no information is removed. MP3 is a lossy type of compression: information is lost during compression. Only inaudible information should be removed. Whether expert listeners can hear differences, and how much is enough, is a topic of current research. MP3 achieves a compression ratio of about 10:1! This enables streaming and makes stored audio very compact.

Adding noise Rather than removing information, MP3 adds noise. This is done by describing the signal with degraded digital precision. If you fail to digitize something sufficiently accurately, this is equivalent to adding noise. The added noise should be inaudible because it is below the masking threshold.
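
A minimal sketch of the "degraded precision equals added noise" idea, assuming a plain uniform quantizer on a full-scale sine (MP3's own quantizer is more elaborate, so this is only an illustration). The function names and the test tone are made up for this example. The signal-to-noise ratio should grow by roughly 6 dB per extra bit.

```python
import math

def quantize(x, bits):
    """Round x (assumed to lie in [-1, 1]) to a uniform grid with step 2 / 2**bits."""
    step = 2.0 / (2 ** bits)
    return step * round(x / step)

def snr_db(bits, n=4096):
    """Signal-to-noise ratio (dB) of a full-scale sine after quantizing to `bits` bits."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * 37.0 * i / n)  # 37 cycles: arbitrary test tone
        err = quantize(x, bits) - x               # the "added noise"
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)

for bits in (4, 8, 12, 16):
    print(bits, "bits ->", round(snr_db(bits), 1), "dB SNR")
```

Fewer bits means a lower SNR, i.e. more noise; the encoder's job is to keep that noise under the masking threshold.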

Easy chops: Don’t bother storing information outside the range of hearing (below about 40 Hz or above about 15 kHz). Stereo information is not stored for low frequencies.

Bad ways to compress an audio file Reduce the total number of bits per sample (e.g. 32 bit to 16, or 16 to 8 bit): this gives you a factor of 2 in compression, but you get a noisier signal. Reduce the sampling rate (44 kHz to 22 kHz, or 22 kHz to 10 kHz): total loss of all high-frequency information, again for only a factor of 2 in size. This is equivalent to a low-pass filter. A factor of 10:1 in compression cannot be achieved using linear compression schemes.
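
The arithmetic behind these factors can be sketched as follows. The 128 kbit/s MP3 bitrate and the 3-minute track length are assumptions chosen for illustration, not figures from the slides; the helper names are hypothetical.

```python
def pcm_bitrate_kbps(bits_per_sample, sample_rate_hz, channels=2):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return bits_per_sample * sample_rate_hz * channels / 1000.0

def file_size_mb(bitrate_kbps, seconds):
    """File size in megabytes (10**6 bytes) for a given bitrate and duration."""
    return bitrate_kbps * 1000 * seconds / 8 / 1e6

cd = pcm_bitrate_kbps(16, 44100)         # CD-quality stereo
half_bits = pcm_bitrate_kbps(8, 44100)   # fewer bits per sample: noisier
half_rate = pcm_bitrate_kbps(16, 22050)  # lower rate: nothing above 11.025 kHz survives

MP3_KBPS = 128.0   # assumed MP3 bitrate, for illustration only
TRACK_S = 180      # assumed 3-minute track

print("CD PCM bitrate:", cd, "kbit/s")
print("Halving bits per sample saves a factor of", cd / half_bits)
print("Halving the sampling rate saves a factor of", cd / half_rate)
print("MP3 at 128 kbit/s saves a factor of", round(cd / MP3_KBPS, 1))
print("3-minute track:", round(file_size_mb(cd, TRACK_S), 2), "MB as PCM,",
      round(file_size_mb(MP3_KBPS, TRACK_S), 2), "MB as MP3")
```

Each naive trick buys only a factor of 2 and costs audible quality, while the perceptual coder reaches roughly the 10:1 range.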

Masking If a dominant tone is present, then noise can be added at frequencies next to it and this noise will not be heard. Less precision is required to store nearby frequencies. (Figure labels: 13 dB; critical band.)

Definition of masking The process by which the threshold of audibility for one sound is raised by the presence of another (masking) sound. Also: the amount by which the threshold is raised by the masker (in dB).

Critical Bands by Masking ASA demo 2: tone in the presence of broadband noise. The critical bandwidth at 2000 Hz is about 280 Hz, so you can hear more steps when the noise bandwidth is reduced below this width.

critical band A sine (signal) in the presence of noise whose band (in frequency) is centered on the signal. The wider the noise bandwidth, the more the signal (sine wave) is masked. Past a particular width, the critical band, the masking doesn’t increase.

Critical Bands by Loudness comparison A noise band of 1000Hz center frequency. The total power is kept constant but the width of the band increased. When the band is wider than the critical band the noise sounds louder. ASA demo 3

Critical band width as a function of frequency The size of the critical band is typically about one tenth of the center frequency.
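
A rough sketch comparing the one-tenth rule of thumb with a commonly quoted formula for the equivalent rectangular bandwidth, ERB = 24.7 (4.37 f + 1) Hz with f in kHz (the Glasberg and Moore fit). That formula is not from these slides and is included only as an outside point of comparison; the function names are hypothetical.

```python
def critical_band_rule_of_thumb(f_hz):
    """Slide's rule of thumb: about one tenth of the center frequency."""
    return 0.1 * f_hz

def erb_glasberg_moore(f_hz):
    """ERB in Hz from the Glasberg & Moore fit (assumption: not from the slides)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

for f in (250, 500, 1000, 2000, 4000, 8000):
    print(f, "Hz: rule of thumb", round(critical_band_rule_of_thumb(f)),
          "Hz, ERB fit", round(erb_glasberg_moore(f)), "Hz")
```

The estimates differ in detail (and both differ somewhat from bandwidths measured in masking demos), but all of them grow roughly in proportion to frequency above a few hundred Hz.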

Critical band concept Only a narrow band of frequencies surrounding the tone, those within the critical band, contributes to masking of the tone. When the noise just masks the tone, the power of the tone divided by the power of the noise inside the band is a constant.
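
The constant-ratio statement can be sketched as a toy model, assuming flat (white) noise centered on the tone. The noise density, the constant `k_db`, and the function name are hypothetical; the 280 Hz critical bandwidth is the figure quoted for 2000 Hz in the masking demo.

```python
import math

def masked_threshold_db(noise_bw_hz, noise_density_db, critical_bw_hz, k_db=0.0):
    """Tone level (dB) at which flat noise just masks the tone.

    Only noise inside the critical band counts, so the effective bandwidth
    stops growing once the noise is wider than the critical band.
    k_db is the constant tone-to-noise power ratio at threshold (assumed 0 dB).
    """
    effective_bw = min(noise_bw_hz, critical_bw_hz)
    noise_power_db = noise_density_db + 10 * math.log10(effective_bw)
    return k_db + noise_power_db

CRITICAL_BW = 280.0   # Hz, value quoted for 2000 Hz
DENSITY_DB = 30.0     # hypothetical noise level per Hz

for bw in (35, 70, 140, 280, 560, 1120):
    print(bw, "Hz noise band -> threshold",
          round(masked_threshold_db(bw, DENSITY_DB, CRITICAL_BW), 1), "dB")
```

In this model the threshold rises about 3 dB per doubling of noise bandwidth and then stays flat once the band is wider than the critical band, which is the pattern of the band-widening experiment.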

The nature of the auditory filter The auditory filter is not necessarily square; it is actually more like a triangle in shape. Critical bandwidth is sometimes referred to as ERB (equivalent rectangular bandwidth). The shape is difficult to measure in psychoacoustic experiments because of side-band listening effects. Some innovative experiments (notch-filtered noise + signal) were designed to measure the actual shape of the filter.

Physiological reasons for the masking Basilar membrane? The critical bandwidths at different frequencies correspond to fixed distances along the basilar membrane. However, the masking could instead be a result of feedback in the neuron firing: negative reinforcement or suppression of signals, or swamping of signals.

Temporal effects - non-simultaneous masking The peak ratio of the masker is important: that is, how much its volume varies over time compared to its rms value. Short loud peaks don’t necessarily contribute to the masking as much as a continuous noise. There is both forward and backward masking: masking can occur even if a loud masker is played just after the signal! Masking decays to 0 after 100-200 ms.

Physiological explanations for temporal masking The basilar membrane is ringing, preventing detection in that region for a particular time. Neurons take a while to recover: neural fatigue.

Comodulation masking release A masked signal, if comodulated with frequencies outside the critical band, can be detected below the masking threshold. This works in the same way that the overtones/spectrum are used to identify a sound: sounds outside the critical band, since they are modulated the same as the signal, are used to pull it out (detect it) using more than one critical band region.

Perception of loudness Just noticeable difference (JND) in sound intensity: a useful general reference is that the just noticeable difference in sound intensity for the human ear is about 1 decibel (JND = 1 dB). In fact, the use of the factor of 10 in the definition of the decibel is to create a unit which is about the least detectable change in sound intensity.

JND as a function of loudness There are some variations. The JND is about 1 dB for soft sounds around 30-40 dB at low and midrange frequencies. It may drop to 1/3 to 1/2 a decibel for loud sounds. Caution must be used in applying the "one decibel" criterion: it presumes that you are increasing the same sound by one decibel.
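
To put numbers on these JND figures, here is a small sketch using the standard definition of the decibel, dB = 10 log10(I1 / I0). The function names are made up for this example.

```python
import math

def db_to_intensity_ratio(db):
    """Intensity ratio I1 / I0 corresponding to a level change in dB."""
    return 10 ** (db / 10.0)

def intensity_ratio_to_db(ratio):
    """Level change in dB corresponding to an intensity ratio I1 / I0."""
    return 10 * math.log10(ratio)

for jnd_db in (1.0, 0.5, 1.0 / 3.0):
    ratio = db_to_intensity_ratio(jnd_db)
    print(round(jnd_db, 2), "dB is an intensity increase of about",
          round(100 * (ratio - 1)), "percent")
```

A 1 dB step is roughly a 26 percent change in intensity, so the ear's resolution in intensity is coarse compared with its resolution in frequency.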

Loudness and the Critical Band When two sounds that are equally loud when sounded separately are close together in pitch, their combined loudness will be only slightly greater than that of one of them alone. They may be said to be in the same critical band, where they compete for the same nerve endings on the basilar membrane of the inner ear. According to the place theory of pitch perception, sounds of a given frequency excite the nerve cells of the organ of Corti only at a specific place. The available receptors show saturation effects, which limit the increase in neural response and lead to the general rule of thumb for loudness.

Outside the critical band If the two sounds are widely separated in pitch, the perceived loudness of the combined tones will be considerably greater, because they do not overlap on the basilar membrane and so do not compete for the same hair cells.
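
A toy comparison of the two cases, under stated assumptions: inside one critical band the intensities are simply added, while for widely separated tones the loudnesses (in sones) are added in full, using the rule of thumb that 10 dB more is twice as loud. The tones are treated as if their loudness level in phons equals their level in dB, which is only strictly true near 1 kHz, and the sone formula is only meant for levels above about 40 phons. Function names are hypothetical.

```python
import math

def combine_same_band(levels_db):
    """Tones within one critical band: add intensities, return the level in dB."""
    total = sum(10 ** (level / 10.0) for level in levels_db)
    return 10 * math.log10(total)

def phon_to_sone(phon):
    """Rule of thumb: loudness doubles for every 10 phon above 40 phon."""
    return 2 ** ((phon - 40.0) / 10.0)

def sone_to_phon(sone):
    return 40.0 + 10.0 * math.log2(sone)

def combine_separate_bands(levels_phon):
    """Widely separated tones: add loudness in sones (idealized full summation)."""
    return sone_to_phon(sum(phon_to_sone(p) for p in levels_phon))

print("same band:      ", round(combine_same_band([60, 60]), 1), "dB")
print("separate bands: ", round(combine_separate_bands([60, 60]), 1), "phon")
```

In this idealization two equal tones in the same band gain only about 3 dB, while two well-separated tones sound about twice as loud, equivalent to roughly a 10 dB increase.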

Pitch information area for complex tones

Pitch depends on partial pitches Butler 3.5b: the second of each pair has partials 10% sharp. The perceived pitch change depends on frequency.

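MP3 schematic Input: 16 bit at 44.1 kHz sampling is about 706 kbit/s per channel (about 1411 kbit/s for stereo); the 768 kbit/s figure corresponds to 16 bit at 48 kHz sampling.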

Filter bank: band-pass filter into 32 sub-bands, each centered at a different frequency. MDCT (Modified Discrete Cosine Transform): each sub-band is divided into time windows. Windows overlap to get rid of a problem called aliasing (high frequencies are confused with low ones). The overlap is needed for the MDCT.
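
Why the overlap is needed can be seen in a bare-bones MDCT sketch. This is not the MP3 filter bank (which applies the MDCT to the output of each of the 32 sub-bands); it is a plain MDCT on a raw signal, with a sine window and a frame size `N = 8` chosen only for illustration. Each frame of 2N samples gives just N coefficients, so a single frame cannot be inverted on its own; the aliasing cancels only when neighbouring overlapping frames are added together.

```python
import math
import random

def sine_window(N):
    """Length-2N sine window; satisfies w[n]**2 + w[n + N]**2 == 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, N):
    """Forward MDCT: 2N (windowed) samples -> N coefficients."""
    n0 = 0.5 + N / 2.0
    return [sum(block[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, N):
    """Inverse MDCT: N coefficients -> 2N samples that still contain aliasing."""
    n0 = 0.5 + N / 2.0
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]

def analyze_and_resynthesize(signal, N):
    """MDCT with 50% overlapping frames, then inverse with overlap-add.

    len(signal) is assumed to be a multiple of N.
    """
    w = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):   # hop of N samples
        frame = [padded[start + n] * w[n] for n in range(2 * N)]
        rec = imdct(mdct(frame, N), N)
        for n in range(2 * N):
            out[start + n] += rec[n] * w[n]              # overlap-add
    return out[N:N + len(signal)]

random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(64)]
y = analyze_and_resynthesize(x, 8)
print("max reconstruction error:", max(abs(a - b) for a, b in zip(x, y)))
```

The reconstruction error should be at the level of floating-point rounding, even though every frame stores only half as many numbers as it spans in samples.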

13 dB miracle If the signal is 13 dB louder than the noise, then the noise can’t be heard (within a band). Each sub-band is quantized differently depending upon the masking threshold estimated in that band.
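
A rough sketch of what "quantized differently" can mean, assuming the textbook rule that each bit of a uniform quantizer buys about 6 dB of signal-to-noise ratio. The band levels below are hypothetical, and real MP3 encoders use scale factors, non-uniform quantization and entropy coding rather than this simple rule.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB of SNR per bit

def bits_needed(signal_db, mask_db):
    """Smallest bit count that keeps quantization noise at or below the mask."""
    required_snr_db = signal_db - mask_db
    if required_snr_db <= 0:
        return 0                   # band lies under the mask: nothing to store
    return math.ceil(required_snr_db / DB_PER_BIT)

# Hypothetical (signal level, masking threshold) pairs in dB, one per sub-band.
bands = [(60, 47), (40, 45), (70, 40)]
for signal_db, mask_db in bands:
    print("signal", signal_db, "dB, mask", mask_db, "dB ->",
          bits_needed(signal_db, mask_db), "bits")
```

With a 13 dB gap between signal and mask, about 3 bits per sample are enough in this model, which is where the large savings over 16-bit storage come from.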

FFT is used to compute the masking thresholds

Pushing MP3 to its limits Uncompressed vs. over-compressed mp3. Above: compressing to 60 kbps. Using home.c4.scale.AIFF, show mp3 options. DEMO with Adobe to experiment.

Limits of MP3 Above ~80 kbps (kilobits per second) and 22 kHz sampling I find I get reasonable sound. Compressing beyond this can do pretty weird things: I found that noise sounded weird and the lack of high frequencies led to lost brilliance in timbre. Attacks also suffered pitch and timbre changes.