Carnegie Mellon © Copyright 2000 Michael G. Christel and Alexander G. Hauptmann Sound.


What is Sound?
- Acoustics is the study of sound.
- Physical: sound as a disturbance in the air
- Psychophysical: sound as perceived by the ear
- Sound as stimulus (physical event) and sound as a sensation.
- Pressure changes (in the band from 20 Hz to 20 kHz)
Physical terms
- Amplitude
- Frequency
- Spectrum

Sound Waves
- In a free field, an ideal source of acoustical energy sends out sound of uniform intensity in all directions, so the sound propagates as a spherical wave.
- Intensity of sound is inversely proportional to the square of the distance (inverse square law); equivalently, sound pressure is inversely proportional to the distance (inverse distance law).
- Sound pressure level decreases by 6 dB per doubling of the distance.
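The 6 dB figure follows directly from the inverse square law. The snippet below is a minimal sketch, assuming an ideal point source in a free field; the function name is illustrative and not part of the original slides.

```python
import math

def level_change_db(r1: float, r2: float) -> float:
    """Change in sound pressure level (dB) when moving from distance r1 to r2.

    Intensity falls as 1/r^2, so the change is 10*log10((r1/r2)**2),
    which is the same as 20*log10(r1/r2).
    """
    return 20.0 * math.log10(r1 / r2)

# Doubling the distance from the source:
print(round(level_change_db(1.0, 2.0), 2))  # -6.02
```

Each further doubling subtracts another 6 dB, so going from 1 m to 4 m costs about 12 dB.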

Sound Waves

What is Sound

How we hear
- Ear connected to the brain
- left brain: speech
- right brain: music
- Ear's sensitivity to frequency is logarithmic
- Varying frequency response
- Dynamic range is about 120 dB (at 3-4 kHz)
- Frequency discrimination 2 Hz (at 1 kHz)
- Intensity change of 1 dB can be detected.

Digitizing Sound

Digitally Sampling

Undersampling

Clipping

Quantization

Digital Sampling
- Sampling is dictated by the Nyquist sampling theorem, which states how quickly samples must be taken to ensure an accurate representation of the analog signal.
- The Nyquist sampling theorem states that the sampling frequency must be greater than twice the highest frequency in the original analog signal.
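To see what happens when the sampling rate is too low, the sketch below computes the frequency a pure tone appears to have after sampling. It assumes an ideal sampler with no anti-aliasing filter; `alias_frequency` is a hypothetical helper name.

```python
def alias_frequency(f_signal: float, f_sample: float) -> float:
    """Apparent frequency (Hz) of a pure tone f_signal after sampling at f_sample."""
    # Sampling cannot distinguish f from f + k * f_sample, so reduce modulo f_sample.
    f = f_signal % f_sample
    # Frequencies above f_sample / 2 fold back into the range 0 .. f_sample / 2.
    return min(f, f_sample - f)

# A 5 kHz tone sampled at 8 kHz (telephone rate) is heard as 3 kHz.
print(alias_frequency(5000.0, 8000.0))    # 3000.0
# A 20 kHz tone sampled at 44.1 kHz (CD rate) is preserved.
print(alias_frequency(20000.0, 44100.0))  # 20000.0
```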

Sound Sampling Basics
Common Sampling Rates
- 8 kHz (Phone) or ... kHz (Phone, NeXT)
- 11.025 kHz (1/4 CD std)
- 16 kHz (G.722 std)
- 22.05 kHz (1/2 CD std)
- 44.1 kHz (CD, DAT)
- 48 kHz (DAT)
Bits per Sample
- 8 or 16
Number of Channels
- mono/stereo/quad/etc.

Space Requirements
Storage Requirements for One Minute of Sound
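The storage needed for uncompressed sound is the product of sampling rate, sample size, channel count and duration. The sketch below computes it for a few combinations of the rates listed earlier; the particular combinations are chosen here for illustration and are not taken from the original slide.

```python
def bytes_per_minute(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM storage for 60 seconds of sound, in bytes."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * 60

for rate, bits, channels in [(8000, 8, 1), (22050, 16, 1), (44100, 16, 2)]:
    size = bytes_per_minute(rate, bits, channels)
    print(f"{rate} Hz, {bits}-bit, {channels} ch: {size:,} bytes")
# 8000 Hz, 8-bit, 1 ch: 480,000 bytes
# 22050 Hz, 16-bit, 1 ch: 2,646,000 bytes
# 44100 Hz, 16-bit, 2 ch: 10,584,000 bytes
```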

Common Sound File Formats
- Mulaw (Sun, NeXT): .au
- RIFF Wave (MS WAV): .wav
- MPEG Audio Layer (MPEG): .mpa, .mp3
- AIFC (Apple, SGI): .aiff, .aif
- HCOM (Mac): .hcom
- SND (Sun, NeXT): .snd
- VOC (Soundblaster card proprietary standard): .voc
- And many others!

Mu-Law
u-LAW (or mu-LAW) is

$$ y = \operatorname{sgn}(x)\,\frac{\ln(1 + u\,|x|)}{\ln(1 + u)}, \qquad u = 100 \text{ or } 255 $$
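A minimal sketch of the companding curve, assuming samples normalized to the range -1 to 1 and u = 255. The function names are illustrative; the expander is simply the algebraic inverse of the formula and is not shown on the slide.

```python
import math

def mu_law_compress(x: float, mu: float = 255.0) -> float:
    """Apply y = sgn(x) * ln(1 + mu*|x|) / ln(1 + mu) to a sample x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y: float, mu: float = 255.0) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

for x in (0.01, 0.1, 1.0):
    y = mu_law_compress(x)
    print(f"x={x:<5} y={y:.3f} back={mu_law_expand(y):.3f}")
```

Quiet samples are stretched (0.01 maps to roughly 0.23), which is why 8 bits of mu-law can cover a dynamic range that would otherwise need more bits of linear PCM.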

What's in a Sound File Format
Header Information
- Magic Cookie
- Sampling Rate
- Bits/Sample
- Channels
- Byte Order (Endian)
- Compression type
Data

Example File Format (NIST SPHERE)

    NIST_1A
       1024
    sample_rate -i
    channel_count -i 1
    sample_n_bytes -i 2
    sample_byte_format -s2 10
    sample_sig_bits -i 16
    sample_count -i
    sample_coding -s3 pcm
    sample_checksum -i
    end_head

WAVe file format (Microsoft)
RIFF: a collection of data chunks. Each chunk has a 32-bit ID followed by a 32-bit chunk length followed by the chunk data.

    Offset  Field
    0x00    chunk id 'RIFF'
    0x04    chunk size (32 bits)
    0x08    wave chunk id 'WAVE'
    0x0C    format chunk id 'fmt '
    0x10    format chunk size (32 bits)
    0x14    format tag (currently pcm)
    0x16    number of channels (1 = mono, 2 = stereo)
    0x18    sample rate in Hz
    0x1C    average bytes per second
    0x20    number of bytes per sample (1 = 8-bit mono; 2 = 8-bit stereo or 16-bit mono; 4 = 16-bit stereo)
    0x22    number of bits in a sample
    0x24    data chunk id 'data'
    0x28    length of data chunk (32 bits)
    0x2C    sample data
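The layout can be checked with a short script. This is a sketch under two assumptions: the file is plain PCM with a 16-byte 'fmt ' chunk, and the 'data' chunk follows it immediately (real files may carry extra chunks). It builds a small file in memory with Python's standard `wave` module and decodes the header with `struct` at the offsets listed above; `make_demo_wav` and `parse_wav_header` are hypothetical helper names.

```python
import io
import struct
import wave

def make_demo_wav() -> bytes:
    """Build a tiny 16-bit stereo PCM file in memory (10 silent frames)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)          # bytes per sample, per channel
        w.setframerate(44100)
        w.writeframes(b"\x00" * (2 * 2 * 10))
    return buf.getvalue()

def parse_wav_header(data: bytes) -> dict:
    """Decode the 44-byte header; all multi-byte fields are little-endian."""
    riff_id, riff_size, wave_id = struct.unpack_from("<4sI4s", data, 0x00)
    fmt_id, fmt_size = struct.unpack_from("<4sI", data, 0x0C)
    (tag, channels, rate, avg_bytes,
     block_align, bits) = struct.unpack_from("<HHIIHH", data, 0x14)
    data_id, data_size = struct.unpack_from("<4sI", data, 0x24)
    if (riff_id, wave_id, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical PCM WAV header")
    return {
        "riff_size": riff_size, "fmt_size": fmt_size, "format_tag": tag,
        "channels": channels, "sample_rate": rate,
        "avg_bytes_per_sec": avg_bytes, "bytes_per_sample": block_align,
        "bits_per_sample": bits, "data_size": data_size,
    }

header = parse_wav_header(make_demo_wav())
for name, value in header.items():
    print(f"{name}: {value}")
```

For the demo file, the field at 0x20 comes out as 4 (16-bit stereo), matching the table.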

Digital Audio Today
- Analog elements in the audio chain are replaced with digital elements.
- 16-bit word length, 32/44.1/48 kHz sampling rates.
- Mostly linear signal processing.
- Wide range of digital formats and storage media.
- Rapid development of 1-bit conversion technology => better SNR, phase and linearity.
- Rapid increase of signal processing power => possibility to implement new, complex features.

CD vs LP
- Information is stored digitally.
- The length of its data pits represents a series of 1s and 0s.
- Both audio channels are stored along the same pit track.
- Data is read using a laser beam.
- Information density is about 100 times greater than on an LP.
- CD player can correct disc errors.

Benefits of CD
- Robust
- No degradation from repeated playings because data is read by the laser beam.
- Error correction
- Transport's performance does not affect the quality of audio reproduction.
- Digital circuitry is more immune to aging and temperature problems.
- Data conversion is independent of variations in disc rotational speed, hence wow and flutter are negligible.
- SNR over 90 dB.
- Subcode for display, control and user information

Digital CD Format
Sampling
- 44.1 kHz => 10 % margin with respect to the Nyquist frequency (audible frequencies below 20 kHz)
- 16-bit linear => theoretical SNR about 98 dB (for a sinusoidal signal with maximum allowed amplitude)
- Audio bit rate 1.41 Mbit/s (44.1 kHz * 16 bits * 2 channels)
- Cross Interleaved Reed-Solomon Code (CIRC) for error correction
Specifications
- Playing time max ... min
- Disc diameter 120 mm
- Disc thickness 1.2 mm
- One-sided medium, rotates clockwise
- Signal is recorded from inside to outside
- Pit is about 0.5 µm wide
- A pit edge is a 1; all other areas, whether inside or outside a pit, are 0s
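The sampling figures can be reproduced with a few lines of arithmetic. This sketch uses the standard approximation SNR = 6.02 N + 1.76 dB for an N-bit quantizer driven by a full-scale sine wave; that formula is a textbook result rather than something stated on the slide, and the function names are illustrative.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def ideal_snr_db(bits_per_sample: int) -> float:
    """Theoretical SNR of linear quantization for a full-scale sinusoid."""
    return 6.02 * bits_per_sample + 1.76

def nyquist_margin(sample_rate_hz: float, highest_audible_hz: float) -> float:
    """Fraction by which half the sampling rate exceeds the audible limit."""
    return (sample_rate_hz / 2 - highest_audible_hz) / highest_audible_hz

print(pcm_bit_rate(44100, 16, 2))                # 1411200 bits/s, about 1.41 Mbit/s
print(round(ideal_snr_db(16), 2))                # 98.08 dB
print(round(nyquist_margin(44100, 20000), 4))    # 0.1025, about 10 %
```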

Compression of Sound

Motivation for Sound Compression
- Need to minimize transmission costs or provide cost-efficient storage
- Demand to transmit over channels of limited capacity, such as mobile radio channels
- Need to share capacity among different services (voice, audio, data, graphics, images) in an integrated service network

Compression
- u-LAW
- Silence detection
- ADPCM (adaptive delta PCM, 24/32/40 kbps)
- LPC-10E (Linear Predictive Coding, 2.4 kb/s)
- CELP, 4.8 kb/s: builds on LPC
- GSM (European cell phones, RPE-LPC): 1650 bytes/sec (at 8000 samples/sec)
- RealAudio (builds on CELP, GSM, proprietary)
- MPEG Audio Layers (builds on ADPCM)
  - Layer-2: from 32 kbps to 384 kbps, target bit rate of 128 kbps
  - Layer-3: from 32 kbps to 320 kbps, target bit rate of 64 kbps
  - Complex compression, using perceptual models

Sound Editing
GoldWave
- requires a sound card
- digital audio sound player, recorder and editor
- can load, play and edit many different file formats: .wav, .au, .voc, .snd
- displays separate graphics for the left and right channels
- very easy to use
- good sound quality
Others: WHAM, Cool Edit, SOX, WINPLANY, Digital Audio Playback Facility, MOD4Win, etc.

Compression Approaches
- Delta coding
  - Encode differences only
- Predictive coding
  - Predict the next sample
- Linear Predictive Coding (LPC), mostly for speech
  - Describe fundamental frequencies + 'error'
  - CELP, cell-phone standards
- Variable rate encoding
  - Don't encode silences
- Subband coding
  - Split into frequency bands, each encoded separately and efficiently
- Psycho-acoustical coding
  - Drop bits where you can't hear it
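Delta coding is the simplest of these approaches to sketch. The example below is a lossless toy version on integer samples, assuming the first sample is coded as a difference from zero; practical codecs such as ADPCM also quantize the differences, which this sketch does not do.

```python
def delta_encode(samples: list[int]) -> list[int]:
    """Replace each sample by its difference from the previous one."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas

def delta_decode(deltas: list[int]) -> list[int]:
    """Rebuild the samples by accumulating the differences."""
    total = 0
    samples = []
    for delta in deltas:
        total += delta
        samples.append(total)
    return samples

samples = [100, 102, 105, 104, 104]
deltas = delta_encode(samples)
print(deltas)                 # [100, 2, 3, -1, 0]
print(delta_decode(deltas))   # [100, 102, 105, 104, 104]
```

Because neighbouring audio samples are usually close in value, the differences are small numbers that need fewer bits than the samples themselves.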

Tips for Audio on the Web
- There is no generic audio standard on the Web
- Listening to 16-bit sounds on an 8-bit system results in strange effects
- Users will be annoyed if they spend a lot of time downloading a sound and they can't play it
- Distribute only 8-bit sounds on your Web page
- Or, provide different sound files in both 8- and 16-bits
- Record in the highest sampling rate and size you can, and then process down to 8-bit
- Keep file size small
  - convert to 8-bit samples
  - use a lower sampling rate
  - use mono sounds
- Describe what format those sounds are in
  - WAVE, AIFF, or other format
  - Providing the file size in the description is a politeness to help estimate download times
- If you need high sound quality and have large audio files:
  - Use a smaller sound clip in mu-law format as a preview or for those who can't listen to the higher-quality sample.
- Check out
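Two of the size-reduction steps, reducing to 8-bit and mixing to mono, can be sketched as follows. The code assumes signed 16-bit input samples and produces unsigned 8-bit output (the convention used by 8-bit WAV files); the function names are illustrative. Lowering the sampling rate is left out because doing it properly requires a low-pass filter first.

```python
def pcm16_to_pcm8(samples16: list[int]) -> bytes:
    """Convert signed 16-bit samples (-32768..32767) to unsigned 8-bit (0..255)."""
    # Keep the top 8 bits, then shift the range from -128..127 to 0..255.
    return bytes((s >> 8) + 128 for s in samples16)

def to_mono(left: list[int], right: list[int]) -> list[int]:
    """Average the two channels sample by sample."""
    return [(l + r) // 2 for l, r in zip(left, right)]

print(list(pcm16_to_pcm8([-32768, 0, 32767])))   # [0, 128, 255]
print(to_mono([100, -100], [300, -300]))         # [200, -200]
```

Together, these two steps cut a 16-bit stereo recording to a quarter of its original size.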

Speech Recognition in Brief

Speech Recognition Knowledge Sources
- Acoustic Modeling: describes the sounds that make up speech
- Lexicon: describes which sequences of speech sounds make up valid words
- Language Model: describes the likelihood of various sequences of words being spoken

Speech Recognition
- O is an acoustical 'observation'
- w is a 'word' we are trying to recognize
- Maximize

$$ \hat{w} = \arg\max_{w} P(w \mid O) $$

- P(w|O) is unknown, so by Bayes' rule:

$$ P(w \mid O) = \frac{P(O \mid w)\, P(w)}{P(O)} $$
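Since P(O) is the same for every candidate word, the maximization only needs the product P(O|w) P(w). The toy sketch below makes that concrete; all probabilities are made-up illustrative numbers, and `recognize` is a hypothetical helper, not a real recognizer.

```python
def recognize(acoustic_likelihood: dict[str, float], prior: dict[str, float]) -> str:
    """Return the word w that maximizes P(O|w) * P(w)."""
    return max(prior, key=lambda w: acoustic_likelihood[w] * prior[w])

# Hypothetical P(O|w) from the acoustic model for one observation O.
# 'abbey' and 'abby' sound identical, so their acoustic scores tie.
acoustic_likelihood = {"abbey": 0.30, "abby": 0.30, "Abbott": 0.05}
# Hypothetical P(w) from the language model.
prior = {"abbey": 0.002, "abby": 0.001, "Abbott": 0.003}

print(recognize(acoustic_likelihood, prior))  # abbey
```

The acoustic model alone cannot separate the two homophones; the language model prior breaks the tie.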

Hidden Markov Model

Searching the Speech Signal Trellis

Language Models

Lexicon - links words to phones in acoustic model

    Aaron         EH R AX N
    Aaron(2)      AE R AX N
    abandon       AX B AE N D AX N
    abandoned     AX B AE N D AX N DD
    abandoning    AX B AE N D AX N IX NG
    abandonment   AX B AE N D AX N M AX N TD
    abated        AX B EY DX IX DD
    abatement     AX B EY TD M AX N TD
    abbey         AE B IY
    Abbott        AE B AX TD
    Abboud        AA B UW DD
    abby          AE B IY
    abducted      AE BD D AH KD T IX DD
    Abdul         AE BD D UW L
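A lexicon in this form is easy to load into a lookup table. The sketch below assumes one entry per line, with the word first and its phones after it, and treats a suffix such as `(2)` as an alternate pronunciation of the same word; `load_lexicon` is a hypothetical helper and only a few of the entries are repeated here.

```python
import re
from collections import defaultdict

LEXICON_TEXT = """\
Aaron EH R AX N
Aaron(2) AE R AX N
abbey AE B IY
abby AE B IY
"""

def load_lexicon(text: str) -> dict[str, list[list[str]]]:
    """Map each word to the list of its pronunciations (each a list of phones)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        word, *phones = line.split()
        base_word = re.sub(r"\(\d+\)$", "", word)  # 'Aaron(2)' -> 'Aaron'
        lexicon[base_word].append(phones)
    return dict(lexicon)

lexicon = load_lexicon(LEXICON_TEXT)
print(lexicon["Aaron"])                       # two pronunciations
print(lexicon["abbey"] == lexicon["abby"])    # True: homophones share phones
```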

Continual Progress in Speech Recognition: Increasingly Difficult Tasks, Steadily Declining Error Rates
[Figure: Word Error Rate (%) by task. Labels: read speech (1000-word vocabulary, 5000 word, 20,000 word, unlimited vocabulary; standard microphone, varied microphones, noisy environment), broadcast news, conversational speech (English, non-English). All results are speaker-independent. Source: NSA/Wayne/Doddington]


Sound
That's all for today