
1 Digital Audio Multimedia Systems (Module 1 Lesson 1)
Summary:
- Basic concepts underlying sound
- Facts about human perception of sound
- Computer representation of sound (audio)
- A brief introduction to MIDI

Sources: my research notes and Dr. Ze-Nian Li's course material at Simon Fraser University.

In this lesson we will study audio, the media type at the core of most multimedia systems. We will first cover the basic concepts underlying sound, and then look at some facts about human perception of it. Like most natural phenomena, sound occurs in analog form, which has to be digitized in order to represent and store it on a computer. We will examine this conversion process with the help of a fundamental theorem of digital signal processing, the Nyquist sampling theorem. Lastly, we will look at an alternative mechanism for representing sound (MIDI) that both has good compression properties and allows easy non-linear editing. The crux of the material is derived from Dr. Ze-Nian Li's course material at Simon Fraser University.

2 Sound Facts
- Sound is a continuous wave that travels through the air.
- The wave is made up of pressure differences; sound is detected by measuring the pressure level at a location.
- Sound waves have the normal wave properties (reflection, refraction, diffraction, etc.).

[Figure: the human ear detecting sound]

Sound is a continuous wave that travels through air. The wave itself is composed of pressure differences, and detection of sound amounts to measuring these pressure levels and how they change over time. The human ear performs this detection naturally when the wave, with its pressure differences, impinges on the eardrum.

3 Sound Facts: Wave Characteristics
- Frequency: the number of periods per second, measured in hertz (Hz), i.e. cycles per second. The human hearing range is 20 Hz to 20 kHz (audio).
- Amplitude: the measure of displacement of the air-pressure wave from its mean. Related to, but not the same as, loudness.

[Figure: one frequency component of sound, plotted as air pressure vs. time, showing the amplitude and one period]

Like all waves, sound can be characterized by two properties, frequency and amplitude. To understand these characteristics, consider one frequency component of a sound: a constant-frequency wave. Most sounds can be expressed mathematically as a combination of such waves of differing frequencies and amplitudes, as is done with a Fourier series. The frequency refers to the rate at which the wave repeats; it is expressed in cycles per second, or hertz. The human ear is capable of perceiving wave frequencies in the range 20 Hz to 20 kHz, which is what we call audio. The amplitude is a measure of the displacement of the wave from its mean; for human perception it is related to, but not the same as, loudness.
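As a minimal illustration of the Fourier idea mentioned above (my own sketch, not part of the slides; the 440 Hz and 880 Hz components and their amplitudes are arbitrary choices), a composite sound can be modeled as a sum of constant-frequency sine waves:

```python
import math

def sine_wave(freq_hz, amplitude, t):
    """Value at time t (in seconds) of one constant-frequency component."""
    return amplitude * math.sin(2 * math.pi * freq_hz * t)

def composite(t):
    """A 'sound' built from two components of differing frequency and amplitude."""
    return sine_wave(440, 1.0, t) + sine_wave(880, 0.5, t)

print(composite(0.001))  # pressure displacement 1 ms into the wave
```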

4 Principles of Digitization
Why digitize? Microphones and video cameras produce analog signals (continuous-valued voltages). To store audio or video data on a computer, we must digitize it by converting it into a stream of numbers.

[Figure: sound as an analog signal]

As noted before, sound is a continuous wave and, like all natural phenomena, is analog in nature. A microphone detects this analog input, which is nothing but a continuous sequence of voltages. To store this input in a computer, however, it has to be converted to digital form, that is, into 0s and 1s. Furthermore, a continuous wave has infinite resolution, which cannot be represented in a computer.

5 Principles of Digitization
- Sampling: divide the horizontal axis (time) into discrete pieces.
- Quantization: divide the vertical axis (signal strength, i.e. voltage) into discrete pieces. For example, 8-bit quantization divides the vertical axis into 256 levels; 16-bit gives you 65,536 levels. The coarser the quantization, the lower the quality of the sound.
- Linear vs. non-linear quantization: if the scale used for the vertical axis is linear, we call it linear quantization; if it is logarithmic, we call it non-linear (μ-law, or A-law in Europe). The non-linear scale is used because small-amplitude signals are more likely to occur than large-amplitude signals, and they are less likely to mask any noise.

Digitization is achieved by recording, or sampling, the continuous sound wave at discrete points. The more frequently we sample, the closer we get to capturing the continuity of the wave; sampling is thus the process of dividing the horizontal (time) axis into discrete points. The other aspect of digitization is measuring the voltage at each of these sampling points. As it turns out, these values may be of arbitrary precision, that is, they may contain small fractions that take many bits to represent. To cope with this arbitrary precision we use quantization, which divides the vertical axis (signal strength, or voltage) into discrete levels. For example, 8-bit quantization divides the axis into 256 discrete voltage levels. While we are on the topic of quantization, it is important to note that not all quantization is uniform, or linear: the vertical axis need not be a linear scale, and a non-linear, logarithmic scale is used in μ-law encoding.
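To make the logarithmic scale concrete, here is a minimal sketch of the μ-law compression curve (my own illustration, not from the slides; μ = 255 is the standard North American value, and input samples are assumed normalized to [-1, 1]):

```python
import math

MU = 255  # standard mu value for North American mu-law telephony

def mu_law_compress(x):
    """Map a sample x in [-1, 1] onto a logarithmic scale before quantizing."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

# Small amplitudes are stretched, so they get more of the quantization levels:
for x in (0.01, 0.1, 1.0):
    print(x, "->", round(mu_law_compress(x), 3))
```

Note how an input of 0.01 maps to roughly 0.23 of the output range, so small-amplitude signals are represented with far more resolution than a linear scale would give them.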

6 Sampling and Quantization
- Sampling rate: the number of samples per second (measured in Hz). E.g., CD-standard audio uses a sampling rate of 44,100 Hz (44,100 samples per second).
- 3-bit quantization gives 8 possible sample values. E.g., CD-standard audio uses 16-bit quantization, giving 65,536 values.
- Why quantize? To digitize!

[Figure: a sine wave sampled at discrete points in time and quantized to 8 levels (3-bit quantization)]

Here is a demonstration of the concepts of sampling and quantization. Again we consider a sinusoidal sound wave and sample it at discrete points. We notice that the resulting sampled values could be of arbitrary precision, so we use 3-bit quantization to round them to a finite set of values. One important observation is that quantization gives us discrete, finite-precision values, which are what we need for representation on a computer. Another important question to ask ourselves is: how often should we sample the signal so as to achieve a faithful digital representation of the analog signal? This question will be answered by the Nyquist sampling theorem, which we will look at momentarily.
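As a sketch of the demonstration described above (my own code; the 440 Hz tone is an arbitrary choice), sampling a sine wave at the CD rate and rounding each sample to one of 8 levels might look like this:

```python
import math

SAMPLE_RATE = 44100   # CD-standard samples per second
BITS = 3              # 3-bit quantization -> 2**3 = 8 levels
LEVELS = 2 ** BITS

def sample_and_quantize(freq_hz, n_samples):
    """Sample a unit-amplitude sine wave and round each sample to one of 8 levels."""
    out = []
    for n in range(n_samples):
        x = math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)  # analog value in [-1, 1]
        q = round((x + 1) / 2 * (LEVELS - 1))                  # nearest level, 0..7
        out.append(q)
    return out

print(sample_and_quantize(440, 10))  # ten finite-precision sample values
```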

7 Nyquist Theorem
- Consider a sine wave. Sampling once a cycle: the wave appears as a constant signal.
- Sampling 1.5 times each cycle: the wave appears as a low-frequency sine signal.
- For lossless digitization, the sampling rate should be at least twice the maximum frequency component.

Once again, consider a sinusoidal sound wave. If we were to sample it once every cycle, it would appear as a constant signal. On the other hand, if we sample it three times every two cycles, that is 1.5 times a cycle, it would appear, with some interpolation, as a low-frequency sine wave; as a matter of fact, without interpolation it may appear as a sawtooth wave. The Nyquist sampling theorem says that to get lossless digitization, that is, a faithful digital representation of the analog signal, one has to sample at twice the rate of the maximum frequency component. In this example, one would therefore have to sample the wave two times each cycle.
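A small experiment (my own sketch, with arbitrarily chosen frequencies) makes the effect concrete: sampling a 1 kHz sine exactly once per cycle hits the same phase of every cycle, so the digitized signal looks constant, while sampling well above the Nyquist rate preserves the oscillation:

```python
import math

def sample(freq_hz, rate_hz, n):
    """n samples of a sine wave of frequency freq_hz, taken rate_hz times per second."""
    return [round(math.sin(2 * math.pi * freq_hz * k / rate_hz), 3)
            for k in range(n)]

# Once per cycle: every sample lands on the same point of the wave (all ~0 here),
# so the signal appears constant.
print(sample(1000, 1000, 5))

# Eight times per cycle (well above the Nyquist rate of 2000 Hz):
# the cyclic shape of the wave survives digitization.
print(sample(1000, 8000, 9))
```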

8 Application of Nyquist Theorem
- The Nyquist theorem is used to calculate the optimum sampling rate for obtaining good audio quality.
- The CD-standard sampling rate of 44,100 Hz means that the waveform is sampled 44,100 times per second.
- Digitally sampled audio has a bandwidth of 20 Hz - 20 kHz. By sampling at twice the maximum frequency (40 kHz) we could have achieved good audio quality; CD audio slightly exceeds this, resulting in an ability to represent a bandwidth of around 22,050 Hz.

Let us look at a practical application of the theorem. Once again, the Nyquist theorem is used to calculate the optimum sampling rate for obtaining good sound quality. For CD audio the sampling rate is 44,100 Hz, or 44,100 samples per second, slightly more than twice the 20 kHz upper limit of human hearing.
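The arithmetic of the slide as a quick check (illustrative only):

```python
max_audible_hz = 20_000
nyquist_rate = 2 * max_audible_hz   # 40,000 samples/s would already suffice
cd_rate = 44_100
print(nyquist_rate, cd_rate / 2)    # 40000 22050.0: CD represents up to ~22,050 Hz
```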

9 Quantization (Quality -> SNR)
- In any analog system, some of the voltage is what you want to measure (signal), and some of it is random fluctuation (noise).
- SNR, the signal-to-noise ratio, captures the quality of a signal in decibels (dB):

    SNR = 10 log10(V_signal^2 / V_noise^2) = 20 log10(V_signal / V_noise)

- Signal to Quantization Noise Ratio (SQNR): the quantization error (or quantization noise) is the difference between the actual value of the analog signal at the sampling time and the nearest quantization-interval value. The largest (worst) quantization error is half of the interval.

Let us now look at the quality of a digital signal, which is expressed by the signal-to-noise ratio given by the formula above. As a demonstration of this concept, let us determine the quality of a signal whose noise is introduced by quantization. Recall that quantization approximates each voltage value by the nearest discrete point on the vertical axis; this process introduces error, or noise. The quantization noise is the difference between the actual value of the analog signal at a sampling point and the nearest quantization value. Clearly, in linear quantization the worst-case quantization error is half the size of the interval.
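A one-function sketch (my own, evaluating the formula above) for computing SNR in decibels:

```python
import math

def snr_db(v_signal, v_noise):
    """SNR in dB; 20*log10 of a voltage ratio equals 10*log10 of the power ratio."""
    return 20 * math.log10(v_signal / v_noise)

print(snr_db(1.0, 0.01))  # 40.0 dB for a signal 100x the noise voltage
```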

10 SQNR Calculation (Worst Case)
- If we use N bits per sample, the range of the digital signal is -2^(N-1) to 2^(N-1).
- The worst-case signal to quantization noise ratio is:

    SQNR = 20 log10(V_signal / V_quant_noise) = 20 log10(2^(N-1) / (1/2)) = 20 log10(2^N) = N x 20 log10(2) ~ 6.02 N (dB)

Let us calculate the worst-case signal-to-quantization-noise ratio. Assume we use N bits per sample, so the range of quantization values is from -2^(N-1) to 2^(N-1). The SQNR is given by 20 log10 of the ratio of the maximum signal value to the maximum quantization noise, that is, 2^(N-1) divided by 1/2 (half of one quantization interval). This gives 20 log10 of 2^N, which is approximately 6.02 N. Note that each additional bit adds about 6 dB to the worst-case signal quality, so 16 bits enable a maximum SQNR of 96 dB.

11 Miscellaneous Audio Facts
Typical audio formats: popular audio file formats include .au (Unix), .aiff (Mac, SGI), and .wav (PC, DEC). A simple and widely used audio compression method is Adaptive Delta Pulse Code Modulation (ADPCM). Based on past samples, it predicts the next sample and encodes the difference between the actual value and the predicted value.

Before we wrap up the discussion of digital audio, it is interesting to note a few miscellaneous facts. First, the most popular audio file formats are .au on Unix, .aiff on the Mac, and .wav on the PC. Note that these are uncompressed formats; formats like MP3 and RealAudio's .ram involve compression, which will be covered later. While we are talking about compression, I would like to draw your attention to a simple but popular compression method called Adaptive Delta PCM, or ADPCM. In this method, the next sample is predicted based on the behavior of past samples, and the difference between the predicted value and the actual observed value is encoded. To understand the source of the compression, note that the difference (if the prediction is sound) is a smaller number than the actual value, so encoding it takes fewer bits.
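Real ADPCM adapts its quantization step size, but the core idea of encoding the difference from a prediction can be sketched with the simplest possible predictor, the previous sample. This is my own simplified illustration, not the actual ADPCM algorithm:

```python
def delta_encode(samples):
    """Encode each sample as its difference from the previous one."""
    prev, deltas = 0, []
    for s in samples:
        deltas.append(s - prev)
        prev = s
    return deltas

def delta_decode(deltas):
    """Recover the original samples by accumulating the differences."""
    prev, samples = 0, []
    for d in deltas:
        prev += d
        samples.append(prev)
    return samples

raw = [100, 102, 105, 104, 101]
print(delta_encode(raw))  # [100, 2, 3, -1, -3]: small values need fewer bits
assert delta_decode(delta_encode(raw)) == raw
```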

12 Audio Quality vs. Data Rate
           Sample Rate   Bits per   Mono/    Data Rate (kB/s,   Frequency Band
           (kHz)         Sample     Stereo   uncompressed)
Telephone  8             8          Mono     8                  200 Hz - 3.4 kHz
AM Radio   11.025        8          Mono     11.0               100 Hz - 5.5 kHz
FM Radio   22.050        16         Stereo   88.2               20 Hz - 11 kHz
CD         44.1          16         Stereo   176.4              5 Hz - 20 kHz
DAT        48            16         Stereo   192.0              5 Hz - 20 kHz

CD coding of audio, when done without compression, takes about 176 kilobytes of space for one second of recorded audio.
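The data-rate column follows directly from the others; a small helper (my own sketch) reproduces it:

```python
def data_rate_kbytes_per_sec(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed rate = samples/s * bits/sample * channels / 8 bits per byte."""
    return sample_rate_hz * bits_per_sample * channels / 8 / 1000

print(data_rate_kbytes_per_sec(44_100, 16, 2))  # CD: 176.4 kB/s
print(data_rate_kbytes_per_sec(8_000, 8, 1))    # Telephone: 8.0 kB/s
```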

13 MIDI
- Musical Instrument Digital Interface: a protocol that enables computers, synthesizers, keyboards, and other musical devices to communicate with each other.
- Setup: MIDI OUT of the synthesizer is connected to MIDI IN of the sequencer. MIDI OUT of the sequencer is connected to MIDI IN of the synthesizer and "through" (THRU) to each of the additional sound modules.
- Working: during recording, the keyboard-equipped synthesizer is used to send MIDI messages to the sequencer, which records them. During playback, messages are sent out from the sequencer to the sound modules and the synthesizer, which play back the music.

[Diagram: typical sequencer setup. A Synthesizer/Keyboard and a MIDI Interface/Sound Card (Sequencer) are connected via their IN and OUT ports, with THRU ports chaining to MIDI Module A, MIDI Module B, etc.]

14 MIDI: Data Format
- Information traveling through the hardware is encoded in the MIDI data format. The encoding includes note information such as the beginning of a note, its frequency, and its sound volume, for up to 128 notes.
- The MIDI data format is digital; the data are grouped into MIDI messages.
- Each MIDI message communicates one musical event between machines. An event might be pressing a key, moving a slider control, setting a switch, or adjusting a foot pedal.
- 10 minutes of music encoded in the MIDI data format is about 200 kilobytes of data (compare that against CD audio!).
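As a concrete illustration of how compact MIDI messages are, here is a Note On message built by hand. The three-byte layout (status byte 0x90 plus channel, then note number and velocity) comes from the MIDI 1.0 specification; the particular note and velocity values are arbitrary:

```python
channel = 0     # MIDI channels 1-16 are encoded as 0-15
note = 60       # middle C (note numbers run 0-127)
velocity = 100  # how hard the key was struck (0-127)

# Status byte 0x90 | channel marks a Note On event on that channel.
note_on = bytes([0x90 | channel, note, velocity])
print(note_on.hex())  # '903c64': three bytes describe the whole event
```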

