## Presentation on theme: "Tamara Berg Advanced Multimedia"— Presentation transcript:

Sound Analysis Tamara Berg Advanced Multimedia

Feb 28 Matlab sound lab Bring headphones!!!

Reminder HW1 due Wed, Feb 27 11:59pm Questions?? Li & Drew

Summary
Sound in a computer is represented by a vector of values: discrete samples of the continuous wave, [x1 x2 x3 x4 x5 x6 …]. What two things do you have to decide to go from an analog sound wave to a digital representation?

Sound Wave
The amplitude of a sound wave changes over time. Amplitude corresponds to pressure increasing or decreasing over time.

Summary
Sound in a computer is represented by a vector of values: discrete samples of the continuous wave, [x1 x2 x3 x4 x5 x6 …]. Here you have two things to decide:
• How often to sample the signal (the sampling rate).
• How to quantize your continuous sound signal into a set of discrete values.
And what does uniform vs. non-uniform quantization mean? Uniform quantization spaces the levels evenly across the amplitude range. At the quiet end we can hear smaller changes in the signal than at the very loud end of things, so we want to quantize the quiet parts with more bits than the loud parts. This is a non-uniform quantization. A coder built this way is called a perceptual coder: it allocates more bits to intervals for which a small change in the stimulus produces a large change in response.
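
The two decisions above can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the function names are made up, and μ-law companding (with μ = 255) is assumed here as one common example of non-uniform quantization.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Decision 1: sample a unit-amplitude sine wave at evenly spaced times."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def quantize_uniform(x, bits):
    """Decision 2 (uniform): snap x in [-1, 1] to one of 2**bits even levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    index = round((x + 1.0) / step)
    return -1.0 + index * step

def mu_law_compress(x, mu=255.0):
    """Stretch quiet values so they land on more quantization levels."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize_mu_law(x, bits, mu=255.0):
    """Decision 2 (non-uniform): compress, quantize uniformly, expand."""
    return mu_law_expand(quantize_uniform(mu_law_compress(x, mu), bits), mu)

# A quiet sample is reproduced more accurately by the non-uniform quantizer.
quiet = 0.01
uniform_error = abs(quantize_uniform(quiet, 4) - quiet)
mu_law_error = abs(quantize_mu_law(quiet, 4) - quiet)
```

With only 4 bits, the quiet sample lands much closer to its true value under the non-uniform scheme, at the cost of coarser steps for loud samples.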

Audio Filtering
• Prior to sampling and AD conversion, the audio signal is also usually filtered to remove unwanted frequencies. The frequencies kept depend on the application:
(a) For speech, typically from 50Hz to 10kHz is retained, and other frequencies are blocked by the use of a band-pass filter that screens out lower and higher frequencies.
(b) An audio music signal will typically contain from about 20Hz up to 20kHz.
(c) At the DA converter end, high frequencies may reappear in the output because of sampling and then quantization.
(d) So at the decoder side, a low-pass filter is used after the DA circuit.
Next class we’ll see how to create filters in Matlab and apply them to audio signals!
When we go from an analog sound signal to a digital one, the audio signal is usually filtered to remove unwanted frequencies. The frequencies kept depend on the application. Most speech occurs in the 50Hz to 10kHz range, so frequencies outside this range can be removed by a band-pass filter. A music signal typically contains frequencies from 20Hz (the low rumble of an elephant) to 20kHz (the highest squeak we can hear).

Audio Filtering
At the other end, when we convert back from a digital to an analog signal, high frequencies may reappear in the output because of noise introduced by sampling and quantization. So at the decoder side a low-pass filter is used after the DA circuit to remove these high frequencies. Later we’ll see how to create filters in Matlab and apply them to audio signals!

Audio Quality vs. Data Rate
• The uncompressed data rate increases as more bits are used for quantization. Stereo doubles the bandwidth needed to transmit a digital audio signal.

Table 6.2: Data rate and bandwidth in sample audio applications

| Quality | Sample Rate (kHz) | Bits per Sample | Mono / Stereo | Data Rate, uncompressed (kB/sec) | Frequency Band (kHz) |
|---|---|---|---|---|---|
| Telephone | 8 | 8 | Mono | 8 | |
| AM Radio | 11.025 | 8 | Mono | 11.0 | |
| FM Radio | 22.05 | 16 | Stereo | 88.2 | |
| CD | 44.1 | 16 | Stereo | 176.4 | |
| DAT | 48 | 16 | Stereo | 192.0 | |
| DVD Audio | 192 (max) | 24 (max) | 6 channels | 1,200 (max) | 0-96 (max) |

This chart shows the data rates for different kinds of audio applications. Who can tell me what bits per sample indicates? Quantization: 2^(number of bits) is the number of quantization levels. Obviously the data rate will increase as you use more bits for quantization (shown here as the bits per sample). Stereo doubles the amount of bandwidth since you’re encoding two channels of sound. The sample rate tells you at what rate you sample the signal. For example, for telephone you might sample at 8 kHz, with 8 bits per sample in mono. So you can calculate the data rate for telephone as: 8 (kHz sample rate) * 8 (bits per sample) / 8 (bits per byte) = 8 kB/sec. Stereo is twice the rate of mono. So for FM radio you have: 22.05 (kHz sample rate) * 16 (bits per sample) * 2 (for stereo) / 8 (bits per byte) = 88.2 kB/sec.
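
The data-rate arithmetic can be written as a one-line function. This is a minimal sketch; the function name and argument names are invented for illustration.

```python
def data_rate_kb_per_sec(sample_rate_khz, bits_per_sample, channels=1):
    """Uncompressed data rate: samples/sec * bits/sample * channels / 8 bits per byte."""
    return sample_rate_khz * bits_per_sample * channels / 8

telephone = data_rate_kb_per_sec(8, 8)           # mono
fm_radio = data_rate_kb_per_sec(22.05, 16, 2)    # stereo
cd = data_rate_kb_per_sec(44.1, 16, 2)           # stereo
```

Because the sample rate is given in kHz, the result comes out directly in kB/sec.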

Creating Sounds
Two different approaches: FM and wave table.
Of course, besides digitizing sound, we might also want to create sounds. We’ll talk about two fundamentally different approaches for creating synthetic sounds: FM and wave table.

Signals can be decomposed into a weighted sum of sinusoids
Building up a complex signal by superposing sinusoids.
What does it mean that a sound signal can be decomposed into a weighted sum of sinusoids? It means we can also go the other way: to build a complex sound signal, add a bunch of sine waves together. For example, how could you build a C-E-G chord? Make the sine wave for C, the sine wave for E, and the sine wave for G, and then add them together. What does that mean for a digitized signal? You have a vector of numbers for C, a vector for E, and a vector for G: simply add them together!
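
Adding the vectors for C, E, and G might look like the sketch below. The sample rate, the note frequencies (equal-tempered C4, E4, G4), and the helper names are assumptions made for the example, not values from the slides.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)

def sine_wave(freq_hz, duration_s, amplitude=1.0):
    """Return a list of samples of a sine wave at the given frequency."""
    n = int(SAMPLE_RATE * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

def mix(*signals):
    """Superpose signals by adding them sample by sample."""
    return [sum(values) for values in zip(*signals)]

c_note = sine_wave(261.63, 0.5)  # C4
e_note = sine_wave(329.63, 0.5)  # E4
g_note = sine_wave(392.00, 0.5)  # G4

# Divide by 3 so the summed chord stays within [-1, 1].
chord = [s / 3 for s in mix(c_note, e_note, g_note)]
```

Giving each note a different amplitude would turn this plain sum into the weighted sum mentioned on the slide.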

Synthetic Sounds
FM (Frequency Modulation): one approach to generating synthetic sound:

$$x(t) = A(t)\cos\left[\omega_c \pi t + I(t)\cos(\omega_m \pi t + \phi_m) + \phi_c\right]$$

• $A(t)$ specifies overall loudness over time.
• $\omega_c$ specifies the carrier frequency.
• $\omega_m$ specifies the modulating frequency.
• $I(t)$ produces a feeling of harmonics (overtones) by changing the amount of the modulating frequency heard.
• $\phi_m$ and $\phi_c$ specify time-shifts for a more interesting sound.
FM synthesis is basically just a slightly more complicated version of adding sinusoids. In FM synthesis we have a carrier wave, with frequency $\omega_c$, that is changed by adding another term involving a second, modulating frequency. A more interesting sound is created by putting the second cosine inside the first: a cosine of a cosine. A time-varying amplitude envelope function A(t) multiplies the whole signal to specify loudness over time. A time-varying function I(t) multiplies the inner cosine to account for overtones/harmonics. Adding some time-shifts allows for even more complexity. All of this allows us to create a much more complicated signal.
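
A cosine-inside-a-cosine generator could be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes frequencies given in Hz (so the phase is 2πft), which is a slightly different convention from the slide's notation.

```python
import math

def fm_sample(t, carrier_hz, mod_hz,
              amplitude=lambda t: 1.0,   # A(t): loudness envelope
              index=lambda t: 1.0,       # I(t): amount of modulation heard
              phi_c=0.0, phi_m=0.0):     # time-shifts
    """One sample of an FM signal: a cosine whose argument contains a cosine."""
    inner = index(t) * math.cos(2 * math.pi * mod_hz * t + phi_m)
    return amplitude(t) * math.cos(2 * math.pi * carrier_hz * t + inner + phi_c)

def fm_tone(carrier_hz, mod_hz, duration_s, sample_rate=8000, **kwargs):
    """Sample the FM signal at evenly spaced times."""
    n = int(duration_s * sample_rate)
    return [fm_sample(i / sample_rate, carrier_hz, mod_hz, **kwargs)
            for i in range(n)]

# A quarter-second tone whose loudness decays over time.
tone = fm_tone(440, 110, 0.25, amplitude=lambda t: math.exp(-3 * t))
```

Setting `index` to zero removes the inner cosine and leaves a plain carrier wave, which is a quick way to hear what the modulation adds.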

Here we show a carrier wave in (a), the modulating frequency in (b), the cosine of a cosine of the modulating frequency in (c), which makes a more complex sound, and the final combined signal in (d). So using FM (frequency modulation) we can construct a pretty complex-looking wave (in d) from some simple combinations in our FM function.
Fig. 6.7: Frequency Modulation. (a): A single frequency. (b): Twice the frequency. (c): Usually, FM is carried out using a sinusoid argument to a sinusoid. (d): A more complex form arises from a carrier frequency, 2πt, and a modulating frequency, 4πt, cosine inside the sinusoid.

2. Wave Table synthesis: A more accurate way of generating sounds from digital signals. Also known, simply, as sampling. In this technique, the actual digital samples of sounds from real instruments are stored. Since wave tables are stored in memory on the sound card, they can be manipulated by software so that sounds can be combined, edited, and enhanced.
A more accurate way of generating sounds is known as wave table synthesis, or sampling, and it is very simple: you actually store digital samples of sounds from real instruments. These samples can then be manipulated by software so that sounds can be combined, edited, and enhanced.

6.2 MIDI: Musical Instrument Digital Interface
• MIDI Overview
(a) MIDI is a protocol adopted by the electronic music industry in the early 80s for controlling devices, such as synthesizers and sound cards, that produce music, and for allowing them to communicate with each other.
(b) MIDI is a scripting language: it codes “events” that stand for the production of sounds. E.g., a MIDI event might include values for the pitch of a single note, its duration, and its volume.
You’ve probably heard the term MIDI before. It stands for Musical Instrument Digital Interface. It’s basically a protocol adopted in the early 80s for controlling devices like synthesizers or sound cards. A synthesizer is a stand-alone sound generator that can vary things like pitch and loudness. It can also change additional music characteristics such as the attack and delay time of a note. MIDI is a scripting language that codes events. For example, a MIDI event might tell an instrument to play a single note at a particular pitch and volume for a set duration. Given the MIDI event as input, the synthesizer then generates that sound.

(c) The MIDI standard is supported by most synthesizers, so sounds created on one synthesizer can be played and manipulated on another synthesizer and sound reasonably close.
(d) Computers must have a special MIDI interface, but this is incorporated into most sound cards.
(e) A MIDI file consists of a sequence of MIDI instructions (messages), so it is quite small in comparison to a standard audio file.
MIDI is a standard that is supported by most synthesizers. Computers need a special MIDI interface, but it’s generally incorporated in most sound cards. A MIDI file consists of just a sequence of instructions (kind of like code for the sound to be generated), so it is quite small in comparison to a standard audio file.

MIDI Concepts
• MIDI channels are used to separate messages.
(a) There are 16 channels numbered from 0 to 15. The channel forms the last 4 bits (the least significant bits) of the message.
(b) Usually a channel is associated with a particular instrument: e.g., channel 1 is the piano, channel 10 is the drums, etc.
(c) Nevertheless, one can switch instruments midstream, if desired, and associate another instrument with any channel.
MIDI channels are used to separate messages. There are 16 channels, and the last 4 bits of the MIDI message denote the channel. Why 4 bits? 2^4 encodes 16 different channels. The channel tells the computer which instrument to use to play the note. Usually each channel is associated with a particular instrument, though you can switch the association of channels to instruments on the fly, so it’s kind of programmable.
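
Pulling the channel out of the low 4 bits can be sketched with a bit mask. The function name is made up, and the sketch assumes the byte being split is the first (status) byte of a channel message, where the upper 4 bits identify the kind of message.

```python
def split_status_byte(status):
    """Split a status byte into (message kind, channel).

    The channel is the 4 least significant bits, so it ranges over 0..15.
    """
    message_kind = status >> 4   # upper 4 bits
    channel = status & 0x0F      # lower 4 bits
    return message_kind, channel

kind, channel = split_status_byte(0x93)
```

Here `0x93` splits into kind `0x9` and channel `3`; since channels are numbered from 0, a user-facing label of "channel 10" corresponds to the value 9.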

• MIDI messages
(a) For most instruments, a typical message might be a Note On message (meaning, e.g., a keypress and release), consisting of what channel, what pitch, and what “velocity” (i.e., volume).
(b) For percussion instruments, however, the pitch data means which kind of drum.
For most instruments, a typical message might be a “Note On” message consisting of what channel, what pitch, and what volume to play the note at. For percussion instruments the pitch data indicates which kind of drum.
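
A Note On message can be packed into three bytes, as in this sketch. The helper name is hypothetical; the layout assumed is the MIDI 1.0 one, with a status byte of `0x90` plus the channel, followed by pitch and velocity in the range 0 to 127. The note numbers used (60 for middle C, 36 for a bass drum on channel index 9) are common conventions, not values from the slides.

```python
def note_on(channel, pitch, velocity):
    """Build the three bytes of a Note On message: channel, pitch, velocity."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be in 0..15")
    if not 0 <= pitch <= 127 or not 0 <= velocity <= 127:
        raise ValueError("pitch and velocity must be in 0..127")
    status = 0x90 | channel  # Note On kind in the high bits, channel in the low bits
    return bytes([status, pitch, velocity])

piano_middle_c = note_on(0, 60, 100)
drum_hit = note_on(9, 36, 127)  # on a percussion channel, pitch selects the drum
```

The same three fields mean different things depending on the channel: a pitch on the piano channel, a choice of drum on the percussion channel.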

(c) The way a synthetic musical instrument responds to a MIDI message is usually by only playing messages that specify its channel.
– If several messages are for its channel (say, play multiple notes on the piano), then the instrument responds to all of them, provided it is multi-voice, i.e., can play more than a single note at once.
An instrument responds to a MIDI message by playing only the messages that specify its channel. What if there are several messages for its channel? What do you think happens? If several messages specify the same channel (say, multiple notes on the piano), then the instrument responds to all of them, provided it is multi-voice, that is, able to play more than a single note at once.

• System messages
Several other types of messages exist, e.g. a general message for all instruments indicating a change in tuning or timing.
Besides these kinds of messages there are also system messages. These can be things like a general message for all instruments indicating a change in tuning or timing.

• A MIDI device can often also change the envelope describing how the amplitude of a sound changes over time.
• A model of the response of a digital instrument to a Note On message: stages of amplitude versus time for a music note.
A MIDI device can often also change the envelope of a note, describing how the amplitude should change over time. Typically a note will have an attack, followed by a decay, a sustain, and then a release. The timing for each of these can be changed by MIDI messages.
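
The four stages can be modeled as a piecewise-linear envelope. This is a sketch under simplifying assumptions (straight-line segments, a peak amplitude of 1.0, a fixed sustain duration); the function and parameter names are invented for the example.

```python
def adsr_envelope(t, attack, decay, sustain_level, sustain_time, release):
    """Amplitude at time t (seconds) for an attack-decay-sustain-release note."""
    if t < 0:
        return 0.0
    if t < attack:                       # attack: rise from 0 to the peak
        return t / attack
    t -= attack
    if t < decay:                        # decay: fall from the peak to sustain level
        return 1.0 - (1.0 - sustain_level) * t / decay
    t -= decay
    if t < sustain_time:                 # sustain: hold steady
        return sustain_level
    t -= sustain_time
    if t < release:                      # release: fade to silence
        return sustain_level * (1.0 - t / release)
    return 0.0

# Envelope sampled every 50 ms for a note lasting 0.7 s in total.
shape = [adsr_envelope(i * 0.05, 0.1, 0.1, 0.6, 0.3, 0.2) for i in range(15)]
```

Multiplying a tone sample by sample with such an envelope gives it the amplitude profile described on the slide.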

6.3 Quantization and Transmission of Audio
• Coding of Audio
a) Quantization: uniform/non-uniform.
b) Encoding differences in signals between the present and a past time can reduce the size of signal values into a much smaller range.
Ok, so we talked last class about how to go from analog to digital: we need to sample and quantize. Reminder: quantization can be uniform or non-uniform. What does non-uniform usually do? Why is this useful? Using a non-uniform quantization allows us to encode the signal better with fewer bits. What if we want to save even more bits? Instead of encoding the signal values themselves, which might vary a lot, we can encode differences in the signal between the present and a past time, which probably vary over a much smaller range. So imagine the signal is [ …… ]. Imagine encoding these numbers using bits. If the range is, let’s say, [0,255], then we need 8 bits to encode each sample. Now imagine that each consecutive sample varies by at most 10 (so the difference between one sample and the next is <=10 in magnitude). Then how many bits would we need to encode differences in the signal? The differences lie in [-10,10], which is 21 different values, so 5 bits are enough (2^5 can encode 32 different values; 4 bits would cover only 16). That would let us encode each sample with just 5 bits instead of 8. Savings! This is the basic idea that many compression algorithms are built around, whether for sound or images.
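
The bit-counting argument can be checked with a short sketch. The signal values below are made up purely for illustration, and the helper names are hypothetical.

```python
def bits_needed(num_values):
    """Smallest number of bits that can label num_values distinct values."""
    return max(1, (num_values - 1).bit_length())

def differences(samples):
    """Difference between each sample and the one before it."""
    return [b - a for a, b in zip(samples, samples[1:])]

signal = [100, 104, 110, 107, 99, 95, 101]   # invented samples in [0, 255]
diffs = differences(signal)

raw_bits = bits_needed(256)                   # samples anywhere in [0, 255]
worst_case_diff_bits = bits_needed(21)        # differences anywhere in [-10, 10]
```

Note that a signed difference bounded by 10 in magnitude has 21 possible values, so the count has to include the negative side as well as zero.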

• Every compression scheme has three stages:
A. The input data is transformed to a new representation that is easier or more efficient to compress.
B. We may introduce loss of information (e.g. due to quantization: we use a limited number of quantization levels, so noise is introduced).
C. Coding. Assign a codeword to each output level.
Every compression scheme has three stages. First, the data is transformed into a new representation that is easier or more efficient to compress (e.g. differences instead of original values). Second, we may introduce some loss of information. Quantization is usually the main lossy step, because we use a discrete number of quantization levels and so introduce quantization noise. Third, we assign a codeword to each output level.

Pulse Code Modulation (PCM)
• The basic techniques for creating digital signals from analog signals are sampling and quantization.
• Quantization consists of selecting breakpoints in magnitude, and then re-mapping any value within an interval to one of the representative output levels.
PCM we’ve already seen. It is basically sampling and quantization, where we have some set of breakpoints in amplitude and re-map any value within an interval to one of the representative output levels. Here non-uniform quantization can be used to reduce bandwidth.

2. Decoded signal is discontinuous.
Here we show a signal in (a), with the sample points for PCM. Given these sample points, if we try to decode the original signal we get a staircase-looking signal in (b) (basically assuming the signal is constant between sample points). To get a better reconstructed signal, shown in (c), we can apply low-pass filtering to remove the high-frequency information introduced by our sampling and quantization.
Fig. 6.13: Pulse Code Modulation (PCM). (a) Original analog signal and its corresponding PCM signals. (b) Decoded staircase signal. (c) Reconstructed signal after low-pass filtering.

Differential Coding of Audio
• Audio is often stored not in simple PCM but instead in a form that exploits differences, which are generally smaller numbers and so offer the possibility of using fewer bits to store.
In addition to using quantization to reduce the bandwidth of our transmission, we might use a neat trick called differential coding of audio. Here, instead of transmitting the original values, we transmit the differences between one sample and the next. These differences are usually smaller numbers than the original sample values, so we can use fewer bits to transmit them.

Lossless Predictive Coding
• Predictive coding simply means transmitting differences: predict the next sample as being equal to the current sample, and send not the sample itself but the difference between the previous and next samples.
(a) Predictive coding consists of finding differences, and transmitting these using a PCM system.
(b) Note that differences of integers will be integers. Denote the integer input signal as the set of values $f_n$. Then we predict values $\hat{f}_n$ as simply the previous value, and define the error $e_n$ as the difference between the actual and the predicted signal:

$$\hat{f}_n = f_{n-1}, \qquad e_n = f_n - \hat{f}_n \tag{6.12}$$

One example of this is lossless predictive coding. Here we predict that the next sample will be equal to the current sample, and instead of sending the sample itself, we send the difference between the two. So we predict that $f_n$ will be equal to $f_{n-1}$ and just transmit the error in our prediction, $e_n = f_n - \hat{f}_n$.

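(c) But it is often the case that some function of a few of the previous values, $f_{n-1}$, $f_{n-2}$, $f_{n-3}$, etc., provides a better prediction. Typically, a linear predictor function is used:

$$\hat{f}_n = \sum_{k=1}^{K} a_{n-k}\, f_{n-k} \tag{6.13}$$

where $K$ is the number of previous values used and the $a_{n-k}$ are their weights. In general we would probably use a function of a few of the previous values, like $f_{n-1}$, $f_{n-2}$, $f_{n-3}$.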

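• Lossless predictive coding: as a simple example, suppose we devise a predictor for $\hat{f}_n$ as follows:

$$\hat{f}_n = \left\lfloor \tfrac{1}{2}\left(f_{n-1} + f_{n-2}\right) \right\rfloor, \qquad e_n = f_n - \hat{f}_n \tag{6.14}$$

So our predictor might be something like: predict $f_n$ to be the average of the previous two values (rounded down so that the prediction stays an integer). Then what we transmit is the error between our predicted value and the true value.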

• Let’s consider an explicit example
Suppose we wish to code the sequence f1, f2, f3, f4, f5 = 21, 22, 27, 25, 22. For the purposes of the predictor, we’ll invent an extra signal value f0, equal to f1 = 21, and first transmit this initial value, uncoded. (6.15) On board.

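• Let’s consider an explicit example
With the sequence f1, f2, f3, f4, f5 = 21, 22, 27, 25, 22, the invented value f0 = f1 = 21, and the prediction taken as the average of the previous two values rounded down, we get:

$$\begin{aligned}
\hat{f}_2 &= 21, & e_2 &= 22 - 21 = 1 \\
\hat{f}_3 &= \left\lfloor \tfrac{1}{2}(f_2 + f_1) \right\rfloor = \left\lfloor \tfrac{1}{2}(22 + 21) \right\rfloor = 21, & e_3 &= 27 - 21 = 6 \\
\hat{f}_4 &= \left\lfloor \tfrac{1}{2}(f_3 + f_2) \right\rfloor = \left\lfloor \tfrac{1}{2}(27 + 22) \right\rfloor = 24, & e_4 &= 25 - 24 = 1 \\
\hat{f}_5 &= \left\lfloor \tfrac{1}{2}(f_4 + f_3) \right\rfloor = \left\lfloor \tfrac{1}{2}(25 + 27) \right\rfloor = 26, & e_5 &= 22 - 26 = -4
\end{aligned} \tag{6.15}$$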
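
The example sequence can be run through a small encoder sketch. The function names are hypothetical, and the predictor is assumed to be the average of the previous two values rounded down, with the invented value f0 set equal to f1.

```python
def predict(prev1, prev2):
    """Predict the next sample as the floored average of the previous two."""
    return (prev1 + prev2) // 2

def encode(samples):
    """Return the first sample (sent uncoded) and the prediction errors."""
    first = samples[0]
    prev2, prev1 = first, first          # invented f0 equals f1
    errors = []
    for f in samples[1:]:
        errors.append(f - predict(prev1, prev2))
        prev2, prev1 = prev1, f
    return first, errors

first, errors = encode([21, 22, 27, 25, 22])
```

The encoder only ever uses values that the receiver will also have, which is what makes the scheme reversible.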

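• The error does center around zero, we see, and coding (assigning bit-string codewords) will be efficient. How would you decode the transmitted signal? You receive the uncoded initial value and then the transmitted errors. For each error, you form the same prediction from the values you have already decoded and add the error to it, which gives back the original sample.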
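
Decoding can be sketched as the mirror image of encoding. As before, the names are invented for illustration and the predictor is assumed to be the floored average of the previous two values, starting from an invented f0 equal to the first transmitted value.

```python
def predict(prev1, prev2):
    """Same predictor the encoder is assumed to use."""
    return (prev1 + prev2) // 2

def decode(first, errors):
    """Rebuild the samples from the uncoded first value and the errors."""
    samples = [first]
    prev2, prev1 = first, first          # invented f0 equals f1
    for e in errors:
        f = predict(prev1, prev2) + e    # prediction plus transmitted error
        samples.append(f)
        prev2, prev1 = prev1, f
    return samples

decoded = decode(21, [1, 6, 1, -4])
```

Because the decoder reproduces every prediction exactly, no information is lost, which is why this form of predictive coding is called lossless.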