Tamara Berg Advanced Multimedia

Presentation transcript:

Sound Analysis. Tamara Berg, Advanced Multimedia

Feb 28: Matlab sound lab. Bring headphones!

Reminder: HW1 due Wed, Feb 27, 11:59pm. Questions? Li & Drew

Summary Sound in a computer is represented by a vector of values – discrete samples of the continuous wave: [x1 x2 x3 x4 x5 x6 ...]. Here you have two things to decide. What two things do you have to decide to go from an analog sound wave to a digital representation? Li & Drew

Sound Wave The amplitude of a sound wave changes over time. Amplitude corresponds to pressure increasing or decreasing over time. Li & Drew

Summary Sound in a computer is represented by a vector of values – discrete samples of the continuous wave: [x1 x2 x3 x4 x5 x6 ...]. Here you have two things to decide: how often to sample the signal (the sampling rate), and how to quantize your continuous sound signal into a set of discrete values. And what does uniform vs. non-uniform quantization mean? At the quiet end we can hear smaller changes in the signal than at the very loud end, so we want to quantize the quiet parts more finely (effectively with more bits) than the loud parts. This is non-uniform quantization, and it is what a perceptual coder does: it allocates more bits to intervals for which a small change in the stimulus produces a large change in response. Li & Drew
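The slides don't pin down a particular non-uniform scheme, so as one common example here is a minimal sketch of mu-law companding (the scheme used in telephony); the mu value, bit depth, and test samples are illustrative assumptions:

```python
import numpy as np

def mu_law_quantize(x, mu=255, bits=8):
    """Non-uniform quantization of samples x in [-1, 1]: compress with the
    mu-law curve, then quantize uniformly in the compressed domain."""
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    levels = 2 ** bits
    # uniform quantization of the compressed signal
    return np.round((compressed + 1) / 2 * (levels - 1)).astype(int)

def mu_law_expand(codes, mu=255, bits=8):
    """Invert the quantizer: map codes back to [-1, 1], then undo the mu-law curve."""
    levels = 2 ** bits
    compressed = codes / (levels - 1) * 2 - 1
    return np.sign(compressed) * ((1 + mu) ** np.abs(compressed) - 1) / mu

x = np.array([-0.9, -0.05, 0.0, 0.002, 0.05, 0.9])
codes = mu_law_quantize(x)
print(codes)                    # quiet samples get many more codes per unit amplitude
print(mu_law_expand(codes))     # reconstruction is closest where hearing is most sensitive
```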

Audio Filtering • Prior to sampling and AD conversion, the audio signal is usually filtered to remove unwanted frequencies. The frequencies kept depend on the application: (a) For speech, typically 50Hz to 10kHz is retained, and other frequencies are blocked by the use of a band-pass filter that screens out lower and higher frequencies. (b) An audio music signal will typically contain from about 20Hz up to 20kHz. (c) At the DA converter end, high frequencies may reappear in the output because of sampling and then quantization. (d) So at the decoder side, a lowpass filter is used after the DA circuit. Next class we'll see how to create filters in matlab and apply them to audio signals! When we go from an analog sound signal to a digital one, the audio signal is usually filtered to remove unwanted frequencies; which frequencies are kept depends on the application. Most speech occurs in the 50Hz to 10kHz range, so frequencies outside this range can be removed by a band-pass filter. An audio music signal typically contains from 20Hz (the low rumble of an elephant) to 20kHz (the highest squeak we can hear). At the other end, when we convert back from digital to an analog signal, high frequencies may reappear in the output because of noise due to sampling and quantization, so at the decoder side a low-pass filter is used to remove them. Li & Drew
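The slides defer Matlab filter examples to next class; purely as a preview, here is a minimal Python/scipy sketch of the speech-band case. The cutoff frequencies mirror the 50Hz-10kHz figures above, and everything else (function name, filter order, toy signal) is an illustrative assumption:

```python
import numpy as np
from scipy.signal import butter, lfilter

def bandpass(x, fs, low_hz=50.0, high_hz=10_000.0, order=4):
    """Keep roughly the speech band (50 Hz - 10 kHz) and attenuate the rest."""
    b, a = butter(order, [low_hz, high_hz], btype="band", fs=fs)
    return lfilter(b, a, x)

# toy input: 10 Hz rumble plus a 1 kHz tone, sampled at 44.1 kHz
fs = 44_100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 1_000 * t)
y = bandpass(x, fs)   # the 10 Hz component is largely removed, the 1 kHz tone passes
```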

Audio Quality vs. Data Rate • The uncompressed data rate increases as more bits are used for quantization. Stereo: double the bandwidth to transmit a digital audio signal. Table 6.2: Data rate and bandwidth in sample audio applications:

Quality    | Sample Rate (kHz) | Bits per Sample | Mono/Stereo | Data Rate, uncompressed (kB/s) | Frequency Band (kHz)
Telephone  | 8                 | 8               | Mono        | 8                              | 0.200-3.4
AM Radio   | 11.025            | 8               | Mono        | 11.0                           | 0.1-5.5
FM Radio   | 22.05             | 16              | Stereo      | 88.2                           | 0.02-11
CD         | 44.1              | 16              | Stereo      | 176.4                          | 0.005-20
DAT        | 48                | 16              | Stereo      | 192.0                          | 0.005-20
DVD Audio  | 192 (max)         | 24 (max)        | 6 channels  | 1,200 (max)                    | 0-96 (max)

This chart shows the data rates for different kinds of audio applications. Who can tell me what bits per sample indicates? Quantization: 2^(# bits) is the number of quantization levels. Obviously the data rate will increase as you use more bits for quantization (shown here as bits per sample), and stereo doubles the amount of bandwidth since you're encoding two channels of sound. The sample rate tells you at what rate you sample the signal. For example, for telephone you might sample at 8 kHz with 8 bits per sample in mono, so you can calculate the data rate as 8 (kHz sample rate) * 8 (bits per sample) / 8 (bits per byte) = 8 kB/s. Stereo is twice the rate of mono, so for FM radio you have 22.05 (kHz sample rate) * 16 (bits per sample) * 2 (for stereo) / 8 (bits per byte) = 88.2 kB/s. Li & Drew
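The per-row arithmetic generalizes to a one-line formula; this small sketch (the function name is mine) reproduces the uncompressed data-rate column:

```python
def data_rate_kBps(sample_rate_khz, bits_per_sample, channels):
    """Uncompressed data rate in kB/s: samples/s * bits/sample * channels / 8 bits per byte."""
    return sample_rate_khz * bits_per_sample * channels / 8

print(data_rate_kBps(8, 8, 1))        # telephone -> 8.0
print(data_rate_kBps(22.05, 16, 2))   # FM radio  -> 88.2
print(data_rate_kBps(44.1, 16, 2))    # CD        -> 176.4
```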

Creating Sounds Two different approaches: FM and wave table. Of course, besides digitizing sound, we might also want to create sounds. We'll talk about two fundamentally different approaches for creating synthetic sounds: FM and wave table. Li & Drew

Signals can be decomposed into a weighted sum of sinusoids: What does it mean that a sound signal can be decomposed into a weighted sum of sinusoids? So, how might we build a complex sound signal? Add a bunch of sine waves together. So, for example, how could you build a C-E-G chord? Make the sine wave for C, make the sine wave for E, make the sine wave for G, and then add them together. What does that mean for a digitized signal? A vector of numbers for C, a vector for E, and a vector for G: simply add them together! Signals can be decomposed into a weighted sum of sinusoids: building up a complex signal by superposing sinusoids. Li & Drew
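A minimal numpy sketch of exactly this; the note frequencies are standard equal-temperament values for C4, E4, and G4 (the slide names no particular octave, so that choice is mine):

```python
import numpy as np

fs = 44_100                      # sampling rate in Hz
t = np.arange(fs) / fs           # one second of sample times
notes_hz = {"C4": 261.63, "E4": 329.63, "G4": 392.00}

# one vector of samples per note ...
waves = [np.sin(2 * np.pi * f * t) for f in notes_hz.values()]
# ... and the chord is simply their (scaled) sum
chord = sum(waves) / len(waves)
```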

Synthetic Sounds FM (Frequency Modulation): one approach to generating synthetic sound. A(t) specifies overall loudness over time, ωc specifies the carrier frequency, ωm specifies the modulating frequency, and I(t) produces a feeling of harmonics (overtones) by changing the amount of the modulating frequency heard. Time-shifts can be specified for a more interesting sound. FM synthesis is basically just a slightly more complicated version of this. In FM synthesis we have a carrier wave with frequency ωc that is changed by adding another term involving a second, modulating frequency ωm. A more interesting sound is created by putting the modulating cosine inside the carrier cosine, i.e., a cosine of a cosine. A time-varying amplitude envelope function A(t) multiplies the whole signal to specify its loudness over time, and a time-varying function I(t) multiplies the inner cosine to account for overtones/harmonics. Adding some time-shifts allows for even more complexity. All of this lets us create quite a complicated signal. Li & Drew
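For reference, the FM signal these terms describe can be written as follows (this follows eq. 6.1 in Li & Drew; the exact notation is assumed from the textbook rather than shown on the slide):

```latex
x(t) = A(t)\,\cos\bigl[\omega_c \pi t + I(t)\cos(\omega_m \pi t + \phi_m) + \phi_c\bigr]
```

Here φm and φc are the time-shift (phase) terms mentioned above.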

Here we show a carrier wave in (a), the modulating frequency in (b), the cosine of the cosine of the modulating frequency in (c) to make a more complex sound, and the final combined signal in (d). So here we have seen that using FM (frequency modulation) we can construct a pretty complex-looking wave (in d) from some simple combinations in our FM function. Fig. 6.7: Frequency Modulation. (a) A single frequency. (b) Twice the frequency. (c) Usually, FM is carried out using a sinusoid argument to a sinusoid. (d) A more complex form arises from a carrier frequency 2πt and a modulating frequency 4πt cosine inside the sinusoid. Li & Drew
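A small numpy sketch of the same construction; the frequencies echo the 2πt / 4πt choice in the caption, while the constant envelope and modulation index are simplifying assumptions of mine:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1000)

carrier   = np.cos(2 * np.pi * t)                    # (a) single frequency
modulator = np.cos(4 * np.pi * t)                    # (b) twice the frequency
nested    = np.cos(np.cos(4 * np.pi * t))            # (c) cosine of a cosine
A, I = 1.0, 2.0                                      # constant envelope and modulation index
fm_signal = A * np.cos(2 * np.pi * t + I * np.cos(4 * np.pi * t))   # (d) the FM signal
```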

2. Wave Table synthesis: A more accurate way of generating sounds from digital signals. Also known, simply, as sampling. In this technique, the actual digital samples of sounds from real instruments are stored. Since wave tables are stored in memory on the sound card, they can be manipulated by software so that sounds can be combined, edited, and enhanced. A more accurate way of generating sounds is known as wave table synthesis or sampling – very simple. Here you actually store digital samples of sounds from real instruments. These samples can then be manipulated by software so that sounds can be combined, edited and enhanced. Li & Drew
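A minimal sketch of the idea: store one digitized cycle and read through it at different rates to get different pitches. The stored table here is a synthetic sine rather than a real instrument sample, purely so the example is self-contained:

```python
import numpy as np

fs = 44_100
# the "wave table": one stored cycle of a waveform (a real sampler would store a recorded note)
table = np.sin(2 * np.pi * np.arange(2048) / 2048)

def play_note(freq_hz, duration_s):
    """Read through the stored table at a rate proportional to the desired pitch."""
    n = int(fs * duration_s)
    phase = (np.arange(n) * freq_hz * len(table) / fs) % len(table)
    return np.interp(phase, np.arange(len(table)), table)

a4 = play_note(440.0, 0.5)    # same table, different read rates -> different pitches
c5 = play_note(523.25, 0.5)
```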

6.2 MIDI: Musical Instrument Digital Interface • MIDI Overview (a) MIDI is a protocol adopted by the electronic music industry in the early 80s for controlling devices, such as synthesizers and sound cards, that produce music, and for allowing them to communicate with each other. (b) MIDI is a scripting language — it codes "events" that stand for the production of sounds. E.g., a MIDI event might include values for the pitch of a single note, its duration, and its volume. You've probably heard the term MIDI before. It stands for Musical Instrument Digital Interface. It's basically a protocol adopted in the early 80s for controlling devices like synthesizers or sound cards. A synthesizer is a stand-alone sound generator that can vary things like pitch and loudness. It can also change additional music characteristics such as attack and delay time of a note. MIDI is a scripting language that codes events. For example, a MIDI event might tell an instrument to play a single note at a particular pitch and a particular volume for a set duration. Given the MIDI event as input, the synthesizer would then generate that sound. Li & Drew

(c) The MIDI standard is supported by most synthesizers, so sounds created on one synthesizer can be played and manipulated on another synthesizer and sound reasonably close. (d) Computers must have a special MIDI interface, but this is incorporated into most sound cards. (e) A MIDI file consists of a sequence of MIDI instructions (messages), so it would be quite small in comparison to a standard audio file. MIDI is a standard that is supported by most synthesizers. Computers need a special MIDI interface, but it's generally incorporated into most sound cards. A MIDI file consists of just a sequence of instructions (kind of like code for the sound to be generated), so it is quite small in comparison to a standard audio file. Li & Drew

MIDI Concepts • MIDI channels are used to separate messages. (a) There are 16 channels, numbered from 0 to 15. The channel forms the last 4 bits (the least significant bits) of the message. (b) Usually a channel is associated with a particular instrument: e.g., channel 1 is the piano, channel 10 is the drums, etc. (c) Nevertheless, one can switch instruments midstream, if desired, and associate another instrument with any channel. MIDI channels are used to separate messages. There are 16 channels, and the last 4 bits of the MIDI message are used to denote the channel. Why 4 bits? 2^4 encodes 16 different channels. Each channel is associated with an instrument, which tells the computer which instrument to use to play the note. Usually each channel is associated with a particular instrument, though you can switch the association of channels to instruments on the fly, so it's kind of programmable. Li & Drew

• MIDI messages (a) For most instruments, a typical message might be a Note On message (meaning, e.g., a keypress and release), consisting of what channel, what pitch, and what “velocity” (i.e., volume). (b) For percussion instruments, however, the pitch data means which kind of drum. For most instruments, a typical message might be a “Note On” message consisting of what channel, what pitch and what volume to play the note at For percussion instruments the pitch data indicates which kind of drum. Li & Drew
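As a concrete sketch, a Note On message on the wire is three bytes: a status byte whose high nibble means Note On (0x9) and whose low nibble is the channel, followed by the pitch and the velocity. The helper below just packs those bytes (standard MIDI layout, not something spelled out on the slide):

```python
def note_on(channel, pitch, velocity):
    """Build the 3-byte MIDI Note On message: status (0x9n), pitch, velocity."""
    assert 0 <= channel <= 15 and 0 <= pitch <= 127 and 0 <= velocity <= 127
    status = 0x90 | channel          # the channel occupies the low 4 bits of the status byte
    return bytes([status, pitch, velocity])

msg = note_on(channel=0, pitch=60, velocity=100)   # middle C on channel 0
print(msg.hex())                                    # '903c64'
```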

(c) The way a synthetic musical instrument responds to a MIDI message is usually by only playing messages that specify its channel. – If several messages are for its channel (say play multiple notes on the piano), then the instrument responds to all of them, provided it is multi-voice, i.e., can play more than a single note at once. The way that an instrument responds to a midi message is it just plays messages that specify its channel. What about if there are several messages for its channel? What do you think happens? If several messages specify the same channel (say multiple notes on the piano) then the instrument responds to all of them provided it is multi-voice or able to play more than a single note at once Li & Drew

• System messages: several other types of messages exist, e.g., a general message for all instruments indicating a change in tuning or timing. Besides the channel messages there are also system messages, such as a change in tuning or timing that applies to all instruments. Li & Drew

• A MIDI device can often also change the envelope describing how the amplitude of a sound changes over time. • A model of the response of a digital instrument to a Note On message: stages of amplitude versus time for a music note. A MIDI device can often also change the envelope of a note, describing how the amplitude should change over time. Typically a note will have an attack, followed by decay, sustain, and then release. The timing of each of these stages can be changed by MIDI messages. Li & Drew
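A tiny sketch of such an envelope; the stage durations and sustain level below are arbitrary illustrative numbers, not values from the slide:

```python
import numpy as np

def adsr(attack, decay, sustain_level, sustain, release, fs=44_100):
    """Piecewise-linear attack/decay/sustain/release amplitude envelope (durations in seconds)."""
    a = np.linspace(0.0, 1.0, int(attack * fs))
    d = np.linspace(1.0, sustain_level, int(decay * fs))
    s = np.full(int(sustain * fs), sustain_level)
    r = np.linspace(sustain_level, 0.0, int(release * fs))
    return np.concatenate([a, d, s, r])

env = adsr(attack=0.02, decay=0.1, sustain_level=0.7, sustain=0.5, release=0.3)
# multiply env into a synthesized note of the same length to shape its loudness over time
```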

6.3 Quantization and Transmission of Audio • Coding of Audio: a) Quantization – uniform/non-uniform. b) Encoding differences in signals between the present and a past time can reduce the size of signal values into a much smaller range. OK, so we talked last class about how to go from analog to digital: we need to sample and quantize. Reminder? Quantization can be uniform or non-uniform. What does non-uniform usually do? Why is this useful? Using non-uniform quantization allows us to encode the signal better with fewer bits. What if we want to save even more bits? Instead of encoding the signal values themselves, which might vary a lot, we can encode differences in the signal between the present and a past time, which probably vary over a much smaller range. So imagine the signal is [1 3 4 5 3 …… 245 246 247 246 245 243]. Imagine encoding these numbers using bits. Well, if the range is, let's say, [0, 255], then we need 8 bits to encode each sample. Now imagine that consecutive samples differ by only a small amount, say every difference fits in a range of 16 values (for example -8 to +7): then how many bits would we need to encode the differences? 4 bits, since 2^4 can encode 16 different values. So that would let us encode each sample with just 4 bits. Savings! This is the basic idea that many compression algorithms are built around, whether for sound or images. Li & Drew
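A quick sketch of the bit-count argument from the narration above; the toy signal and helper are mine, just shaped to vary slowly like the example:

```python
import numpy as np

# a slowly varying 8-bit-range signal, like the [1 3 4 5 ... 247 246 245 243] example
signal = np.round(127 + 120 * np.sin(np.linspace(0, 2 * np.pi, 200))).astype(int)

def bits_needed(values):
    """Smallest number of bits able to index every value in the observed range."""
    span = int(values.max()) - int(values.min()) + 1
    return int(np.ceil(np.log2(span)))

diffs = np.diff(signal)           # consecutive differences
print(bits_needed(signal))        # 8 bits for the raw samples
print(bits_needed(diffs))         # far fewer bits for the differences
```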

• Every compression scheme has three stages: A. The input data is transformed to a new representation that is easier or more efficient to compress. B. We may introduce loss of information (e.g., due to quantization ⇒ we use a limited number of quantization levels, so noise is introduced). C. Coding: assign a codeword to each output level. Every compression scheme has three stages: first, the data is transformed into a new representation that is easier or more efficient to compress (e.g., differences instead of original values); second, we may introduce some loss of information (quantization is usually the main lossy step, because we use a discrete number of quantization levels and introduce quantization noise); finally, we assign a codeword to each output level. Li & Drew

Pulse Code Modulation (PCM) • The basic techniques for creating digital signals from analog signals are sampling and quantization. • Quantization consists of selecting breakpoints in magnitude, and then re-mapping any value within an interval to one of the representative output levels. PCM we've already seen: it is basically sampling and quantization, where we have some set of breakpoints in amplitude and re-map any value within an interval to one of the representative output levels. Here non-uniform quantization can be used to reduce bandwidth. Li & Drew
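A minimal sketch of such a quantizer in the breakpoint / output-level sense described above; the bit depth, signal range, and mid-interval reconstruction rule are illustrative choices, not specified on the slide:

```python
import numpy as np

def quantize_uniform(x, bits=4, lo=-1.0, hi=1.0):
    """Map each sample to the nearest of 2**bits evenly spaced output levels in [lo, hi]."""
    levels = 2 ** bits
    step = (hi - lo) / levels                        # width of each quantization interval
    idx = np.clip(np.floor((x - lo) / step), 0, levels - 1)
    return lo + (idx + 0.5) * step                   # representative level: interval midpoint

x = np.sin(2 * np.pi * 3 * np.linspace(0, 1, 20))
print(quantize_uniform(x, bits=3))                   # staircase-valued approximation of x
```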

2. Decoded signal is discontinuous. Here we show a signal in (a), with the sample points for PCM. Now, given these sample points, if we try to decode the original signal we would get a sort of staircase-looking signal (basically assuming the signal is constant between sample points). To get a better reconstructed signal we can apply something like low-pass filtering to remove the high-frequency information introduced by our sampling and quantization. Fig. 6.13: Pulse Code Modulation (PCM). (a) Original analog signal and its corresponding PCM signals. (b) Decoded staircase signal. (c) Reconstructed signal after low-pass filtering. Li & Drew

Differential Coding of Audio • Audio is often stored not in simple PCM but instead in a form that exploits differences — which are generally smaller numbers, so offer the possibility of using fewer bits to store. In addition to using quantization for reducing the bandwidth of our transmission, we might use a neat trick called differential coding of audio. Here, instead of transmitting the original values, we transmit the differences between one sample and the next. These differences are usually smaller numbers than the original sample values, so we can use fewer bits to transmit them. The way to do this is described on the next slides. Li & Drew

Lossless Predictive Coding • Predictive coding: simply means transmitting differences — predict the next sample as being equal to the current sample; send not the sample itself but the difference between the predicted and the actual sample. (a) Predictive coding consists of finding differences, and transmitting these using a PCM system. (b) Note that differences of integers will be integers. Denote the integer input signal as the set of values fn. Then we predict each value as simply the previous value, and define the error en as the difference between the actual and the predicted signal: fn_hat = fn-1, en = fn - fn_hat (6.12). One example of this is lossless predictive coding. Here we basically predict that the next sample will be equal to the current sample, and instead of sending the sample itself, we send the error in our prediction, en = fn - fn_hat. Li & Drew

(c) But it is often the case that some function of a few of the previous values, fn−1, fn−2, fn−3, etc., provides a better prediction. Typically, a linear predictor function is used: fn_hat = sum_k ( a_k * fn-k ), with the sum typically taken over the previous 2 to 4 samples (6.13). In general we would probably use a function of a few of the previous values like fn-1, fn-2, fn-3. Li & Drew

• Lossless predictive coding — As a simple example, suppose we devise a predictor for fn as follows: fn_hat = floor( (fn-1 + fn-2) / 2 ), en = fn - fn_hat (6.14). So our predictor might be something like: predict fn to be the (integer) average of the previous two values. Then what we transmit is the error between our predicted value and the true value. Li & Drew

• Let's consider an explicit example. Suppose we wish to code the sequence f1, f2, f3, f4, f5 = 21, 22, 27, 25, 22. For the purposes of the predictor, we'll invent an extra signal value f0, equal to f1 = 21, and first transmit this initial value, uncoded. Working through eq. (6.14) then gives predictions fn_hat = 21, 21, 24, 26 and errors en = 1, 6, 1, -4 for n = 2..5 (6.15). On board. Li & Drew

• The error does center around zero, we see, and coding (assigning bit-string codewords) will be efficient. How would you decode the transmitted signal? Li & Drew

• The error does center around zero, we see, and coding (assigning bit-string codewords) will be efficient. How would you decode the transmitted signal? You receive the transmitted errors en. Then, starting from the initial value that was sent uncoded, you form the same prediction fn_hat = floor( (fn-1 + fn-2) / 2 ) from the samples you have already decoded, and recover each sample as fn = fn_hat + en. Li & Drew
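A compact sketch of the whole loop for this example, using the average-of-previous-two predictor of eq. (6.14); the function names are mine:

```python
def predict(prev, prev2):
    """Predictor from eq. (6.14): integer average of the two previous samples."""
    return (prev + prev2) // 2

def encode(samples):
    """Transmit the first sample uncoded, then only the prediction errors."""
    f = [samples[0]] + list(samples)           # invent f0 equal to f1
    errors = [samples[0]]                      # initial value sent as-is
    for n in range(2, len(f)):
        errors.append(f[n] - predict(f[n - 1], f[n - 2]))
    return errors

def decode(stream):
    """Rebuild the samples by forming the same predictions and adding the errors."""
    f = [stream[0], stream[0]]                 # re-invent f0 = f1 at the decoder
    for e in stream[1:]:
        f.append(predict(f[-1], f[-2]) + e)
    return f[1:]                               # drop the invented f0

tx = encode([21, 22, 27, 25, 22])   # -> [21, 1, 6, 1, -4]
print(decode(tx))                    # -> [21, 22, 27, 25, 22]
```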