
Introduction to Multimedia: ICS 218 - Multimedia Systems and Applications, Lecture 2 - Audio/Image/Video Representation, Prof. Nalini Venkatasubramanian


1. ICS 218 - Multimedia Systems and Applications
Lecture 2 - Audio/Image/Video Representation
Prof. Nalini Venkatasubramanian (nalini@ics.uci.edu)

2. Introduction
- Basic Sound Concepts
- Computer Representation of Sound
- Basic Image Concepts
- Image Representation and Formats
- Video Signal Representation
- Color Encoding
- Computer Video Format

3. Basic Sound Concepts
- Acoustics: the study of sound - the generation, transmission and reception of sound waves.
- Sound is produced by the vibration of matter.
  - During vibration, pressure variations are created in the surrounding air molecules.
  - The pattern of oscillation creates a waveform; the wave is made up of pressure differences.
  - The waveform repeats the same shape at intervals called the period.
    - Periodic sound sources exhibit more periodicity and sound more musical: musical instruments, wind, etc.
    - Aperiodic sound sources are less periodic: unpitched percussion, a sneeze, a cough.

4. Basic Sound Concepts
- Sound transmission
  - Sound is transmitted by molecules bumping into each other; it is a continuous wave that travels through the air.
  - Sound is detected by measuring the pressure level at a point.
- Receiving
  - A microphone in a sound field moves according to the varying pressure exerted on it.
  - A transducer converts this energy into energy of another form (a voltage level, i.e. electrical energy).
- Sending
  - A speaker transforms electrical energy into sound waves.

5. Frequency of a Sound Wave
[Diagram: air pressure plotted against time, with the period and amplitude of the waveform labeled.]
- Frequency is the reciprocal of the period.

6. Basic Sound Concepts
- Wavelength: the distance the wave travels in one cycle.
  - At 20 Hz the wavelength is about 56 feet; at 20 kHz it is about 0.7 inches.
- Frequency: the number of periods per second (measured in hertz, cycles/second).
  - Frequency is the reciprocal of the period.
  - Human hearing range: 20 Hz - 20 kHz; voice is roughly 500 Hz to 2 kHz.
    - Infrasound: 0 - 20 Hz
    - Human range: 20 Hz - 20 kHz
    - Ultrasound: 20 kHz - 1 GHz
    - Hypersound: 1 GHz - 10 THz
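The wavelength figures above follow from wavelength = speed of sound / frequency. A minimal sketch, assuming a speed of sound in air of about 1125 ft/s (a value not stated on the slide):

```python
# Wavelength = speed of sound / frequency.
# Assumption: speed of sound in air at room temperature, ~1125 ft/s.
SPEED_OF_SOUND_FT_S = 1125.0

def wavelength_ft(frequency_hz):
    """Return the wavelength in feet for a sound wave of the given frequency."""
    return SPEED_OF_SOUND_FT_S / frequency_hz

print(wavelength_ft(20))           # ~56 ft at the low end of human hearing
print(wavelength_ft(20_000) * 12)  # ~0.7 in at the high end (feet -> inches)
```

The exact values (56.25 ft and 0.675 in) round to the "56 feet" and "0.7 inches" quoted on the slide.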

7. Basic Sound Concepts
- Amplitude: the measure of the displacement of the air pressure wave from its mean (quiescent) state.
- Subjectively heard as loudness; measured in decibels.
  - 0 dB - essentially no sound heard
  - 35 dB - quiet home
  - 70 dB - noisy street
  - 120 dB - discomfort

8. Computer Representation of Audio
- A transducer converts pressure to voltage levels.
- The analog signal is converted into a digital stream by discrete sampling.
  - Discretization occurs both in time and in amplitude (quantization).
- The computer measures the amplitude of the waveform at regular time intervals to produce a series of numbers (samples).

9. Computer Representation of Audio
- Sampling rate: the rate at which a continuous wave is sampled (measured in hertz).
  - CD standard: 44100 Hz; telephone quality: 8000 Hz.
  - There is a direct relationship between sampling rate, sound quality (fidelity) and storage space.
- Question: how often do you need to sample a signal to avoid losing information?
- Answer: it depends on how fast the signal is changing; in practice, twice per cycle of the highest significant frequency (this follows from the Nyquist sampling theorem). To choose a sampling rate, be aware of the difference between the playback rate and the capturing (sampling) rate.

10. Sampling
[Diagram: a waveform with sample heights measured at regular intervals.]

11. Nyquist Sampling Theorem
- If a signal f(t) is sampled at regular intervals of time at a rate higher than twice the highest significant signal frequency, then the samples contain all the information of the original signal.
- Example
  - The highest significant frequency for CD-quality audio is 22050 Hz.
  - By the Nyquist theorem we must sample at twice that frequency, so the sampling frequency is 44100 Hz.
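One way to see the theorem's consequence is to compute the apparent frequency of a sinusoid after sampling. The folding formula below is standard signal-processing arithmetic, not taken from the slide; it is a sketch of what goes wrong when the rate is too low:

```python
def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a sampled sinusoid (folding around fs/2).

    If signal_hz exceeds half the sample rate, the tone 'aliases' down
    to a lower frequency and the original information is lost.
    """
    f = signal_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)

# A 22050 Hz tone sampled at 44100 Hz sits exactly at the Nyquist limit:
print(alias_frequency(22050, 44100))  # 22050
# The same tone sampled at the 8000 Hz telephone rate aliases badly:
print(alias_frequency(22050, 8000))   # 1950
```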

12. Data Rate of a Channel
- Noiseless channel
  - Nyquist proved that if an arbitrary signal has been run through a low-pass filter of bandwidth H, the filtered signal can be completely reconstructed from only 2H (exact) samples per second. If the signal consists of V discrete levels, Nyquist's theorem states:
    max data rate = 2 H log_2 V bits/sec
  - A noiseless 3 kHz channel with a 1-bit quantization level cannot transmit a binary signal at a rate exceeding 6000 bits per second.
- Noisy channel
  - Thermal noise is measured by the ratio of the signal power S to the noise power N (the signal-to-noise ratio S/N).
    max data rate = H log_2 (1 + S/N) bits/sec
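Both capacity formulas above are easy to evaluate directly; a minimal sketch, reproducing the slide's 6000 bits/sec example and adding a hypothetical 30 dB (S/N = 1000) noisy-channel case for comparison:

```python
import math

def nyquist_capacity(bandwidth_hz, levels):
    """Max data rate of a noiseless channel (Nyquist): 2*H*log2(V) bits/s."""
    return 2 * bandwidth_hz * math.log2(levels)

def shannon_capacity(bandwidth_hz, snr):
    """Max data rate of a noisy channel (Shannon): H*log2(1 + S/N) bits/s."""
    return bandwidth_hz * math.log2(1 + snr)

# Noiseless 3 kHz channel carrying a binary (2-level) signal:
print(nyquist_capacity(3000, 2))     # 6000.0 bits/s, as on the slide
# The same 3 kHz channel with S/N = 1000 (a 30 dB ratio, chosen for illustration):
print(shannon_capacity(3000, 1000))  # ~29900 bits/s
```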

13. Quantization
- Sample precision: the resolution of a sample value.
- Quantization depends on the number of bits used to measure the height of the waveform.
- 16-bit CD-quality quantization yields 65536 (64K) distinct values.
- Audio formats are described by sample rate and quantization.
  - Voice quality: 8-bit quantization, 8000 Hz mono (8 KB/s)
  - 22 kHz 8-bit mono (22 KB/s) and stereo (44 KB/s)
  - CD quality: 16-bit quantization, 44100 Hz linear stereo (about 176 KB/s)
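The data rates listed above all come from one product: sample rate x bytes per sample x channels. A minimal sketch checking each format:

```python
def audio_data_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed audio data rate in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels

print(audio_data_rate(8000, 8, 1))    # 8000 B/s   - telephone voice quality
print(audio_data_rate(22000, 8, 2))   # 44000 B/s  - 22 kHz 8-bit stereo
print(audio_data_rate(44100, 16, 2))  # 176400 B/s - CD quality (~176 KB/s)
```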

14. Quantization and Sampling
[Diagram: a sampled waveform with sample heights quantized to discrete levels (0.25, 0.5, 0.75).]

15. Audio Formats
- Audio formats are characterized by four parameters:
  - Sample rate: the sampling frequency.
  - Encoding: the audio data representation.
    - µ-law encoding corresponds to CCITT G.711, the standard for voice data used by telephone companies in the USA, Canada and Japan.
    - A-law encoding is used for telephony elsewhere.
    - A-law and µ-law audio is sampled at 8000 samples/second with a precision of 12 bits, compressed to 8-bit samples.
    - Linear Pulse Code Modulation (PCM): uncompressed audio in which samples are proportional to the audio signal voltage.
  - Precision: the number of bits used to store an audio sample.
    - µ-law and A-law use 8-bit precision; PCM can be stored at various precisions, 16-bit being common.
  - Channel: multiple channels of audio may be interleaved at sample boundaries.
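The idea behind µ-law compression can be sketched with the continuous companding curve, using the standard parameter µ = 255. Note this is an illustration of the curve only: real G.711 codecs use a piecewise-linear segment approximation of it, not this formula directly.

```python
import math

MU = 255  # mu-law compression parameter used by G.711

def mu_law_compress(x):
    """Map a linear sample x in [-1, 1] to a compressed value in [-1, 1].

    Continuous mu-law curve: F(x) = sgn(x) * ln(1 + MU*|x|) / ln(1 + MU).
    Quiet samples get proportionally more of the output range than loud
    ones, which is how 8 compressed bits preserve the perceived precision
    of a wider linear sample.
    """
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

print(round(mu_law_compress(0.01), 3))  # 0.228 - small input, much larger output
print(mu_law_compress(1.0))             # 1.0   - full scale maps to full scale
```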

16. Audio Formats
- Available on UNIX
  - au (Sun file format), wav (Microsoft RIFF/waveform format), al (raw A-law), u (raw µ-law)...
- Available on Windows-based systems (RIFF formats)
  - wav, midi (file format for standard MIDI files), avi
- RIFF (Resource Interchange File Format)
  - A tagged file format (similar to TIFF) that allows multiple applications to read files in RIFF format.
- RealAudio, MP3 (MPEG Audio Layer 3)

17. Computer Representation of Voice
- The best-known technique for voice digitization is pulse code modulation (PCM).
  - It consists of the two-step process of sampling and quantization, based on the sampling theorem.
    - If voice data are limited to 4000 Hz, then PCM takes 8000 samples per second, which is sufficient to capture the input voice signal.
  - PCM produces analog samples which must be converted to a digital representation.
    - Each analog sample is assigned a binary code; each sample is approximated by being quantized.

18. Computer Representation of Music
- MIDI (Music Instrument Digital Interface)
  - A standard that manufacturers of musical instruments use so that instruments can exchange musical information via computers.
  - The MIDI interface consists of:
    - Hardware: the physical connection between instruments; specifies a MIDI port (which plugs into the computer's serial port) and a MIDI cable.
    - Data format: includes an instrument specification, the notion of the beginning and end of a note, frequency and sound volume. Data are grouped into MIDI messages, each specifying a musical event.
  - An instrument that satisfies both parts is a MIDI device (e.g. a synthesizer).
  - MIDI software applications include music recording and performance applications, musical notation and printing applications, music education, etc.

19. Computer Representation of Speech
- The human ear is most sensitive in the range 600 Hz to 6000 Hz.
- Speech generation
  - Real-time signal generation allows transformation of text into speech without lengthy processing.
  - Limited vs. large vocabulary (depends on the application).
  - Output must be understandable and must sound natural.
- Speech analysis
  - Identification and verification: recognize speakers by their acoustic fingerprint.
  - Recognition and understanding: analyze what has been said.
  - How something was said: used, for example, in lie detectors.
- Speech transmission: coding, recognition and synthesis methods aim to achieve the minimal data rate for a given quality.

20. Basic Concepts (Digital Image Representation)
- An image is a spatial representation of an object, a 2D or 3D scene, etc.
- Abstractly, an image is a continuous function defined over a rectangular region of a plane.
  - Intensity image: values are proportional to the radiant energy received by a sensor/detector.
  - Range image: values are the line-of-sight distance from the sensor position.
- An image can be thought of as a function whose values give the light intensity at each point of a planar region.

21. Digital Image Representation
- For computer representation, the function (e.g. intensity) must be sampled at discrete intervals.
  - Sampling quantizes the intensity values into discrete levels.
  - The points at which an image is sampled are called picture elements, or pixels.
  - Resolution specifies the distance between sample points, and hence the accuracy.
- A digital image is represented by a matrix of numeric values, each representing a quantized intensity value.
  - I(r,c) is the intensity value at the position corresponding to row r and column c of the matrix.
  - The intensity value can be represented by 1 bit for black-and-white images (binary-valued images), 8 bits for monochrome imagery encoding grayscale levels, or 24 bits for color (RGB).
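The matrix view of a digital image can be sketched with plain nested lists; the ramp image below is a made-up example, not from the slides:

```python
# A digital image as a matrix of quantized intensity values I(r, c).
# 8-bit grayscale: each value is an integer in 0..255.
ROWS, COLS = 4, 6

# A tiny test image: a horizontal intensity ramp from black to white.
image = [[round(c * 255 / (COLS - 1)) for c in range(COLS)]
         for r in range(ROWS)]

def intensity(image, r, c):
    """I(r, c): the quantized intensity at row r, column c."""
    return image[r][c]

print(intensity(image, 0, 0))         # 0   (black, left edge)
print(intensity(image, 0, COLS - 1))  # 255 (white, right edge)
```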

22. Image Formats
- Captured image format
  - The format obtained from an image frame grabber.
  - Important parameters:
    - Spatial resolution (pixels x pixels)
    - Color encoding (quantization level per pixel: 8-bit, 24-bit)
  - E.g. the SunVideo video digitizer board captures pictures of 320 x 240 pixels with 8-bit grayscale or color resolution; Parallax XVideo supports a resolution of 640 x 480 pixels with a 24-bit frame buffer.

23. Image Formats
- Stored image format: the format in which images are stored.
- Images are stored as a 2D array of values, where each value represents the data associated with one pixel in the image.
  - Bitmap: each value is a binary digit.
  - For a color image, the value may be a collection of three values representing the intensities of the RGB components at that pixel, three indices into a table of RGB intensities, an index into some color data structure, etc.
- Image file formats include GIF (Graphics Interchange Format), X11 bitmap, PostScript, JPEG and TIFF.

24. Image Formats
- Graphics format
  - Specifies images through graphics primitives and attributes.
    - Graphics primitives: lines, rectangles, circles, ellipses, specifications of 2D and 3D objects.
    - Graphics attributes: line style, line width, color.
  - Graphics formats represent a higher level of image representation, i.e. the image is not initially represented by a pixel matrix.
    - Advantage: less storage space per graphical image.
    - Disadvantage: more overhead at display time, since the graphical image must be converted to an image format (a bitmap or pixmap).
  - E.g. PHIGS (Programmer's Hierarchical Interactive Graphics System), GKS (Graphical Kernel System).

25. Basic Concepts (Video Representation)
- The human eye views video; inherent properties of the eye determine essential conditions for video systems.
- Video signal representation consists of three aspects:
  - Visual representation: the objective is to offer the viewer a sense of presence in the scene and of participation in the events portrayed.
  - Transmission: video signals are transmitted to the receiver through a single television channel.
  - Digitalization: analog-to-digital conversion, sampling of gray (color) levels, quantization.

26. Visual Representation
- The televised image should convey the spatial and temporal content of the scene.
  - Vertical detail and viewing distance
    - Aspect ratio: the ratio of picture width to height (4/3 = 1.33 is the conventional aspect ratio).
    - The viewing angle depends on the ratio of viewing distance to picture height.
  - Horizontal detail and picture width
    - Picture width (conventional TV service) = 4/3 x picture height.
  - Total detail content of the image
    - The number of pixels presented separately in the picture height is the vertical resolution.
    - The number of pixels in the picture width is the horizontal resolution x aspect ratio.
    - Their product equals the total number of picture elements in the image.

27. Visual Representation
- Perception of depth
  - In natural vision, depth is determined by the angular separation of the images received by the viewer's two eyes.
  - In the flat image of TV, the focal length of the lenses and changes in the depth of focus in a camera influence depth perception.
- Luminance and chrominance
  - Color vision is achieved through three signals, proportional to the relative intensities of red, green and blue.
  - Color encoding during transmission uses one luminance and two chrominance signals.
- Temporal aspects of resolution
  - Motion is rendered as a rapid succession of slightly different frames.
  - For visual realism, the repetition rate must be high enough (a) to guarantee smooth motion and (b) so that persistence of vision extends over the interval between flashes (the light cutoff between frames).

28. Visual Representation
- Continuity of motion
  - Motion appears continuous at a minimum of about 15 frames per second; it is good at 30 frames/sec; some technologies allow 60 frames/sec.
  - The NTSC standard provides 30 frames/sec (a 29.97 Hz repetition rate).
  - The PAL standard provides 25 frames/sec (a 25 Hz repetition rate).
- Flicker effect
  - Flicker is a periodic fluctuation of perceived brightness. To avoid it, about 50 refresh cycles/sec are needed; display devices use a display refresh buffer for this.
- The temporal aspect of video bandwidth depends on the rate at which the system scans pixels and on the human eye's scanning capabilities.

29. Transmission (NTSC)
- Video bandwidth is computed as follows:
  - 700/2 pixels per line x 525 lines per picture x 30 pictures per second.
  - The visible number of lines is 480.
- The interval between frames is 1000 ms / 30 fps = 33.3 ms.
- The display time per line is 33.3 ms / 525 lines = 63.5 microseconds.
- The transmitted signal is a composite signal:
  - It consists of 4.2 MHz for the basic signal and 5 MHz for the color, intensity and synchronization information.
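The timing arithmetic above can be sketched directly from the slide's two constants:

```python
# NTSC timing arithmetic: 525 lines per frame at 30 frames per second.
LINES_PER_FRAME = 525
FRAMES_PER_SECOND = 30

frame_interval_ms = 1000 / FRAMES_PER_SECOND          # time between frames
line_time_us = frame_interval_ms * 1000 / LINES_PER_FRAME  # time per scan line

print(round(frame_interval_ms, 1))  # 33.3 ms between frames
print(round(line_time_us, 1))       # 63.5 microseconds per line
```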

30. Color Encoding
- A camera creates three signals: RGB (red, green and blue).
- For transmission of the visual signal, three signals are used: one luminance (brightness, the basic signal) and two chrominance (color) signals.
  - In NTSC, luminance and chrominance are interleaved.
  - The goal at the receiver is to separate the luminance from the chrominance components and avoid interference between them prior to recovery of the primary color signals for display.

31. Color Encoding
- RGB signal (separate signal coding)
  - Consists of three separate signals for the red, green and blue colors. Other colors are coded as a combination of the primary colors; (R + G + B) = 1 gives neutral white.
- YUV signal
  - A separate brightness (luminance) component Y and color information (two chrominance signals U and V):
    Y = 0.30R + 0.59G + 0.11B
    U = (B - Y) * 0.493
    V = (R - Y) * 0.877
  - The resolution of the luminance component is more important than that of U and V.
  - The coding ratio of Y, U, V is 4:2:2.
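The YUV equations above translate directly into code. A minimal sketch, using the slide's coefficients, which also shows why neutral white carries no chrominance:

```python
def rgb_to_yuv(r, g, b):
    """Convert RGB (each component in 0..1) to YUV using the slide's formulas."""
    y = 0.30 * r + 0.59 * g + 0.11 * b  # luminance
    u = (b - y) * 0.493                 # blue-difference chrominance
    v = (r - y) * 0.877                 # red-difference chrominance
    return y, u, v

# Pure white (R = G = B = 1): full luminance, zero color information.
print(rgb_to_yuv(1.0, 1.0, 1.0))  # (1.0, 0.0, 0.0)
```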

32. Color Encoding (cont.)
- YIQ signal
  - Similar to YUV; used by the NTSC format:
    Y = 0.30R + 0.59G + 0.11B
    I = 0.60R - 0.28G - 0.32B
    Q = 0.21R - 0.52G + 0.31B
- Composite signal
  - All information is composed into one signal.
  - To decode it, modulation methods are needed to eliminate interference between the luminance and chrominance components.

33. Digitalization
- Refers to sampling the gray/color level in the picture at an M x N array of points.
- Once sampled, the points are quantized into pixels:
  - Each sampled value is mapped to an integer.
  - The quantization level depends on the number of bits used to represent the resulting integer, e.g. 8 bits per pixel or 24 bits per pixel.
- To capture motion when digitizing video:
  - Digitize pictures over time.
  - Obtain a sequence of digital images per second to approximate analog motion video.

34. Computer Video Format
- Video digitizer: an A/D converter.
- Important parameters of a digitizer:
  - Digital image resolution, quantization and frame rate.
  - E.g. Parallax XVideo: the camera produces an NTSC signal and the video board digitizes it. The resulting video has 640 x 480 pixels spatial resolution and 24 bits per pixel, at 20 fps (a lower image resolution allows more fps).
- The output of digital video goes to raster displays with large video RAM memories; a color lookup table is used for the presentation of color.

35. Digital Transmission Bandwidth
- Bandwidth requirements for images
  - Raw image transmission: bandwidth = size of image = spatial resolution x pixel resolution.
  - Compressed image: depends on the compression scheme.
  - Symbolic image transmission: bandwidth = size of the instructions and primitives carrying the graphics variables.
- Bandwidth requirements for video
  - Uncompressed video = image size x frame rate.
  - Compressed video: depends on the compression scheme.
  - E.g. HDTV-quality video: 345.6 Mbps uncompressed; about 34 Mbps compressed with MPEG (with some loss of quality).
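The uncompressed-video rule above (image size x frame rate) can be sketched as a one-liner; the 640 x 480, 24 bpp, 20 fps figures are the Parallax XVideo parameters from the previous slide, used here as an example:

```python
def video_bandwidth_mbps(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bandwidth (image size x frame rate) in Mbits/s."""
    return width * height * bits_per_pixel * frames_per_second / 1e6

# Digitized video as on the previous slide: 640x480, 24 bpp, 20 fps.
print(video_bandwidth_mbps(640, 480, 24, 20))  # 147.456 Mbits/s uncompressed
```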

