1
Fundamentals of Multimedia
2nd Edition (2014), Ze-Nian Li, Mark S. Drew, Jiangchuan Liu. Chapter 1: Introduction and Multimedia Data Representations
2
This chapter considers what multimedia is.
It also supplies an overview of multimedia software tools, such as video editors and digital audio programs.
3
1.1 The term "multimedia": applications that use multiple modalities, including text, images, drawings (graphics), animation, video, sound (including speech), and interactivity.
4
1.1 Multimedia and Computer Science
Graphics, HCI, visualization, computer vision, data compression, graph theory, networking, database systems --- all have important contributions to make in multimedia at the present time.
5
Components of Multimedia
6
Multimedia involves multiple modalities of text, audio, images, drawings, animation, and video.
Examples of how these modalities are put to use: 1. Video teleconferencing. 2. Distributed lectures for higher education. 3. Tele-medicine. 4. Co-operative work environments.
7
5. Searching in (very) large video and image databases for target visual objects. 6. "Augmented" reality: placing real-appearing computer graphics and video objects into scenes. 7. Including audio cues for where video-conference participants are located. 8. Building searchable features into new video, and enabling very high- to very low-bit-rate use of new, scalable multimedia products.
8
9. Making multimedia components editable. 10. Building "inverse-Hollywood" applications that can recreate the process by which a video was made. 11. Using voice-recognition to build an interactive environment, say a kitchen-wall web browser.
9
1.2 Multimedia and Hypermedia
10
History of Multimedia:
1. Newspaper: perhaps the first mass communication medium; uses text, graphics, and images. 2. Motion pictures: conceived of in the 1830s in order to observe motion too rapid for perception by the human eye. 3. Wireless radio transmission: Guglielmo Marconi, at Pontecchio, Italy, in 1895. 4. Television: the new medium for the 20th century; established video as a commonly available medium and has since changed the world of mass communications.
11
History of Multimedia:
5. The connection between computers and ideas about multimedia covers what is actually only a short period: Vannevar Bush wrote a landmark 1945 article describing what amounts to a hypermedia system, called Memex. Ted Nelson coined the term "hypertext" in 1965. The WWW's size was estimated at over 1 billion pages.
12
Hypermedia and Multimedia
A hypertext system: meant to be read nonlinearly, by following links that point to other parts of the document, or to other documents. Hypermedia: not constrained to be text-based; can include other media, e.g., graphics, images, and especially the continuous media: sound and video. The World Wide Web (WWW) is the best example of a hypermedia application. Multimedia means that computer information can be represented through audio, graphics, images, video, and animation in addition to traditional media.
13
SMIL (Synchronized Multimedia Integration Language)
SMIL, pronounced "smile": a particular application of XML (with a globally predefined DTD) that allows for the specification of interaction among any media types and user input, in a temporally scripted manner.
14
SMIL. Purpose of SMIL: it is desirable to be able to publish multimedia presentations using a markup language. A multimedia markup language needs to enable scheduling and synchronization of different multimedia elements, and to define their interactivity with the user. SMIL 2.0 is specified in XML, using a modularization approach similar to the one used in XHTML.
15
SMIL. Basic elements of SMIL are shown in the following example:
<!DOCTYPE smil PUBLIC "-//W3C//DTD SMIL 2.0//EN"
  "http://www.w3.org/2001/SMIL20/SMIL20.dtd">
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <meta name="Author" content="Some Professor" />
  </head>
  <body>
    <par id="MakingOfABook">
      <seq>
        <video src="authorview.mpg" />
        <img src="onagoodday.jpg" />
      </seq>
      <audio src="authorview.wav" />
      <text src="..." />
    </par>
  </body>
</smil>
16
1.3 Overview of Multimedia Software Tools
Software tools available for carrying out tasks in multimedia include: 1. Music Sequencing and Notation 2. Digital Audio 3. Graphics and Image Editing 4. Video Editing 5. Animation 6. Multimedia Authoring
17
1.Music Sequencing and Notation
Cakewalk: now called Pro Audio. The term sequencer comes from older devices that stored sequences of notes ("events" in MIDI, the Musical Instrument Digital Interface). It is also possible to insert WAV files and Windows MCI commands (for animation and video) into music tracks (MCI is a ubiquitous component of the Windows API). Cubase: another sequencing/editing program, with capabilities similar to those of Cakewalk. It includes some digital audio editing tools. Macromedia Soundedit: a mature program for creating audio for multimedia projects and the web that integrates well with other Macromedia products such as Flash and Director.
18
2. Digital Audio: tools that deal with accessing and editing the actual sampled sounds that make up audio. Adobe Audition (formerly Cool Edit): a powerful, popular digital audio toolkit that emulates a professional audio studio, including multitrack productions and sound file editing, along with digital signal processing effects. Sound Forge: like Audition, Sound Forge is a sophisticated PC-based program for editing WAV files. Pro Tools: a high-end integrated audio production and editing environment. It offers MIDI creation and manipulation, and powerful audio mixing, recording, and editing software.
19
3. Graphics and Image Editing
Adobe Illustrator: a powerful publishing tool from Adobe. Uses vector graphics; graphics can be exported to the Web. Adobe Photoshop: the standard graphics, image processing, and manipulation tool. Allows layers of images, graphics, and text that can be separately manipulated for maximum flexibility; its Filter Factory permits creation of sophisticated lighting-effects filters. Macromedia Fireworks: software for making graphics specifically for the Web. Macromedia Freehand: a text and web graphics editing tool that supports many bitmap formats, such as GIF, PNG, and JPEG.
20
4. Video Editing. Adobe Premiere: an intuitive, simple video editing tool for nonlinear editing, i.e., putting video clips into any order. Video and audio are arranged in "tracks"; Premiere provides a large number of video and audio tracks, superimpositions, and virtual clips. A large library of built-in transitions, filters, and motions for clips allows effective multimedia productions with little effort. Adobe After Effects: a powerful video editing tool that enables users to add and change existing movies. Can add many effects: lighting, shadows, motion blurring, layers.
21
4. Video Editing. Final Cut Pro: a video editing tool by Apple; Macintosh only. CyberLink PowerDirector: PowerDirector, produced by CyberLink Corp., is by far the most popular nonlinear video editing software. It provides a rich selection of audio and video features and special effects and is easy to use. It supports all modern video formats (AVCHD 2.0, 4K Ultra HD, and 3D video) and 64-bit video processing, though it is not as "programmable" as Premiere.
22
5. Animation Multimedia APIs:
Java3D: the API used by Java to construct and render 3D graphics, similar to the way in which the Java Media Framework is used for handling media files. 1. Provides a basic set of object primitives (cube, splines, etc.) for building scenes. 2. It is an abstraction layer built on top of OpenGL or DirectX (the user can select which). DirectX: the Windows API that supports video, images, audio, and 3-D animation. OpenGL: the highly portable, most popular 3-D API.
23
5. Animation Animation Software (Rendering Tools):
3D Studio Max: a rendering tool that includes a number of very high-end professional tools for character animation, game development, and visual effects production. Softimage XSI: a powerful modeling, animation, and rendering package used for animation and special effects in films and games. Maya: a competing product to Softimage; it too is a complete modeling package. RenderMan: a rendering package created by Pixar.
24
5. Animation GIF Animation Packages :
A simpler approach to animation that allows very quick development of effective small animations for the web. GIFs can contain several images, and looping through them creates a simple animation. Linux also provides some simple animation tools, such as animate.
25
6. Multimedia Authoring. Tools that provide the capability for creating a complete multimedia presentation, including interactive user control, are called authoring programs. Macromedia Flash: allows users to create interactive movies by using the score metaphor, i.e., a timeline arranged in parallel event sequences. Macromedia Director: uses a movie metaphor to create interactive presentations. It is very powerful and includes a built-in scripting language, Lingo, that allows creation of complex interactive movies.
26
6. Multimedia Authoring - Authorware: a mature, well-supported authoring product based on the Iconic/Flow-control metaphor. Quest: similar to Authorware in many ways, uses a type of flowcharting metaphor. However, the flowchart nodes can encapsulate information in a more abstract way (called frames) than simply subroutine levels.
27
End of Chapter 1 Introduction and Multimedia Data Representations
28
Fundamentals of Multimedia
2nd Edition (2014), Ze-Nian Li, Mark S. Drew, Jiangchuan Liu. Chapter 2: A Taste of Multimedia
29
This chapter introduces:
a set of tasks and concerns that are considered in studying multimedia, and issues in multimedia production and presentation.
30
2.1 Multimedia Tasks and Concerns
Multimedia content is ubiquitous in software all around us, including on our phones. We are interested in making interactive applications (or "presentations"), using video editors such as Adobe Premiere or CyberLink PowerDirector and still-image editors such as Adobe Photoshop in the first instance, and then combining the resulting resources into interactive programs by making use of "authoring" tools such as Flash and Director that can include sophisticated programming.
31
2.2 Multimedia Presentation
What effects to consider for multimedia presentation; guidelines for content design.
32
2.2 Multimedia Presentation
Graphics Styles. Careful thought has gone into combinations of color schemes and how lettering is perceived in a presentation. When constructing a presentation, human visual dynamics should be considered. Human visual dynamics: as soon as the eye moves (saccades), it re-adjusts its exposure, both chemically and geometrically, by adjusting the iris, which regulates the size of the pupil. Assignment: define in more detail what human visual dynamics is.
33
2.2 Multimedia Presentation
Color Principles and Guidelines (see figure in next slide). Some color schemes and art styles are best combined with a certain theme or style. A general hint is to not use too many colors, as this can be distracting. It helps to be consistent with the use of color; then color can be used to signal changes in theme.
35
2.2 Multimedia Presentation
Fonts. For effective visual communication, large fonts (18 to 36 points) are best, with no more than six to eight lines per screen. (See figure in previous slide: the upper part is good, while the bottom one is poor. Why do you think so?)
36
2.2 Multimedia Presentation
Color Contrast. The simplest approach to making readable colors on a screen is to use the principal complementary color as the background for text. For color values in the range 0–1 (or, effectively, 0–255), if the text color is some triple (Red, Green, Blue), or (R, G, B) for short, a legible color for the background is likely given by that color subtracted from the maximum: (R, G, B) ⇒ (1 − R, 1 − G, 1 − B) or (R, G, B) ⇒ (255 − R, 255 − G, 255 − B)
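A minimal Python sketch of this background rule (the function name and sample values are ours, for illustration):

    def complementary(rgb):
        """Complement each channel: (R, G, B) -> (255 - R, 255 - G, 255 - B)."""
        r, g, b = rgb
        return (255 - r, 255 - g, 255 - b)

    # Dark blue text (20, 20, 120) suggests a pale yellow background (235, 235, 135).
    print(complementary((20, 20, 120)))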
37
2.2 Multimedia Presentation
Color Contrast. Another way to make readable color on a screen is to use the "opposite" color. Also, if the text is bright, the background should be dark, and vice versa.
38
2.2 Multimedia Presentation
Sprite Animation. Sprites are often used in animation. This simple example of animation is described on page 28. Assignment: explain that example.
39
2.2 Multimedia Presentation
Video Transitions. Video transitions are syntactic means to signal "scene changes" and often carry semantic meaning. Many different types of transitions exist; the main types are cuts, wipes, dissolves, fade-ins, and fade-outs.
40
2.2 Multimedia Presentation
Types of Transitions: A cut carries out an abrupt change of image contents in two consecutive video frames from their respective clips. It is the simplest and most frequently used video transition.
41
2.2 Multimedia Presentation
Types of Transitions: A wipe is a replacement of the pixels in a region of the viewport with those from another video.
42
2.2 Multimedia Presentation
Types of Transitions: A dissolve replaces every pixel with a mixture over time of the two videos, gradually changing the first to the second.
43
2.2 Multimedia Presentation
Types of Transitions: Fade-in. Fade-out. Assignment: explain both.
44
2.3 Data Compression. One of the most evident and important challenges of using multimedia is the necessity to compress data. We need excellent and fast data compression in order to avoid the high data rates that cause problems for storage and networks. (See Table 2.1 for uncompressed video sizes.) The more an image is compressed, the worse its quality (Q) becomes. Figure 2.9a (next slide) shows an original, uncompressed image taken by a digital camera that allows full-accuracy images to be captured, with no data compression at all. Figures 2.9b, c show that while Q = 75 and 25 are not terrible, insisting on going down to a quality factor of Q = 5 ends up with an unusable image (Fig. 2.9d).
46
2.3 Data Compression. What is the best compression ratio for JPEG images and for MPEG video while retaining reasonable quality? Assignment: how expensive is image and video processing in terms of CPU load?
47
2.4 Multimedia Production
Multimedia production can easily involve a host of people with specialized skills: an art director, graphic designer, production artist, producer, project manager, writer, user interface designer, sound designer, videographer, and 3D and 2D animators, as well as programmers.
48
2.4 Multimedia Production
During the production timeline: The programmer is involved when the project is about 40% complete. The design phase consists of storyboarding, flowcharting, prototyping, and user testing, as well as a parallel production of media. Programming and debugging phases would be carried out in consultation with marketing, and the distribution phase would follow. Assignment: describe what can be done in each part of the design phase.
49
2.5 Multimedia Sharing and Distribution
Multimedia content, once produced, needs to be published and then shared among users, via optical disks, USB drives, or the Internet. Consider YouTube, the most popular video-sharing site on the Internet, as an example. A user can easily create a Google account and channel (as YouTube is now owned by Google), and then upload a video, which will be shared with everyone or with selected users. YouTube further enables titles and tags that are used to classify videos and link similar videos together (shown as a list of related videos). The link to such a video can be fed into other social networking sites, such as Facebook or Twitter, potentially propagating to many interested users in a short time.
50
2.5 Multimedia Sharing and Distribution
The Internet is reshaping traditional TV broadcasting as well. In the UK, the BBC's iPlayer has been successfully broadcasting high-quality TV programs to both TV subscribers and public Internet users with Adobe Flash Player since 2007. In the US, CNBC, Bloomberg Television, and Showtime use live-streaming services. China, the largest Internet Protocol TV (IPTV) market by subscribers (12.6 million) to date, is probably the most vigorous market.
51
2.5 Multimedia Sharing and Distribution
Users' viewing habits are also changing: IPTV services are becoming highly personalized, integrated, portable, and on-demand. Most service providers are moving beyond basic video offerings toward richer user experiences, particularly with the support for multi-screen viewing across TVs, PCs, tablets, and smartphones.
52
2.6 Some Useful Editing and Authoring Tools
Since the first step in creating a multimedia application is probably the creation of interesting video clips, we start off by looking at a video editing tool. Premiere: a video editing program that allows you to quickly create a simple digital video by assembling and merging multimedia components. Director: a complete environment for creating interactive "movies" and animation. Traditional animation is created by showing slightly different images over time. Flash: a simple authoring tool that facilitates the creation of interactive movies.
53
End of Chapter 2
54
2.2 Multimedia Presentation
Eye dynamic range: The retina has a static contrast ratio of around 100:1 (about 6.5 f-stops). The contrast ratio is a property of a display system, defined as the ratio of the luminance of the brightest color (white) to that of the darkest color (black) that the system is capable of producing. As soon as the eye moves (saccades), it re-adjusts its exposure, both chemically and geometrically, by adjusting the iris, which regulates the size of the pupil. The f-number (f-stop, focal stop, or aperture stop) is the aperture setting that limits the brightness of the image by restricting the input pupil size.
55
2.2 Multimedia Presentation
f-stop: In optics, the f-number (sometimes called focal ratio, f-ratio, f-stop, or relative aperture) of an optical system is the ratio of the lens's focal length to the diameter of the entrance pupil. The aperture stop is the aperture setting that limits the brightness of the image by restricting the input pupil size. The eye includes a lens similar to lenses found in optical instruments such as cameras, and the same principles can be applied: the pupil of the human eye is its aperture, and the iris is the diaphragm that serves as the aperture stop. The f-number N is given by N = f/D, where f is the focal length and D is the diameter of the entrance pupil. For example, if a lens's focal length is 10 mm and its entrance pupil diameter is 5 mm, the f-number is 2 and the aperture diameter is f/2.
56
Fundamentals of Multimedia
2nd Edition (2014), Ze-Nian Li, Mark S. Drew, Jiangchuan Liu. Chapter 3: Graphics and Image Data Representations
57
This chapter introduces:
how best to represent graphics and image data, since this is of crucial importance in the study of multimedia. The specifics of file formats for storing such images are also discussed.
58
3.1 Graphics/Image Data Types
Table 3.1 shows a list of file formats used in the popular product Adobe Premiere. We concentrate on the GIF and JPG image file formats, since the GIF file format is one of the simplest and contains several fundamental features, and the JPG file format is arguably the most important overall.
59
3.1.1 1-Bit Images. Images consist of pixels (picture elements in digital images). A 1-bit image (also called a binary image) consists of on and off bits only and thus is the simplest type of image. Each pixel is stored as a single bit (0 or 1). It is also sometimes called a 1-bit monochrome image, since it contains no color; see the 1-bit Lena image (a standard test image) in the next slide. Monochrome 1-bit images can be satisfactory for pictures containing only simple graphics and text. Fax machines use 1-bit data, so in fact 1-bit images are still important.
60
Monochrome 1-bit Lena image
A 640×480 monochrome image requires 38.4 kB of storage
61
3.1.2 8-Bit Gray-Level Images
An 8-bit image is one for which each pixel has a gray value between 0 and 255; each pixel is represented by a single byte. The entire image can be thought of as a two-dimensional array of pixel values, referred to as a bitmap. Image resolution refers to the number of pixels in a digital image (higher resolution generally yields better quality but increases size).
62
Grayscale image of Lena
A 640×480 grayscale image requires 300 kB of storage
63
3.1.4 24-Bit Color Images. In a color 24-bit image, each pixel is represented by three bytes, usually representing RGB. Since each value is in the range 0–255, this format supports 256×256×256, or a total of 16,777,216, possible combined colors, which increases storage size: a 640 × 480 24-bit color image would require 921.6 kB of storage (without any compression applied). Compression is used to decrease the image size, e.g., by grouping pixels effectively (Chapter 7).
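The storage figures quoted in these slides follow directly from width × height × bits per pixel. A quick Python sketch (note the deck mixes conventions: the 38.4 kB and 921.6 kB figures use 1,000-byte kilobytes, while the 300 kB figure uses 1,024-byte kilobytes):

    def image_bytes(width, height, bits_per_pixel):
        """Uncompressed image size in bytes."""
        return width * height * bits_per_pixel // 8

    print(image_bytes(640, 480, 1) / 1000)   # 1-bit:  38.4 kB
    print(image_bytes(640, 480, 8) / 1024)   # 8-bit:  300.0 kB
    print(image_bytes(640, 480, 24) / 1000)  # 24-bit: 921.6 kB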
64
24-bit color image forestfire.bmp
Microsoft Windows BMP format
65
3.1.5 Higher Bit-Depth Images
In some fields, such as medicine (and security cameras, satellite imaging), more accurate images are required, e.g., to see a patient's liver. To get such images, special cameras that view more than just the 3 RGB colors are used. Such images are called multispectral (more than three colors) or hyperspectral (e.g., 224 bands for satellite imaging).
66
3.1.6 8-Bit Color Images. Reasonably accurate color images can be obtained by quantizing the color information to collapse it. Color quantization example: reducing the number of colors required to represent a digital image makes it possible to reduce its file size. 8-bit color image files (so-called 256 colors; why?) use the concept of a lookup table (LUT) to store color information. For example, if exactly 23 pixels have RGB values (45, 200, 91), then store the value 23 in a three-dimensional array, at the element indexed by (45, 200, 91). This data structure is called a color histogram; it is a very useful tool for image transformation and manipulation in image processing.
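A minimal sketch of the color-histogram idea (a dictionary stands in for the three-dimensional array; the names and sample data are ours):

    from collections import Counter

    def color_histogram(pixels):
        """Count how many pixels have each exact (R, G, B) value."""
        return Counter(pixels)

    pixels = [(45, 200, 91)] * 23 + [(0, 0, 0)] * 5
    hist = color_histogram(pixels)
    print(hist[(45, 200, 91)])  # 23, matching the example above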
67
Notice that the difference between Fig. 3.5a, the 24-bit image, and Fig. 3.7, the 8-bit image, is reasonably small.
Fig. 3.5a: the 24-bit image. Fig. 3.7: the 8-bit image.
68
Another example showing that the difference between Fig. 3.5a, the 24-bit image, and Fig. 3.7, the 8-bit image, is reasonably small.
Fig. 3.5a: the 24-bit image. Fig. 3.7: the 8-bit image.
69
8-Bit Color Images. Note the great savings in space for 8-bit images over 24-bit ones: a 640 × 480 8-bit color image requires only 300 kB of storage, compared to 921.6 kB for a 24-bit color image (again, without any compression applied).
70
3.1.7 Color Lookup Tables The LUT is often called a palette.
The idea is to store only the index, or code value, for each pixel. if a pixel stores, say, the value 25 (Figure 3.8), the meaning is to go to row 25 in a color lookup table (LUT).
71
A Color-picker consists of an array of fairly large blocks of color (or a semi-continuous range of colors) such that a mouse-click will select the color indicated. - In reality, a color-picker displays the palette colors associated with index values from 0 to 255. - Fig. 3.9 (next slide) displays the concept of a color-picker: if the user selects the color block with index value 2, then the color meant is cyan, with RGB values (0, 255, 255).
72
Fig. 3.9: Color-picker for 8-bit color: each block of the color-picker corresponds to one row of the color LUT
73
3.2 Popular File Formats. 8-bit GIF: one of the most important formats because of its historical connection to the WWW and the HTML markup language, as the first image type recognized by net browsers. JPEG: currently the most important common file format.
74
GIF standard (Graphics Interchange Format): We examine the GIF standard because it is so simple, yet it contains many common elements. Limited to 8-bit (256) color images only; while this produces acceptable color images, it is best suited for images with few distinctive colors (e.g., graphics or drawings). The GIF standard supports interlacing: successive display of pixels in widely spaced rows by a 4-pass display process (Figure 3.16). Interlacing allows a quick sketch to appear when a web browser displays the image, followed by more detailed fill-ins. The JPEG standard (below) has a similar display mode, denoted progressive mode. GIF comes in two versions: GIF87 (the standard) and GIF89, which supports simple animation.
75
GIF
76
GIF87: For the standard specification, the general file format of a GIF87 file is as in Fig. 3.12. The Signature is six bytes ("GIF87a"); the Screen Descriptor is seven bytes; each image may have its own Local Color Map (if one does not exist, a Global Color Map can be defined and used instead). A GIF87 file can contain more than one image definition, usually to fit on several different parts of the screen. The actual raster data itself is first compressed using the LZW compression scheme (see Chap. 7). Fig. 3.12: GIF file format.
77
Fig. 3.13: GIF screen descriptor.
The Screen Descriptor comprises a set of attributes that belong to every image in the file. According to the GIF87 standard, it is defined as in Fig. 3.13. LSB/MSB: Least/Most Significant Byte. Bit 7 is filled with zeros.
78
The Color Map is set up in a very simple fashion, as in Fig. 3.14. However, the actual length of the table equals 2^(pixel+1), as given in the Screen Descriptor. Fig. 3.14: GIF color map.
79
Fig. 3.15: GIF image descriptor.
Each image in the file has its own Image Descriptor, defined as in Fig. 3.15.
80
Interlace: If the interlace bit in the local Image Descriptor is set to 1, the rows of the image are displayed in a four-pass sequence, as in Fig. 3.16 (next slide). Here, the first pass displays rows 0 and 8, the second pass displays rows 4 and 12, and so on.
81
Fig. 3.16: GIF 4-pass interlace display row order.
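The four passes can be generated directly from their start rows and strides (0/8, 4/8, 2/4, 1/2, per the GIF87 specification); a minimal sketch:

    def gif_interlace_order(height):
        """Row display order for GIF's 4-pass interlaced mode."""
        passes = ((0, 8), (4, 8), (2, 4), (1, 2))  # (start row, step) per pass
        return [row for start, step in passes for row in range(start, height, step)]

    # For a 16-row image: pass 1 gives rows 0, 8; pass 2 gives 4, 12; and so on.
    print(gif_interlace_order(16))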
82
JPEG (Joint Photographic Experts Group): the most important current standard for image compression (.jpg, .jpeg, .jpe). The human vision system has some specific limitations (the eye-brain system cannot see extremely fine detail, so it is dropped), and JPEG takes advantage of these to achieve high rates of compression. JPEG allows the user to set a desired level of quality, or compression ratio (input divided by output). As an example, Fig. 3.17 shows our forestfire image with a quality factor Q = 10%; this image is a mere 1.5% of the original size. In comparison, a JPEG image with Q = 75% yields an image size 5.6% of the original, whereas a GIF version of this image compresses down to 23.0% of the uncompressed image size.
83
A photo of a flower compressed with successively more lossy compression ratios from left to right.
84
Fig. 3.17: JPEG image with low quality specified by user.
85
PNG format: standing for Portable Network Graphics; meant to supersede the GIF standard, and extends it in important ways. Special features of PNG files include: 1. Support for up to 48 bits of color information, a large increase. 2. Files may contain gamma-correction information for correct display of color images, as well as alpha-channel information for such uses as control of transparency. 3. Progressive display: pixels are displayed a few at a time, in a two-dimensional fashion, over seven passes through each 8 × 8 block of the image.
86
TIFF: stands for Tagged Image File Format.
The support for attachment of additional information (referred to as "tags") provides a great deal of flexibility. 1. The most important tag is a format signifier: what type of compression, etc. is in use in the stored image. 2. TIFF can store many different types of image: 1-bit, grayscale, 8-bit color, 24-bit RGB, etc. 3. TIFF was originally a lossless format, but a newer JPEG tag now allows one to opt for JPEG compression. 4. The TIFF format was developed by the Aldus Corporation in the 1980s and was later supported by Microsoft.
87
PS and PDF. PostScript is an important language for typesetting, and many high-end printers have a PostScript interpreter built into them. PostScript is a vector-based, rather than pixel-based, picture language: page elements are essentially defined in terms of vectors. PostScript includes vector/structured graphics as well as text. Several popular graphics programs, such as Adobe Illustrator, use PostScript. Note, however, that the PostScript page description language does not provide compression; in fact, PostScript files are just stored as ASCII.
88
PS and PDF. Therefore, another text-plus-figures language, the Portable Document Format (PDF), has largely superseded PostScript. PDF files that do not include images have about the same compression ratio as compressed PostScript; for files containing images, PDF may achieve higher compression ratios by using separate JPEG compression for the image content.
89
End of Chapter 3
90
Fundamentals of Multimedia
2nd Edition (2014), Ze-Nian Li, Mark S. Drew, Jiangchuan Liu. Chapter 4: Color in Image and Video
91
This chapter explores:
several issues in the use of color, since color is vitally important in multimedia programs. In this chapter we shall consider the following topics: color science; color models in images; color models in video.
92
Color in Image and Video
4.1 Color Science 4.2 Color Models in Images 4.3 Color Models in Video 4.4 Further Exploration
93
4.1 Color Science Light and Spectra
Light is an electromagnetic wave. Its color is characterized by the wavelength content of the light. (a) Laser light consists of a single wavelength: e.g., a ruby laser produces a bright, scarlet-red beam. (b) Most light sources produce contributions over many wavelengths. (c) However, humans cannot detect all light, just contributions that fall in the "visible wavelengths". (d) Short wavelengths produce a blue sensation, long wavelengths produce a red one.
94
4.1 Color Science. Spectrophotometer: a device used to measure visible light, by reflecting light from a diffraction grating (a ruled surface that spreads out the different wavelengths, much as a prism does). Figure 4.1 shows the phenomenon that white light contains all the colors of a rainbow. Visible light is an electromagnetic wave in the range 400 nm to 700 nm (where nm stands for nanometer, 10^-9 meters).
95
4.1 Color Science Fig. 4.1: Sir Isaac Newton's experiments.
96
4.1 Color Science. Fig. 4.2 (see book) shows the relative power in each wavelength interval for typical outdoor light on a sunny day. This type of curve is called a Spectral Power Distribution (SPD), or a spectrum. The symbol for wavelength is λ; this curve is called E(λ).
97
4.1 Color Science Human Vision
The eye works like a camera, with the lens focusing an image onto the retina (upside-down and left-right reversed). The retina consists of an array of rods and three kinds of cones. The rods come into play when light levels are low and produce an image in shades of gray ("all cats are gray at night!"). For higher light levels, the cones each produce a signal. Because of their differing pigments, the three kinds of cones are most sensitive to red (R), green (G), and blue (B) light. It seems likely that the brain makes use of the differences R-G, G-B, and B-R, as well as combining all of R, G, and B into a high-light-level achromatic channel.
98
4.1 Color Science. Spectral Sensitivity of the Eye
The eye is most sensitive to light in the middle of the visible spectrum. The sensitivity of our receptors is also a function of wavelength (Fig. 4.3; see book). The blue receptor sensitivity is not shown to scale, because it is much smaller than the curves for red or green; blue is a late addition in evolution. Fig. 4.3 shows the overall sensitivity as a dashed line; this important curve is called the luminous-efficiency function. It is usually denoted V(λ) and is formed as the sum of the response curves for red, green, and blue.
99
4.1 Color Science. Spectral Sensitivity of the Eye. The eye has about 6 million cones, but the proportions of R, G, and B cones are different. They likely are present in the ratios 40:20:1, so the achromatic channel produced by the cones is thus something like 2R + G + B/20.
100
4.1 Color Science. Image Formation: In most situations, we actually image light that is reflected from a surface. Surfaces reflect different amounts of light at different wavelengths, and dark surfaces reflect less energy than light surfaces. The reflected light is then filtered by the eye's cone functions. See Figure 4.5 (next slide).
101
4.1 Color Science Fig. 4.5: Image formation model.
102
4.1 Color Science Camera Systems
Camera systems are made in a similar fashion; a good camera has three signals produced at each pixel location (corresponding to a retinal position). Analog signals are converted to digital, truncated to integers, and stored. If the precision used is 8-bit, then the maximum value for any of R, G, B is 255, and the minimum is 0.
103
End of Chapter 4
104
Fundamentals of Multimedia
2nd Edition (2014), Ze-Nian Li, Mark S. Drew, Jiangchuan Liu. Chapter 5: Fundamental Concepts in Video
105
This chapter explores the principal notions needed to understand video. We shall consider the following aspects of video and how they impact multimedia applications: analog video; digital video; video display interfaces; 3D video.
106
Video. Since video is created from a variety of sources, we begin with the signals themselves. Analog video is represented as a continuous (time-varying) signal; digital video is represented as a sequence of digital images.
107
5.1 Analog Video. An analog signal f(t) samples a time-varying image. So-called progressive scanning traces through a complete picture (a frame) row-wise for each time interval. A high-resolution computer monitor typically uses a time interval of 1/72 s. In TV and in some monitors and multimedia standards, another system, interlaced scanning, is used. Here, the odd-numbered lines are traced first, then the even-numbered lines. This results in "odd" and "even" fields; two fields make up one frame.
108
5.1 Interlacing
109
5.1 Interlacing. In fact, the odd lines (starting from 1) end up at the middle of a line at the end of the odd field, and the even scan starts at a half-way point. Figure 5.1 (previous slide) shows the scheme used. First the solid (odd) lines are traced: P to Q, then R to S, and so on, ending at T. Then the even field starts at U and ends at V. The scan lines are not horizontal because a small voltage is applied, moving the electron beam down over time.
110
5.1 Interlacing. Interlacing was invented because, when standards were being defined, it was difficult to transmit the amount of information in a full frame quickly enough to avoid flicker; doubling the number of fields presented to the eye reduces perceived flicker. The jump from Q to R and so on in Fig. 5.1 is called the horizontal retrace, during which the electron beam in the CRT is blanked. The jump from T to U or V to P is called the vertical retrace.
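In digital terms, the two fields of a frame are simply its alternating rows. A minimal sketch (lines are numbered from 1 here, as in Fig. 5.1, so the odd field comes first):

    def split_fields(frame):
        """Split an interlaced frame (a list of scan lines) into its two fields."""
        odd_field = frame[0::2]   # lines 1, 3, 5, ... (counting from 1)
        even_field = frame[1::2]  # lines 2, 4, 6, ...
        return odd_field, even_field

    odd, even = split_fields(["line%d" % n for n in range(1, 9)])
    print(odd)   # ['line1', 'line3', 'line5', 'line7']
    print(even)  # ['line2', 'line4', 'line6', 'line8']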
111
5.1.1 NTSC Video. NTSC stands for the National Television System Committee (of the U.S.A.). The NTSC TV standard is mostly used in North America and Japan. It uses a familiar 4:3 aspect ratio (i.e., the ratio of picture width to height) and 525 (interlaced) scan lines per frame at 30 fps. Figure 5.4 shows the effect of "vertical retrace and sync" and "horizontal retrace and sync" on the NTSC video raster.
112
What is Raster Graphics?
A raster graphics image is a dot-matrix data structure representing a generally rectangular grid of pixels, or points of color, viewable via a monitor, paper, or other display medium. A raster is technically characterized by the width and height of the image in pixels and by the number of bits per pixel (the color depth, which determines the number of colors it can represent).
113
What is Raster Graphics?
The smiley face in the top left corner is a raster image. When enlarged, individual pixels appear as squares. Zooming in further, they can be analyzed, with their colors constructed by adding the values for red, green and blue.
114
5.1.1 NTSC Video Figure 5.4 shows the effect of “vertical retrace and sync” and “horizontal retrace and sync” on the NTSC video raster. Blanking information is placed into 20 lines reserved for control information at the beginning of each field. Hence, the number of active video lines per frame is only 485. Similarly, almost 1/6 of the raster at the left side is blanked for horizontal retrace and sync. The nonblanking pixels are called active pixels. Image data is not encoded in the blanking regions, but other information can be placed there, such as V-chip information, stereo audio channel data, and subtitles in many languages.
115
5.1.1 NTSC Video NTSC video is an analog signal with no fixed horizontal resolution. Therefore, we must decide how many times to sample the signal for display. Each sample corresponds to one pixel output. A pixel clock divides each horizontal line of video into samples. The higher the frequency of the pixel clock, the more samples per line. Different video formats provide different numbers of samples per line, as listed in Table 5.1.
116
5.1.1 NTSC Video. Table 5.1: Samples per line for various analog video formats

Format           Samples per line
VHS              240
S-VHS            400-425
Betamax          500
Standard 8 mm    300
Hi-8 mm          425
117
Sampling: a sample is the intersection of a channel and a pixel.
The diagram below depicts a 24-bit pixel, consisting of 3 samples, one each for the Red, Green, and Blue channels. In this particular diagram, the Red sample occupies 9 bits, the Green sample occupies 7 bits, and the Blue sample occupies 8 bits, totaling 24 bits per pixel. A sample is related to a subpixel on a physical display.
118
Vertical Retrace: Alternatively referred to as the vertical blanking interval or the vertical sync signal, vertical retrace describes the action performed within the computer monitor that turns the monitor beam off while moving it from the lower-right corner of the monitor to the upper-left. This action takes place each time the beam has completed tracing the entire screen to create an image.
119
5.1.2 PAL Video PAL (Phase Alternating Line) is a TV standard originally invented by German scientists. This important standard is widely used in Western Europe, China, India, and many other parts of the world. Because it has higher resolution than NTSC, the visual quality of its pictures is generally better.
120
Table 5.2: Comparison of Analog Broadcast TV Systems
TV System   Frame Rate (fps)   # of Scan Lines   Total Channel Width (MHz)   Y (MHz)   I or U (MHz)   Q or V (MHz)
NTSC        29.97              525               6.0                         4.2       1.6            0.6
PAL         25                 625               8.0                         5.5       1.8            1.8
SECAM       25                 625               8.0                         6.0       2.0            2.0
121
5.1.3 SECAM Video SECAM, which was invented by the French, is the third major broadcast TV standard. SECAM stands for Système Electronique Couleur Avec Mémoire. SECAM and PAL are similar, differing slightly in their color coding scheme.
122
5.2 Digital Video The advantages of digital representation for video:
Storing video on digital devices or in memory, ready to be processed (noise removal, cut and paste, and so on) and integrated into various multimedia applications. Direct access, which makes nonlinear video editing simple. Repeated recording without degradation of image quality. Ease of encryption and better tolerance to channel noise.
123
5.2.2 CCIR and ITU-R Standards for Digital Video
The CCIR is the Consultative Committee for International Radio. One of the most important standards it has produced is CCIR-601, for component digital video. This standard has since become standard ITU-R Rec. 601, an international standard for professional video applications. It is adopted by several digital video formats, including the popular DV video.
124
5.2.2 CCIR and ITU-R Standards for Digital Video
CIF stands for Common Intermediate Format, specified by the International Telegraph and Telephone Consultative Committee (CCITT), now superseded by the International Telecommunication Union, which oversees both telecommunications (ITU-T) and radio frequency matters (ITU-R) under one United Nations body. The idea of CIF, which is about the same as VHS quality, is to specify a format for a lower bitrate. CIF uses a progressive (noninterlaced) scan. QCIF stands for Quarter-CIF and is for an even lower bitrate.
125
5.2.2 CCIR and ITU-R Standards for Digital Video
CIF is a compromise between NTSC and PAL, in that it adopts the NTSC frame rate and half the number of active lines in PAL. When played on existing TV sets, NTSC TV will first need to convert the number of lines, whereas PAL TV will require frame rate conversion.
126
5.2.3 High-Definition TV. The introduction of wide-screen movies brought the discovery that viewers seated near the screen enjoyed a level of participation (a sensation of immersion) not experienced with conventional movies. Apparently the exposure to a greater field of view, especially the involvement of peripheral vision, contributes to the sense of "being there." The main thrust of High-Definition TV (HDTV) is not to increase the "definition" in each unit area, but rather to increase the visual field, especially its width. First-generation HDTV was based on an analog technology developed by Sony and NHK in Japan in the late 1970s.
127
5.2.3 High-Definition TV. MUltiple sub-Nyquist Sampling Encoding (MUSE) was an improved NHK HDTV with hybrid analog/digital technologies that was put in use in the 1990s. It has 1,125 scan lines, interlaced (60 fields per second), and a 16:9 aspect ratio (compare with NTSC's 4:3 aspect ratio). In 1987, the FCC decided that HDTV standards must be compatible with the existing NTSC standard and must be confined to the existing Very High Frequency (VHF) and Ultra High Frequency (UHF) bands.
128
5.2.4 Ultra High Definition TV (UHDTV)
UHDTV is a new development: a new generation of HDTV! The standards were announced in 2012, supporting 4K UHD (3840 × 2160) and 8K UHD (7680 × 4320) resolutions. The aspect ratio is 16:9. The supported frame rate has been gradually increased to 120 fps.
129
5.3 Video Display Interfaces
We now discuss the interfaces for video signal transmission from some output devices (e.g., set-top box, video player, video card, etc.) to a video display (e.g., TV, monitor, projector, etc.). There has been a wide range of video display interfaces, supporting video signals of different formats (analog or digital, interlaced or progressive), different frame rates, and different resolutions. We start our discussion with analog interfaces, including Component Video, Composite Video, and S-Video, and then digital interfaces, including DVI, HDMI, and DisplayPort.
130
5.3.1 Analog Display Interfaces
Analog video signals are often transmitted in one of three different interfaces: Component video, Composite video, and S-video. Figure 5.7 shows the typical connectors for them Fig. 5.7 Connectors for typical analog display interfaces. From left to right: Component video, Composite video, S-video, and VGA
131
5.3.1 Analog Display Interfaces
Component Video: Higher-end video systems, such as those for studios, make use of three separate video signals for the red, green, and blue image planes. This is referred to as component video. This kind of system has three wires (and connectors) connecting the camera or other devices to a TV or monitor.
132
5.3.1 Analog Display Interfaces
Composite Video: When connecting to TVs or VCRs, composite video uses only one wire (and hence one connector, such as a BNC connector at each end of a coaxial cable or an RCA plug at each end of an ordinary wire), and video color signals are mixed, not sent separately. The audio signal is carried separately from this single video signal.
133
5.3.1 Analog Display Interfaces
S-Video: As a compromise, S-video (separated video, or super-video, e.g., in S-VHS) uses two wires: one for luminance and another for a composite chrominance signal. The reason for placing luminance into its own part of the signal is that black-and-white information is most important for visual perception. As noted in the previous chapter, humans are able to differentiate spatial resolution in the grayscale ("black-and-white") part much better than in the color part of RGB images. Therefore, the color information transmitted can be much less accurate than the intensity information. We can see only fairly large blobs of color, so it makes sense to send less color detail.
134
5.3.1 Analog Display Interfaces
Video Graphics Array (VGA) The Video Graphics Array (VGA) is a video display interface that was first introduced by IBM in 1987, along with its PS/2 personal computers. It has since been widely used in the computer industry with many variations, which are collectively referred to as VGA. The initial VGA resolution was 640×480 pixels. The VGA video signals are based on analog component RGBHV (red, green, blue, horizontal sync, vertical sync).
135
5.3.2 Digital Display Interfaces
Given the rise of digital video processing and of monitors that directly accept digital video signals, there is a great demand for video display interfaces that transmit digital video signals. Such interfaces emerged in the 1980s (e.g., the Color Graphics Adapter (CGA)). Today, the most widely used digital video interfaces include the Digital Visual Interface (DVI), High-Definition Multimedia Interface (HDMI), and DisplayPort, as shown in Fig. 5.8. Fig. 5.8: Connectors of different digital display interfaces. From left to right: DVI, HDMI, DisplayPort.
136
5.3.2 Digital Display Interfaces
Digital Visual Interface (DVI) Digital Visual Interface (DVI) was developed by the Digital Display Working Group (DDWG) for transferring digital video signals, particularly from a computer’s video card to a monitor. It carries uncompressed digital video and can be configured to support multiple modes, including DVI-D (digital only), DVI-A (analog only), or DVI-I (digital and analog). The support for analog connections makes DVI backward compatible with VGA (though an adapter is needed between the two interfaces). The DVI allows a maximum 16:9 screen resolution of 1920×1080 pixels.
137
5.3.2 Digital Display Interfaces
High-Definition Multimedia Interface (HDMI): HDMI is a newer digital audio/video interface developed to be backward-compatible with DVI. HDMI, however, differs from DVI in the following aspects: 1. HDMI does not carry analog signals and hence is not compatible with VGA. 2. DVI is limited to the RGB color range (0–255), whereas HDMI also supports YCbCr encodings. 3. HDMI supports digital audio, in addition to digital video. HDMI allows a maximum screen resolution of 2560 × 1600 pixels.
138
5.3.2 Digital Display Interfaces
DisplayPort: DisplayPort is a digital display interface. It is the first display interface that uses packetized data transmission, like the Internet or Ethernet. DisplayPort can achieve a higher resolution with fewer pins than the previous technologies. The use of data packets also allows DisplayPort to be extensible, i.e., new features can be added over time without significant changes to the physical interface itself. DisplayPort can be used to transmit audio and video simultaneously, or either of them. Compared with HDMI, DisplayPort has slightly more bandwidth, which also accommodates multiple streams of audio and video to separate devices.
139
5.4 3D Video and TV. The rapid progress in the research and development of 3D technology and the success of the 2009 film Avatar have pushed 3D video to a peak. The main advantage of 3D video is that it enables the experience of immersion: be there, and really be there! Increasingly, it is in movie theaters, broadcast TV (e.g., sporting events), personal computers, and various handheld devices.
140
5.4.1 Cues for 3D Percept
The human vision system is capable of achieving a 3D percept by utilizing multiple cues. They are combined to produce optimal (or nearly optimal) depth estimates. When the multiple cues agree, this enhances the 3D percept. When they conflict with each other, the 3D percept can be hindered. Sometimes, illusions can arise.
141
Monocular Cues
The monocular cues, which do not necessarily involve both eyes, include:
Shading: depth perception by shading and highlights.
Perspective scaling: parallel lines converge with distance, meeting at infinity.
Relative size: distant objects appear smaller compared to known same-size objects that are not in the distance.
Texture gradient: the appearance of textures changes as they recede in distance.
Blur gradient: objects appear sharper at the distance where the eyes are focused, whereas nearer and farther objects are gradually blurred.
Haze: due to light scattering by the atmosphere, objects at a distance have lower contrast and lower color saturation.
Occlusion: a far object is occluded by nearer object(s).
Motion parallax: induced by object movement and head movement, such that nearer objects appear to move faster.
Among the above monocular cues, it has been said that occlusion and motion parallax are the more effective ones.
142
Binocular Cues. The human vision system utilizes effective binocular vision, i.e., stereo vision or stereopsis (from the Greek word "stereos", meaning firm or solid). Our left and right eyes are separated by a small distance, on average approximately 2.5 inches, or 65 mm, which is known as the interocular distance. As a result, the left and right eyes have slightly different views, i.e., images of objects are shifted horizontally. The amount of the shift, or disparity, is dependent on the object's distance from the eyes, i.e., its depth, thus providing the binocular cue for the 3D percept. The horizontal shift is also known as horizontal parallax. The fusion of the left and right images into single vision occurs in the brain, producing the 3D percept. Current 3D video and TV systems are almost all based on stereopsis, because it is believed to be the most effective cue.
143
5.4.2 3D Camera Models. Simple Stereo Camera Model
We can design a simple (artificial) stereo camera system in which the left and right cameras are identical (same lens, same focal length, etc.) and the cameras' optical axes are parallel, pointing in the Z-direction; Z measures the scene depth. Toed-in Stereo Camera Model: Human eyes can be emulated by so-called toed-in stereo cameras, in which the camera axes are usually converging and not parallel. One of the complications of this model is that objects at the same depth (i.e., the same Z) in the scene no longer yield the same disparity; in other words, the "disparity planes" are now curved. Objects on both sides of the view appear farther away than objects in the middle, even when they have the same depth Z.
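Under the simple parallel-axis model above, the horizontal disparity d of a point at depth Z is d = f b / Z, where f is the focal length and b the baseline between the two cameras, so disparity is inversely proportional to depth. A minimal sketch with illustrative numbers (the 65 mm baseline echoes the average interocular distance mentioned earlier; the other values are ours):

    def disparity(focal_len_px, baseline_mm, depth_mm):
        """Horizontal disparity d = f * b / Z under the parallel-axis stereo model."""
        return focal_len_px * baseline_mm / depth_mm

    # Focal length 1,000 pixels, 65 mm baseline, object 2 m away: 32.5 pixels.
    # Halving the depth doubles the disparity.
    print(disparity(1000, 65.0, 2000.0))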
144
5.4.3 3D Movie and TV Based on Stereo Vision
3D Movie Using Colored Glasses 3D Movies Using Circularly Polarized Glasses 3D TV with Shutter Glasses
145
End of Chapter 5
146
Fundamentals of Multimedia
2nd Edition (2014), Ze-Nian Li, Mark S. Drew, Jiangchuan Liu. Part II: Multimedia Data Compression. Chapter 7: Lossless Compression Algorithms
147
In this part we examine the role played in multimedia by data compression, perhaps the most important enabling technology that makes modern multimedia systems possible. So much data exists, in archives, via streaming, and elsewhere, that it has become critical to compress this information. We start off in Chap. 7 by looking at lossless data compression, i.e., involving no distortion of the original signal once it is decompressed or reconstituted.
148
7.1 Introduction Compression: the process of coding that will effectively reduce the total number of bits needed to represent certain information. Figure 7.1 depicts a general data compression scheme, in which compression is performed by an encoder and decompression is performed by a decoder. Fig. 7.1: A General Data Compression Scheme.
149
7.1 Introduction. If the compression and decompression processes induce no information loss, then the compression scheme is lossless; otherwise, it is lossy. Compression ratio = B0 / B1 (7.1), where B0 is the number of bits before compression and B1 is the number of bits after compression. In general, we would desire any codec (encoder/decoder scheme) to have a compression ratio much larger than 1.0. The higher the compression ratio, the better the lossless compression scheme, as long as it is computationally feasible.
150
7.2 Basics of Information Theory
The entropy η of an information source with alphabet S = {s1, s2, ..., sn} is

    η = H(S) = Σi=1..n pi log2(1/pi)    (7.2)
             = −Σi=1..n pi log2 pi      (7.3)

where pi is the probability that symbol si will occur in S, and log2(1/pi) indicates the amount of information (the self-information defined by Shannon) contained in si, which corresponds to the number of bits needed to encode si.
151
7.2 Basics of Information Theory
What is entropy? It is a measure of the number of specific ways in which a system may be arranged, commonly understood as a measure of the disorder of a system. As an example, if the information source S is a gray-level digital image, each si is a gray-level intensity ranging from 0 to (2^k − 1), where k is the number of bits used to represent each pixel in an uncompressed image. We need to find the entropy of this image; it gives the minimum average number of bits per pixel needed to represent the image after compression.
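A minimal Python sketch of the entropy computation, starting from a histogram of symbol (gray-level) counts:

    import math

    def entropy(hist):
        """Entropy in bits per symbol, given a histogram of symbol counts."""
        total = sum(hist)
        return -sum(c / total * math.log2(c / total) for c in hist if c > 0)

    # Uniform 8-bit image: 256 equally likely gray levels -> 8.0 bits per pixel.
    print(entropy([1] * 256))
    # Binary image with probabilities 1/3 and 2/3 -> about 0.92 (cf. Fig. 7.2b).
    print(entropy([1, 2]))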
152
Distribution of Gray-Level Intensities
Fig. 7.2: Histograms for two gray-level images. Fig. 7.2(a) shows the histogram of an image with a uniform distribution of gray-level intensities, i.e., for all i, pi = 1/256. Hence, the entropy of this image is log2 256 = 8 (7.4). Fig. 7.2(b) shows the histogram of an image with two possible values (a binary image); its entropy is 0.92.
153
Distribution of Gray-Level Intensities
It is interesting to observe that in the above uniform-distribution example (Fig. 7.2a) we found that η = 8, so the minimum average number of bits to represent each gray-level intensity is at least 8: no compression is possible for this image. In the context of imaging, this corresponds to the "worst case," where neighboring pixel values have no similarity.
154
7.3 Run-Length Coding. RLC is one of the simplest forms of data compression. The basic idea is that if the information source has the property that symbols tend to form continuous groups, then such a symbol and the length of its group can be coded. Consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the text. Let us take a hypothetical single scan line, with B representing a black pixel and W representing white: WWWWWBWWWWBBBWWWWWWBWWW. If we apply the run-length encoding (RLE) data compression algorithm to the above hypothetical scan line, we get the following: 5W1B4W3B6W1B3W. The run-length code represents the original 23 characters in only 14.
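A minimal sketch of the encoder for the scan-line example above (Python's groupby finds the runs):

    from itertools import groupby

    def rle_encode(s):
        """Run-length encode a string: 'WWWWWB...' -> '5W1B...'."""
        return "".join("%d%s" % (len(list(run)), ch) for ch, run in groupby(s))

    line = "WWWWWBWWWWBBBWWWWWWBWWW"
    print(rle_encode(line))  # 5W1B4W3B6W1B3W: 14 characters versus 23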
155
7.4 Variable-Length Coding
Variable-length coding (VLC) is one of the best-known entropy coding methods. Here, we will study the Shannon–Fano algorithm, Huffman coding, and adaptive Huffman coding.
156
7.4.1 Shannon–Fano Algorithm
To illustrate the algorithm, let us suppose the symbols to be coded are the characters in the word HELLO. The frequency counts of the symbols are:

Symbol   H   E   L   O
Count    1   1   2   1

The encoding steps of the Shannon–Fano algorithm can be presented in the following top-down manner: 1. Sort the symbols according to the frequency count of their occurrences. 2. Recursively divide the symbols into two parts, each with approximately the same number of counts, until all parts contain only one symbol.
157
7.4.1 Shannon–Fano Algorithm
A natural way of implementing the above procedure is to build a binary tree. As a convention, let us assign bit 0 to its left branches and 1 to the right branches. Initially, the symbols are sorted as LHEO. As Fig. 7.3 shows, the first division yields two parts: L with a count of 2, denoted as L:(2); and H, E and O with a total count of 3, denoted as H, E, O:(3). The second division yields H:(1) and E, O:(2). The last division is E:(1) and O:(1).
158
7.4.1 Shannon–Fano Algorithm
Fig. 7.3: Coding Tree for HELLO by Shannon-Fano.
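A compact recursive sketch of the procedure (our own formulation, not from the text). When a division is a tie, taking the earlier split point reproduces the tree of Fig. 7.3; taking the later one yields the alternative tree of Fig. 7.4 shown two slides below:

    def shannon_fano(symbols):
        """Shannon-Fano codes for [(symbol, count), ...], sorted by count, descending."""
        if len(symbols) == 1:
            return {symbols[0][0]: ""}
        total, run, best_i, best_diff = sum(c for _, c in symbols), 0, 1, None
        for i, (_, c) in enumerate(symbols[:-1], 1):
            run += c
            diff = abs(2 * run - total)  # how unbalanced this split would be
            if best_diff is None or diff < best_diff:
                best_i, best_diff = i, diff
        codes = {s: "0" + c for s, c in shannon_fano(symbols[:best_i]).items()}
        codes.update({s: "1" + c for s, c in shannon_fano(symbols[best_i:]).items()})
        return codes

    print(shannon_fano([("L", 2), ("H", 1), ("E", 1), ("O", 1)]))
    # {'L': '0', 'H': '10', 'E': '110', 'O': '111'}, the codes of Fig. 7.3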
159
Table 7.1: Result of Performing Shannon-Fano on HELLO
Symbol   Count   log2(1/pi)   Code   # of bits used
L        2       1.32         0      2
H        1       2.32         10     2
E        1       2.32         110    3
O        1       2.32         111    3
TOTAL # of bits: 10
160
Fig. 7.4: Another coding tree for HELLO by Shannon–Fano.
161
Table 7.2: Another Result of Performing Shannon-Fano
on HELLO (see Fig. 7.4)

Symbol   Count   log2(1/pi)   Code   # of bits used
L        2       1.32         00     4
H        1       2.32         01     2
E        1       2.32         10     2
O        1       2.32         11     2
TOTAL # of bits: 10
162
7.4.1 Shannon–Fano Algorithm
The Shannon–Fano algorithm delivers satisfactory coding results for data compression, but it was soon outperformed and overtaken by the Huffman coding method. The Huffman algorithm requires prior statistical knowledge about the information source, and such information is often not available. This is particularly true in multimedia applications, where future data is unknown before its arrival, as for example in live (or streaming) audio and video. Even when the statistics are available, the transmission of the symbol table could represent heavy overhead. The solution is to use adaptive Huffman coding algorithms, in which statistics are gathered and updated dynamically as the data stream arrives.
163
7.5 Dictionary-Based Coding
The Lempel–Ziv–Welch (LZW) algorithm employs an adaptive, dictionary-based compression technique. Unlike variable-length coding, in which the lengths of the codewords differ, LZW uses fixed-length codewords to represent variable-length strings of symbols/characters that commonly occur together, such as words in English text. As in other adaptive compression techniques, the LZW encoder and decoder build up the same dictionary dynamically while receiving the data.
164
7.5 Dictionary-Based Coding
LZW proceeds by placing longer and longer repeated entries into a dictionary, then emitting (sending) the code for an element rather than the string itself, if the element has already been placed in the dictionary. Remember, LZW is an adaptive algorithm: the encoder and decoder independently build the same string table, so there is no overhead involved in transmitting it. LZW is used in many applications, such as UNIX compress, GIF for images, WinZip, and others.
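A minimal LZW encoder sketch (illustrative Python, not the book's code; real implementations additionally pack the codes into fixed-width bit fields and bound the dictionary size):

```python
def lzw_encode(text):
    """LZW: emit fixed-size codes for growing dictionary entries."""
    dictionary = {chr(i): i for i in range(256)}   # single-character entries
    next_code = 256
    s, output = "", []
    for ch in text:
        if s + ch in dictionary:
            s += ch                            # extend the current match
        else:
            output.append(dictionary[s])       # emit code for longest match
            dictionary[s + ch] = next_code     # add the new string
            next_code += 1
            s = ch
    if s:
        output.append(dictionary[s])
    return output

print(lzw_encode("ABABABA"))   # -> [65, 66, 256, 258]
```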
165
End of Chapter 7
166
Fundamentals of Multimedia
2nd Edition 2014 Ze-Nian Li Mark S. Drew Jiangchuan Liu Part II: Multimedia Data Compression Chapter 8 : Lossy Compression Algorithms
167
8.1 Introduction As discussed in Chap. 7, the compression ratio for image data using lossless compression techniques (e.g., Huffman coding, arithmetic coding, LZW) is low when the image histogram is relatively flat. For image compression in multimedia applications, where a higher compression ratio is required, lossy methods are usually adopted. In lossy compression, the compressed image is usually not the same as the original image but is meant to form a close perceptual approximation to the original.
168
8.2 Distortion Measures To quantitatively describe how close the approximation is to the original data, some form of distortion measure is required. A distortion measure is a mathematical quantity that specifies how close an approximation is to its original, using some distortion criteria. When looking at compressed data, it is natural to think of the distortion in terms of the numerical difference between the original data and the reconstructed data.
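The most common such numerical measures are the mean squared error (MSE) and the peak signal-to-noise ratio (PSNR) derived from it. A small sketch (illustrative Python; the 4-pixel vectors are made-up sample data):

```python
import numpy as np

def mse(x, y):
    """Mean squared error between original x and reconstruction y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.mean((x - y) ** 2)

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak = 255 for 8-bit images)."""
    return 10 * np.log10(peak ** 2 / mse(x, y))

original      = [52, 55, 61, 66]
reconstructed = [50, 54, 60, 68]
print(mse(original, reconstructed))    # -> 2.5
print(psnr(original, reconstructed))   # -> ~44.2 dB
```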
169
End of Chapter 8
170
Fundamentals of Multimedia
2nd Edition 2014 Ze-Nian Li Mark S. Drew Jiangchuan Liu Part II: Multimedia Data Compression Chapter 9 : Image Compression Standards
171
Recent years have seen an explosion in the availability of digital images, because of the increase in the number of digital imaging devices such as smartphones, webcams, digital cameras, and scanners. The need to efficiently process and store images in digital form has motivated the development of many image compression standards for various applications and needs. In general, standards have greater longevity than particular programs or devices and therefore warrant careful study.
172
9.1 Image Compression Standards
In this chapter, some current standards are examined:
JPEG
The JPEG2000 standard
The JPEG-LS standard
The JBIG standard
The JBIG2 standard
173
End of Chapter 9
174
Fundamentals of Multimedia
2nd Edition 2014 Ze-Nian Li Mark S. Drew Jiangchuan Liu Part II: Multimedia Data Compression Chapter 10 : Basic Video Compression Techniques
175
As discussed in Chap. 7, the volume of uncompressed video data could be extremely large.
Even a modest CIF video, with a picture resolution of only 352 × 288, would require more than 35 Mbps if uncompressed. In HDTV, the bit rate could easily exceed 1 Gbps. This poses challenges and problems for storage and network communications.
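The CIF figure is easy to verify (a quick check, assuming the usual 4:2:0 chroma subsampling, i.e., 12 bits/pixel on average, and 30 frames per second):

```python
# CIF, uncompressed: 352 x 288 luma pixels, 4:2:0 chroma (12 bits/pixel
# on average), 30 frames per second -- typical assumptions for CIF video.
bits_per_frame = 352 * 288 * 12
bitrate = bits_per_frame * 30          # bits per second
print(bitrate / 1e6)                   # -> ~36.5 Mbps, i.e. "more than 35 Mbps"
```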
176
This chapter introduces some basic video compression techniques and illustrates them in standards H.261 and H.263—two video compression standards aimed mostly at videoconferencing. The next two chapters further introduce several MPEG video compression standards and the latest, H.264 and H.265.
177
10.1 Introduction to Video Compression
A video consists of a time-ordered sequence of frames, i.e., images. An obvious approach to video compression is predictive coding based on previous frames. For example, suppose we simply create a predictor such that the prediction equals the previous frame. However, it turns out that, at acceptable cost, we can do even better by searching for just the right parts of the previous frame to subtract from the current frame. After all, our naive subtraction scheme will likely work well only for static content, such as a background of office furniture and sedentary university types.
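The "searching" alluded to here is motion estimation. A minimal exhaustive block-matching sketch (illustrative Python, not the book's code; the function name, block size, and search range are chosen for illustration) shows the idea of finding the displaced block with the smallest residual:

```python
import numpy as np

def best_match(prev, block, top, left, search=7):
    """Exhaustive block search: find the (dy, dx) within +/-search that
    minimizes the sum of absolute differences (SAD) against `block`."""
    n = block.shape[0]
    best = (0, 0, np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + n <= prev.shape[0] and x + n <= prev.shape[1]:
                sad = np.abs(prev[y:y+n, x:x+n].astype(int) - block.astype(int)).sum()
                if sad < best[2]:
                    best = (dy, dx, sad)
    return best

# A block taken from `prev` at (12, 20) but predicted from position (8, 16):
# the search should recover the displacement (4, 4) with zero residual,
# whereas the naive co-located prediction (dy = dx = 0) leaves a large one.
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, (64, 64)).astype(np.uint8)
block = prev[12:28, 20:36]
print(best_match(prev, block, top=8, left=16))   # -> (4, 4, 0)
```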
178
End of Chapter 10
179
Fundamentals of Multimedia
2nd Edition 2014 Ze-Nian Li Mark S. Drew Jiangchuan Liu Part II: Multimedia Data Compression Chapter 11 : MPEG Video Coding: MPEG-1, 2, 4, and 7
180
The Moving Picture Experts Group (MPEG) was established in 1988 to create standards for the delivery of digital video and audio. With emerging new video compression standards such as H.264 and H.265 (to be discussed in Chap. 12), one might view these MPEG standards as old, i.e., outdated. This is not a concern, for two reasons: The fundamental technology of hybrid coding and most of the important concepts developed in these standards are still employed in the newer standards. Although the visual-object-based video representation and compression approach developed in MPEG-4 and MPEG-7 has not been commonly used in current popular standards, it has great potential to be adopted in the future, when the necessary computer vision technology for automatic object detection becomes more readily available.
182
End of Chapter 11
183
Fundamentals of Multimedia
2nd Edition 2014 Ze-Nian Li Mark S. Drew Jiangchuan Liu Part III: Multimedia Communications and Networking Chapter 15 : Network Services and Protocols for Multimedia Communications
184
Multimedia places great demands on networks and systems.
Multimedia communication and content sharing over the Internet have risen quickly, with telephone networks and television networks converging onto the global Internet. Numerous new-generation multimedia-based applications have been developed over the Internet, e.g., Skype and YouTube. Multimedia applications generally start playback before downloads have completed, i.e., they operate in a streaming mode. Early on, research attention was mostly focused on new streaming protocols, such as the Real-time Transport Protocol (RTP) and its Control Protocol (RTCP).
185
More recently, Web-based HTTP video streaming has allowed users to play videos directly in their Web browsers, rather than having to download and install dedicated software. The dream of "anywhere and anytime" multimedia communication and content sharing has now become a reality, thanks to wireless mobile networking and smart portable devices. Indeed, the evolution of the Internet, particularly in the past two decades, has been largely driven by the ever-growing demands of numerous conventional and new-generation multimedia applications.
186
15.1 Protocol Layers of Computer Communication Networks
The Open Systems Interconnection (OSI) Reference Model has seven layers: Physical, Data Link, Network, Transport, Session, Presentation, and Application. Multimedia systems are generally implemented in the last three layers. The OSI model, however, has never been fully implemented; instead, the competing and more practical TCP/IP protocol suite has become dominant, and it also provides the core protocols for the transport and network layers of today's Internet.
187
15.5 Quality-of-Service for Multimedia Communications
Challenges in multimedia network communications arise from a series of distinct characteristics of audio/video data: Voluminous and continuous: they demand high data rates and often have a lower bound to ensure continuous playback. In general, a user expects to start playing back audio/video objects before they are fully downloaded. For this reason, they are commonly referred to as continuous media or streaming media.
188
15.5 Quality-of-Service for Multimedia Communications
Real-time and interactive: they demand low startup delay and synchronization between audio and video for "lip sync." Interactive applications such as video conferencing and multi-party online gaming require two-way traffic, with the same high demands in both directions. Rate fluctuation: multimedia data rates fluctuate drastically and are sometimes bursty. In VoD or VoIP, there may be no traffic most of the time, followed by bursts to high volume. In variable bit rate (VBR) video, the average rate and the peak rate can differ significantly, depending on the scene complexity.
189
Quality of Service The most important parameters that affect QoS for multimedia data transmission are: Bandwidth: a measure of transmission speed over digital links or networks, often in kilobits per second (kbps) or megabits per second (Mbps). Latency (maximum frame/packet delay): the maximum time needed from transmission to reception, often measured in milliseconds (msec, or ms). Packet loss or error: a measure (in percentage) of the loss or error rate of the packetized data transmission. Sync skew: a measure of multimedia data synchronization (e.g., between audio and video).
190
15.6 Protocols for Multimedia Transmission and Interaction
Hypertext Transfer Protocol (HTTP): originally designed for transmitting Web content, but it also supports transmission of any file type. Real-time Transport Protocol (RTP): designed for the transport of real-time data, such as audio and video streams. RTP Control Protocol (RTCP): a companion protocol of RTP. Real-Time Streaming Protocol (RTSP): a signaling protocol to control streaming media servers.
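To make RTP concrete, the fixed 12-byte RTP header (version 2, per RFC 3550) can be packed in a few lines. This is an illustrative Python sketch, not from the text; the payload type 96 is an arbitrary value from the dynamic range, chosen only for the example:

```python
import struct

def rtp_header(seq, timestamp, ssrc, payload_type=96, marker=0):
    """Minimal 12-byte RTP fixed header (RFC 3550), version 2, no CSRCs."""
    v_p_x_cc = 2 << 6                      # version=2, padding=0, ext=0, cc=0
    m_pt = (marker << 7) | payload_type    # marker bit + 7-bit payload type
    return struct.pack("!BBHII", v_p_x_cc, m_pt, seq, timestamp, ssrc)

hdr = rtp_header(seq=1, timestamp=3000, ssrc=0x12345678)
print(len(hdr), hdr.hex())                 # -> 12 bytes
```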
191
End of Chapter 15
192
Fundamentals of Multimedia
2nd Edition 2014 Ze-Nian Li Mark S. Drew Jiangchuan Liu Part IV: Multimedia Information Sharing and Retrieval Chapter 18 : Social Media Sharing
193
Over the past decade, a number of new technologies have contributed to the development of Web 2.0, which was formally introduced in late 2004. The advanced interaction provided by Web 2.0 allows every single user to generate and share content, representing a substantial change from the conventional Web 1.0, where users merely consume information. Nowadays, popular Web 2.0-based social media sharing websites such as YouTube, Facebook, and Twitter have drastically changed the content distribution landscape, and have indeed become an integral part of people's daily lives.
194
Social media, a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, allow the creation and exchange of user-generated content. Two distinct characteristics of Web 2.0 are considered key factors in the success of the new generation of social media: Collective intelligence: the base knowledge of general users of the Web contributes to the content of the Web. The users have become the content providers, and the content is therefore richer and more dynamic. Rich connections and activities: the users and their content are linked to each other, creating an organic growth of connections and activities. A social network built out of these connections enables strong ties amongst the users and broad, rapid content propagation.
195
There are many types of social media services, including user-generated content sharing (e.g., YouTube), online social networking (e.g., Facebook), question-and-answer (e.g., Ask), and collaboratively edited encyclopedias (e.g., Wikipedia). In these social media services, the contents, as well as the users, have become interconnected, enabling convenient sharing of feelings, activities, and location information, as well as resources, blogs, photos, and videos.
196
With the pervasive penetration of wireless mobile networks, the advanced development of smart phones and tablets, and the massive market of mobile applications, social media contents can now be easily generated and accessed at any time and anywhere. YouTube’s own statistics have reported that YouTube mobile gets over 600 million views a day, making up almost 40% of YouTube’s global watch time.
197
18.1 Representative Social Media Services
There are two important social media services: user-generated media content sharing online social networking
198
18.1.1 User-Generated Content Sharing
User-generated content (UGC) plays a key role in today's social media services. It is used for a wide range of applications with different types of media, e.g., text, music, pictures, and video, often in combination with open source, free software, and flexible licensing agreements that further reduce the barriers to collaboration, skill building, and discovery. In traditional video-on-demand and live streaming services, videos are offered by enterprise content providers, stored in servers, and then streamed to users. A number of new-generation video sharing websites, represented here by YouTube, offer users opportunities to make their videos directly accessible to others, through such simple Web 2.0 operations as embedding and sharing.
199
18.1.1 User-Generated Content Sharing
YouTube (established in 2005) is so far the most significant and successful video sharing website. YouTube served 100 million videos per day in 2006; by December 2013, more than 1 billion unique users were visiting YouTube each month, over 6 billion hours of video were being watched each month, and 100 hours of new video were being uploaded every minute. YouTube is also highly globalized: it is localized in 61 countries and across 61 languages, and 80% of YouTube traffic comes from outside the US.
200
18.1.2 Online Social Networking
Online social networking services provide an Internet-based platform to connect people with social relations, e.g., friends, classmates, and colleagues in the real world, or people who simply share common interests. Facebook (founded in 2004) is one of the dominant online social networking services on the Internet. It provides users with a platform to connect with friends by updating status, uploading photos, commenting on and "liking" others' posts, etc. As of November 2013, Facebook had 1.19 billion active users worldwide, 728 million of whom log onto Facebook on a daily basis. Currently, 4.5 billion likes and comments are generated each day, together with 300 million photos uploaded.
201
18.1.2 Online Social Networking
Another important social networking website, Twitter, is a representative microblogging service: a simpler but much faster version of blogging. It allows users to send text-based posts, called tweets, of up to 140 characters. Although short, tweets can link to richer content such as images and videos. By following friends or accounts of interest, such as news providers, celebrities, brands, and organizations, Twitter users can obtain real-time notifications and spread posts through a retweet mechanism.
202
18.1.2 Online Social Networking
Both Facebook and Twitter open their APIs to developers, who have built thousands of applications and games that make the platforms more enjoyable. Both support the sharing and propagation of such media objects as pictures, music, and video among friends, although the media content may be hosted by external sites. Recently, Twitter has also begun offering the Vine service, available exclusively to mobile users, which enables them to create and post video clips. A Vine user can create a short video clip, up to six seconds long, by recording through Vine's in-app camera. The clip can then be shared through Twitter, Facebook, or other social networking services.
203
18.2 User-Generated Media Content Sharing
Social media has greatly changed the mechanisms of content generation and access, and it also brings unprecedented challenges to server and network management. Understanding the features of social media services is thus crucial to traffic engineering and to the sustainable development of this new generation of services. So what features do these media have? We examine YouTube as a representative.
204
18.2.1 YouTube Video Format and Meta-data
YouTube's video playback technology is based on Adobe's Flash Player, which allows YouTube to display videos with quality comparable to well-established video playback technologies (such as Windows Media Player, QuickTime, and RealPlayer). YouTube accepts uploaded videos in many formats, which are converted to the .FLV (Adobe Flash Video) format after uploading. It is well recognized that the use of a uniform and easily playable format is critical to the success of YouTube. YouTube initially used the H.263 video codec, and introduced a "high quality" format with the H.264 codec for better viewing quality in late 2008.
205
18.2.1 YouTube Video Format and Meta-data
YouTube assigns each video a distinct 11-character ID composed of the characters 0–9, a–z, A–Z, - (hyphen), and _ (underscore). Each video carries the following intuitive meta-data (see Table 18.1): video ID, uploader, date added, category, length, number of views, number of ratings, number of comments, and a list of related videos.
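A quick back-of-the-envelope check (not from the text) shows why 11 characters suffice: the alphabet has 10 + 26 + 26 + 2 = 64 characters, so the ID space is effectively a 66-bit identifier.

```python
# 64 characters per position, 11 positions:
print(64 ** 11)              # -> 73786976294838206464  (about 7.4e19 IDs)
print(64 ** 11 == 2 ** 66)   # -> True, since 64 = 2**6 and 6 * 11 = 66
```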
206
18.2.2 Characteristics of YouTube Video
While they share some characteristics, many of the video statistics of traditional media servers are quite different from those of YouTube-like sites, e.g., the video length distribution and the user access pattern. More importantly, the videos on traditional servers are generally movies and TV programs that are not generated by ordinary users, nor are they connected by social relations.
207
18.2.2 Characteristics of YouTube Video
Video category: in YouTube, one of 15 categories is selected by the user when uploading a video. Table 18.2 lists the number and percentage of videos in each category, from a dataset of 5 million videos crawled over a 1.5-year span. The most popular category is "Entertainment," at about 25.4%, and the second is "Music," at about 24.8%. These two categories constitute half of the entire YouTube video collection, suggesting that YouTube is mainly an entertainment-oriented site. "Unavailable" videos are those set to private or flagged as inappropriate content; "Removed" videos are those that have been deleted.
208
18.2.2 Characteristics of YouTube Video
Video length: the length of YouTube videos is the most distinguishing difference from traditional video content. Whereas most traditional servers contain a significant portion of long videos, typically 1–2 hour movies (e.g., HP Labs Media Server and OnlineTVRecorder), YouTube mostly comprises short video clips: 98.0% of the videos are within 600 s in length. Although YouTube has increased its initial 10-minute length limit to 15 minutes and allows certain users to upload videos of unlimited length, most user-generated videos remain quite short in nature.
209
18.2.2 Characteristics of YouTube Video
Access patterns: it has been found that the top 10% most popular videos account for nearly 80% of views, indicating that YouTube viewing is highly skewed toward popular videos. Yet YouTube users tend to abort playback very soon, with 60% of videos being watched for less than 20% of their duration. Furthermore, only 10% of the videos are watched again on the following day. YouTube users' viewing behaviors are highly diversified, affected by both video quality and their social relations.
210
18.2.3 Small-World in YouTube Videos
YouTube is a prominent social media service: there are communities and groups in YouTube, and thus videos are no longer independent of each other. There is a strong correlation between the number of views of a video and those of its top related videos; this also provides more diversity in video views, helping users discover videos of their own interest rather than only the popular ones. The small-world network phenomenon is probably the most interesting characteristic of such social networks.
211
18.2.4 YouTube from a Partner’s View
YouTube displays advertisements on its web pages to monetize videos, and this has been the main source of YouTube's revenue. YouTube has also introduced the YouTube Partner Program, which has largely improved the quality of YouTube videos and further increased YouTube's revenue.
212
18.3 Media Propagation in Online Social Networks
The new generation of online social network services, such as Facebook and Twitter, directly connect people through cascaded relations, so information spreads much faster and more extensively than through conventional web portals or newsgroup services. As an example, Twitter first reported Tiger Woods' car crash 30 minutes before CNN. With developments in broadband access and data compression, video has become an important type of object spreading over social networks, going beyond the earlier sharing of simple text or image objects.
213
End of Chapter 18