Video Processing Wen-Hung Liao 6/2/2005


1 Video Processing Wen-Hung Liao 6/2/2005

2 Outline
Basics of Video
Video Processing
  Video coding/compression/conversion
  Digital video production
  Video special effects
  Video content analysis
Summary

3 Basics of Video
Component video
Composite video
S-Video
Digital video

4 Component Video Higher-end video systems use three separate video signals for the red, green, and blue image planes; each color channel is sent as its own video signal. Most computer systems use component video, with separate signals for R, G, and B. Of the color-separation schemes, component video gives the best color reproduction, since there is no crosstalk between the three channels; this is not the case for S-Video or composite video, discussed next. Component video, however, requires more bandwidth and good synchronization of the three components.

5 Composite Video Color (chrominance) and intensity (luminance) signals are mixed into a single carrier wave. Chrominance is a composition of two color components (I and Q, or U and V). In NTSC TV, e.g., I and Q are combined into a chroma signal, and a color subcarrier is then employed to put the chroma signal at the high-frequency end of the signal shared with the luminance signal. The chrominance and luminance components can be separated at the receiver end and then the two color components can be further recovered. When connecting to TVs or VCRs, Composite Video uses only one wire and video color signals are mixed, not sent separately. The audio and sync signals are additions to this one signal. Since color and intensity are wrapped into the same signal, some interference between the luminance and chrominance signals is inevitable.

6 S-Video As a compromise, S-Video (Separated Video, or Super-Video, as in S-VHS) uses two wires: one for luminance and one for a composite chrominance signal. As a result, there is less crosstalk between the color information and the crucial gray-scale information. Luminance gets its own part of the signal because black-and-white information is the most crucial for visual perception: humans resolve spatial detail in gray-scale images with much higher acuity than in the color part of color images. We can therefore send less accurate color information than intensity information; since we only perceive fairly large blobs of color, it makes sense to send less color detail.

7 Digital Video The advantages of digital representation for video are many. For example:
Video can be stored on digital devices or in memory, ready to be processed (noise removal, cut and paste, etc.) and integrated into various multimedia applications
Direct access is possible, which makes nonlinear video editing a simple rather than a complex task
Repeated recording does not degrade image quality
Ease of encryption and better tolerance to channel noise

8 Chroma Subsampling Since humans see color with much less spatial resolution than they see black and white, it makes sense to decimate the chrominance signal. Interesting (but not necessarily informative!) names have arisen to label the different schemes used. To begin with, numbers are given stating how many pixel values, per four original pixels, are actually sent: The chroma subsampling scheme 4:4:4 indicates that no chroma subsampling is used: each pixel's Y, Cb and Cr values are transmitted, 4 for each of Y, Cb, Cr.

9 Chroma Subsampling (2) The scheme 4:2:2 indicates horizontal subsampling of the Cb, Cr signals by a factor of 2. That is, of four pixels horizontally labeled 0 to 3, all four Ys are sent, but only two Cb's and two Cr's, as (Cb0, Y0) (Cr0, Y1) (Cb2, Y2) (Cr2, Y3) (Cb4, Y4), and so on (or averaging is used). The scheme 4:1:1 subsamples horizontally by a factor of 4. The scheme 4:2:0 subsamples in both the horizontal and vertical dimensions by a factor of 2.
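To make the arithmetic concrete, here is a minimal NumPy sketch (not from the slides) of the 4:2:2 and 4:2:0 schemes, assuming full-resolution Y, Cb, and Cr planes with even dimensions and using averaging rather than sample dropping:

```python
import numpy as np

def _pool_pairs(c):
    """Average horizontally adjacent pairs (width must be even)."""
    return c.reshape(c.shape[0], -1, 2).mean(axis=2)

def subsample_422(y, cb, cr):
    """4:2:2: halve Cb and Cr horizontally; Y stays full resolution."""
    return y, _pool_pairs(cb), _pool_pairs(cr)

def subsample_420(y, cb, cr):
    """4:2:0: average each 2x2 block of Cb and Cr down to one sample,
    halving both dimensions (even height and width assumed)."""
    def pool2x2(c):
        h, w = c.shape
        return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, pool2x2(cb), pool2x2(cr)
```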

10 Chroma Subsampling (3)

11 RGB/YUV Conversion
RGB to YUV conversion:
Y = (0.257 * R) + (0.504 * G) + (0.098 * B) + 16
Cr = V = (0.439 * R) - (0.368 * G) - (0.071 * B) + 128
Cb = U = -(0.148 * R) - (0.291 * G) + (0.439 * B) + 128
YUV to RGB conversion:
B = 1.164 * (Y - 16) + 2.018 * (U - 128)
G = 1.164 * (Y - 16) - 0.813 * (V - 128) - 0.391 * (U - 128)
R = 1.164 * (Y - 16) + 1.596 * (V - 128)
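The same equations, written as functions. This is a direct transcription of the slide's coefficients (the standard BT.601 studio-swing matrix); the clipping to [0, 255] and the function names are our additions:

```python
import numpy as np

def rgb_to_ycbcr(r, g, b):
    """Forward conversion: Y in [16, 235], Cb/Cr centered on 128."""
    y  =  0.257 * r + 0.504 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr =  0.439 * r - 0.368 * g - 0.071 * b + 128
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Inverse conversion; results are clipped back to [0, 255]."""
    r = 1.164 * (y - 16) + 1.596 * (cr - 128)
    g = 1.164 * (y - 16) - 0.813 * (cr - 128) - 0.391 * (cb - 128)
    b = 1.164 * (y - 16) + 2.018 * (cb - 128)
    return tuple(np.clip(c, 0, 255) for c in (r, g, b))
```

Round trips are only approximate because of rounding and clipping.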

12 Video Coding Standards
MPEG standards (MPEG-1, -2, -4, -7, -21):
  MPEG-1: VCD
  MPEG-2: DVD
  MPEG-4: video objects
  MPEG-7: multimedia database
  MPEG-21: multimedia framework
H.26x series (H.261, H.263, H.264): video conferencing

13 Digital Video Production
Tools: Adobe Premiere, After Effects, …
Resources:
Examples:

14 Video Special Effects
Examples:
EffectTV:
FreeFrame:

15 Types of Special Effects
Applying to the whole image frame
Applying to part of the image (edges, moving pixels, …; see the sketch after this list)
Applying to a collection of frames
Applying to detected areas
Overlaying virtual objects:
  at pre-determined locations
  in response to the user's position
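As an illustration of the "part of the image" category, here is a hypothetical sketch that tints only moving pixels, using simple frame differencing as the motion detector; the threshold value and the green tint are arbitrary choices, not anything prescribed by the slides:

```python
import numpy as np

def tint_moving_pixels(prev_frame, cur_frame, threshold=30):
    """Apply an effect only to pixels that changed between two
    consecutive frames. Frames are HxWx3 uint8 arrays."""
    # Per-pixel change across channels; int16 avoids uint8 wrap-around.
    diff = np.abs(cur_frame.astype(np.int16) -
                  prev_frame.astype(np.int16)).max(axis=2)
    moving = diff > threshold          # crude motion mask
    out = cur_frame.copy()
    out[moving] = (0, 255, 0)          # paint moving pixels green
    return out
```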

16 Video Content Analysis
Event detection
For indexing/searching
To obtain a high-level semantic description of the content

17 Image Databases
Problem: accessing and searching large databases of images, videos, and music
Traditional solutions: file IDs, keywords, associated text
Problems with these solutions:
  can't query based on visual or musical properties
  depend on the particular vocabulary used
  don't provide query by example
  time consuming
Solution: content-based retrieval using automatic analysis tools

18 Retrieval of images by similarity
Components (a minimal ranking sketch follows this list):
  Extraction of features or image signatures, and their efficient representation and storage
  A set of similarity measures
  A user interface for efficient, ordered presentation of retrieved images and for relevance feedback
Considerations:
  Many definitions of similarity are possible
  The user interface plays a crucial role
  Visual content-based retrieval is best utilized when combined with traditional search
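The matching step common to these components can be sketched as follows. Euclidean distance is just one possible similarity measure (the slides stress that many definitions are possible), and the function name is illustrative:

```python
import numpy as np

def rank_by_similarity(query_feat, db_feats, top_k=10):
    """Rank stored feature vectors by distance to the query
    feature vector (smaller distance = more similar)."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)   # Euclidean
    order = np.argsort(dists)[:top_k]
    return order, dists[order]           # database indices + scores
```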

19 Image features for similarity definition
Color similarity
  Similarity: e.g., "distance" between color histograms (see the sketch below)
  Should use perceptually meaningful color spaces (HSV, Lab, …)
  Should be relatively independent of illumination (color constancy)
  Locality: "find a red object such as this one"
Texture similarity
  Texture feature extraction (statistical models)
  Texture qualities: directionality, roughness, granularity, …
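A minimal sketch of histogram-based color similarity, assuming 8-bit HSV input in OpenCV's ranges (H in [0, 180), S and V in [0, 256)); the bin counts are arbitrary:

```python
import numpy as np

def color_histogram(img_hsv, bins=(8, 4, 4)):
    """Normalized 3-D histogram in HSV, a perceptually more
    meaningful space than RGB."""
    hist, _ = np.histogramdd(img_hsv.reshape(-1, 3), bins=bins,
                             range=((0, 180), (0, 256), (0, 256)))
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 means identical color distributions."""
    return np.minimum(h1, h2).sum()
```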

20 Shape Similarity
Must distinguish between similarity of the actual geometrical 2-D shapes in the image and similarity of the underlying 3-D shapes
Shape features: circularity, eccentricity, principal-axis orientation, … (see the sketch below)
Spatial similarity
  Assumes images have been (automatically or manually) segmented into meaningful objects (symbolic image)
  Considers the spatial layout of the objects in the scene
Object presence analysis: is this particular object in the image?
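A sketch of two of the listed shape features, computed from a region's area, perimeter, and central second-order moments (which a segmentation stage is assumed to provide):

```python
import numpy as np

def shape_features(area, perimeter, mu20, mu02, mu11):
    """Circularity (1.0 for a perfect disk) and eccentricity
    (0 for a circle, approaching 1 for elongated shapes)."""
    circularity = 4 * np.pi * area / perimeter ** 2
    # Eigenvalues of the second-moment (covariance) matrix give the
    # squared lengths of the region's principal axes.
    common = np.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2)
    lam_max = (mu20 + mu02 + common) / 2
    lam_min = (mu20 + mu02 - common) / 2
    eccentricity = np.sqrt(1 - lam_min / lam_max)
    return circularity, eccentricity
```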

21 Main components of a retrieval system
Database population: images and videos are processed to extract features (color, texture, shape, camera and object motion)
Database query: the user composes a query via a graphical user interface; features are generated from the graphical query and input to the matching engine
Relevance feedback: the existing query is automatically adjusted using information fed back by the user about the relevance of previously retrieved objects (see the sketch below)
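The slides don't name a feedback algorithm; the classic Rocchio update is one common way such adjustment is implemented, sketched here over feature vectors (the weights are conventional defaults, not from the source):

```python
import numpy as np

def rocchio_update(query, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query feature vector toward the mean of features the
    user marked relevant and away from the mean of non-relevant ones."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q -= gamma * np.mean(nonrelevant, axis=0)
    return q
```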

22 Video parsing and representation
Interaction with video through conventional VCR-like manipulation is difficult; structural video analysis is needed
Video parsing:
  Temporal segmentation into elemental units
  Compact representation of each elemental unit

23 Temporal segmentation
Fundamental unit of video manipulation: the video shot
Types of transition between shots (a cut-detection sketch follows this list):
  Abrupt shot change (cut)
  Fade: slow change in brightness
  Dissolve: gradual blend from one shot into the next
  Wipe: pixels of the second shot replace those of the previous shot in a regular pattern
Other sources of image change:
  Motion, including camera motion and object motion
  Luminosity changes and noise
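A minimal sketch of abrupt-cut detection by histogram differencing; the bin count and threshold are arbitrary and would need tuning:

```python
import numpy as np

def detect_cuts(frames, bins=64, threshold=0.4):
    """Flag abrupt shot changes by thresholding the L1 distance between
    normalized gray-level histograms of consecutive frames.
    `frames` is an iterable of 2-D uint8 (grayscale) arrays."""
    cuts, prev_hist = [], None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()
        if prev_hist is not None and np.abs(hist - prev_hist).sum() > threshold:
            cuts.append(i)               # frame i begins a new shot
        prev_hist = hist
    return cuts
```

Note that this pairwise test handles only abrupt cuts: fades, dissolves, and wipes spread the change over many frames, and fast motion or luminosity changes (the other sources listed above) can trigger false positives.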

24 Representation of Video
Video database population has three major components:
  Shot detection
  Representative-frame creation for each shot
  Derivation of a layered representation of coherently moving structures/objects
A representative frame (R-frame) is used for:
  Population: the R-frame is treated as a still image for representation
  Query: R-frames are the basic units initially returned by a video query
Choice of R-frame:
  the first, middle, or last frame in the shot
  a sprite built by seamlessly mosaicking all frames in the shot

25 Video soundtrack analysis
Image/sound relationships are critical to the perception and understanding of video content. Possibilities:
  Detection and representation of speech, music, and Foley sound
  Locutor (speaker) identification and retrieval
  Word spotting and labeling (speech recognition)
  A possible query: "find the next time this locutor is present in this soundtrack"
Video scene analysis
  Shots per hour in typical movies
  One level above the shot: the sequence or scene (a series of consecutive shots constituting a unit from the narrative point of view)

