June 6, 2006. The MPEG-7 Multimedia Content Description Interface. Αναστασία Μπολοβίνου (Anastasia Bolovinou), PhD candidate, Institute of Informatics and Telecommunications, NCSR "Demokritos".

2 Outline: MPEG-7 motivation and scope; visual descriptors (color, texture, shape); the MPEG-7 retrieval evaluation criterion; similarity measures and MPEG-7 visual descriptors; building MPEG-7 Descriptors and Description Schemes with the Description Definition Language; current state of the MPEG-7 VXM; towards an MPEG-7 Query Format framework (queries and the visual descriptor tools they employ); summary.

3 MPEG-7 motivation and design scenarios: the proliferation of audio-visual content. Possible queries: Music/audio: play a few notes and return similar music/audio. Images/graphics: draw a sketch and return images with similar graphics. Text/keywords: find AV material whose subject corresponds to a keyword. Movement: describe movements and return video clips with the specified temporal and spatial relations. Scenario: describe actions and return scenarios where similar actions take place. Goal: standardize multimedia metadata descriptions (to facilitate content-based multimedia retrieval) for various types of audiovisual information: consumer content (news, sports), scientific content, digital art galleries, recorded material.

4 Scope of the standard: description production (extraction) → standard description → description consumption. Only the standard description is the normative part of MPEG-7; the goal is to define the minimum that enables interoperability. MPEG-7 does not specify (non-normative parts of MPEG-7): how to extract descriptions (feature extraction, indexing process, annotation and authoring tools, ...); how to use descriptions (search engine, filtering tool, retrieval process, browsing device, ...); the similarity between contents.

5 Information flow

6 Visual descriptors (normative, basic, for localization): Color descriptors: Dominant Color, Scalable Color, Color Layout, Color Structure, GoF/GoP Color. Texture descriptors: Homogeneous Texture, Texture Browsing, Edge Histogram. Shape descriptors: Region Shape, Contour Shape, 3D Shape. Motion descriptors for video: Camera Motion, Motion Trajectory, Parametric Motion, Motion Activity. Localization: Region Locator, Spatio-Temporal Locator. Other: Face Recognition.

7 Color descriptors: Dominant Color; Scalable Color (HSV space); Color Structure (HMMD space); Color Layout (YCbCr space); GroupOfFrames/GroupOfPictures Color. Supported color spaces: R, G, B; Y, Cr, Cb; H, S, V; monochrome; linear transformation of R, G, B; HMMD. Constrained color spaces: the Scalable Color Descriptor uses HSV and the Color Structure Descriptor uses HMMD.

8 Scalable Color Descriptor (SCD): a color histogram in HSV color space, encoded by a Haar transform. Feature vector: {NoCoef, NoBD, Coeff[..], CoeffSign[..]}

9 SCD extraction (figure labels: 4 bits/bin, 11 bits/bin, N bits/bin; #bins < 256)

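The Haar step of the SCD can be illustrated with a minimal sketch, assuming a histogram whose length is a power of two (e.g. the 256-bin HSV histogram). The helper name `haar_1d` is hypothetical, and the normative bin ordering and coefficient quantization are not reproduced. Each pass replaces adjacent pairs of values by their sum (low-pass) and difference (high-pass); keeping only the leading coefficients yields a shorter, coarser descriptor, which is the source of the scalability.

```python
def haar_1d(values):
    """Unnormalized 1-D Haar transform: repeatedly replace adjacent pairs by
    their sum (low-pass part, stored first) and difference (high-pass part).
    Assumes len(values) is a non-zero power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    coeffs = list(values)
    length = n
    while length > 1:
        half = length // 2
        sums = [coeffs[2 * i] + coeffs[2 * i + 1] for i in range(half)]
        diffs = [coeffs[2 * i] - coeffs[2 * i + 1] for i in range(half)]
        # the next pass only works on the low-pass half
        coeffs[:length] = sums + diffs
        length = half
    return coeffs
```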
10 GoF/GoP Color Descriptor: extends the Scalable Color Descriptor to a video segment or a group of pictures (the joint color histogram is then processed as in the SCD, with Haar transform encoding). Extraction: histogram aggregation methods: Average (but sensitive to outliers such as lighting changes, occlusion and text overlays); Median (increased computational complexity due to sorting); Intersection (a different viewpoint: it captures the "least common" color traits).

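The three aggregation methods can be sketched as follows, assuming per-frame histograms of equal length given as plain lists; the function name is hypothetical. In the check, a single outlier frame (for instance one with a bright text overlay) inflates the average of the first bin but affects neither the median nor the intersection.

```python
from statistics import median


def aggregate_histograms(histograms, method="average"):
    """Combine per-frame histograms (equal-length lists) into one GoF/GoP histogram."""
    if not histograms:
        raise ValueError("need at least one histogram")
    bins = list(zip(*histograms))  # values of each bin across all frames
    if method == "average":
        return [sum(b) / len(b) for b in bins]
    if method == "median":
        return [median(b) for b in bins]
    if method == "intersection":
        return [min(b) for b in bins]  # colors present in every frame
    raise ValueError(f"unknown method: {method}")
```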
11 GoF/GoP Color Descriptor applications: browsing a large collection of images to find similar images → use histogram intersection as a color similarity measure for clustering a collection of images → represent each cluster by a GoP descriptor.

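A minimal sketch of this browsing application, assuming histograms that are normalized to unit sum inside the similarity function: histogram intersection as the color similarity measure, plus a hypothetical single-pass "leader" clustering. The clustering rule is a deliberately simple, swapped-in choice, since the slide does not name a clustering algorithm. Each resulting cluster could then be summarized by a GoP histogram.

```python
def intersection_similarity(h1, h2):
    """Histogram intersection after normalizing both histograms to sum 1.
    Returns 1.0 for identical color distributions, 0.0 for disjoint ones."""
    s1, s2 = sum(h1), sum(h2)
    return sum(min(a / s1, b / s2) for a, b in zip(h1, h2))


def leader_clustering(histograms, threshold):
    """Hypothetical single-pass clustering: each image joins the first cluster
    whose leader (first member) is similar enough, else starts a new cluster."""
    clusters = []  # lists of image indices; the first index is the leader
    for idx, h in enumerate(histograms):
        for cluster in clusters:
            if intersection_similarity(histograms[cluster[0]], h) >= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters
```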
12 Dominant Color Descriptor (DCD): clusters colors into a small number of representative (salient) colors. F = {{c_i, p_i, v_i}, s}, where c_i are the representative colors, p_i their percentages in the region, v_i the color variances, and s the spatial coherency.

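13 DCD extraction (based on the generalized Lloyd algorithm): c_i = Σ v(n)·x(n) / Σ v(n), summed over the pixels x(n) assigned to cluster i, where c_i is the centroid of cluster i, x(n) the color vector at pixel n, and v(n) the perceptual weight of pixel n (the HVS is more sensitive to changes in smooth regions). Spatial coherency: the average number of connecting pixels of a dominant color, computed with a 3x3 masking window.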

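The DCD clustering can be sketched as a weighted generalized Lloyd (k-means) iteration in which each centroid is updated as c_i = Σ v(n)·x(n) / Σ v(n) over the pixels assigned to cluster i. This is a simplified illustration, assuming the perceptual weights v(n) are already computed and initial centroids are given; the cluster splitting/merging and the variance and spatial-coherency fields of the normative extraction are omitted, and all names are hypothetical.

```python
def _nearest(x, centroids):
    """Index of the centroid closest to color x (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])))


def weighted_lloyd(colors, weights, centroids, iterations=10):
    """colors: list of color tuples x(n); weights: perceptual weights v(n);
    centroids: initial representative colors. Returns (centroids, percentages)."""
    centroids = [tuple(float(v) for v in c) for c in centroids]
    for _ in range(iterations):
        # assignment step
        labels = [_nearest(x, centroids) for x in colors]
        # update step: perceptually weighted mean of each cluster
        updated = []
        for i, c in enumerate(centroids):
            members = [(x, w) for x, w, lab in zip(colors, weights, labels) if lab == i]
            total = sum(w for _, w in members)
            if total == 0:
                updated.append(c)  # keep the centroid of an empty cluster
                continue
            updated.append(tuple(sum(w * x[d] for x, w in members) / total
                                 for d in range(len(c))))
        centroids = updated
    labels = [_nearest(x, centroids) for x in colors]
    percentages = [labels.count(i) / len(colors) for i in range(len(centroids))]
    return centroids, percentages
```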
14 ontentBasedVideoRetrieval/CBVR/Dominant/index.html

15 Color Layout Descriptor (CLD): partitioning the image into 64 (8x8) blocks; deriving the average color of each block (or using the DCD); applying an 8x8 DCT and encoding. Efficient for: sketch-based image retrieval; content filtering using image indexing; …

16 CLD extraction: the derived average colors are transformed into a series of coefficients by a DCT (data in the spatial domain → data in the frequency domain). If the spatial-domain data is smooth (little variation), its frequency-domain representation has large low-frequency coefficients and small high-frequency coefficients. A few low-frequency coefficients are selected by zigzag scanning and quantized to form the CLD (a large quantization step for the AC coefficients, a small one for the DC coefficient). The color space adopted for the CLD is YCbCr. F = {CoefPattern, YDCCoef, CbDCCoef, CrDCCoef, YACCoef, CbACCoef, CrACCoef}

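A minimal sketch of the per-channel CLD computation, assuming an 8x8 grid of average values for one channel (Y, Cb or Cr) given as a list of rows. It uses a direct orthonormal 2-D DCT-II (slow but dependency-free) and JPEG-style zigzag order, keeping, for example, 6 coefficients; the quantization step is left out and the function names are hypothetical. For a perfectly flat grid only the DC coefficient survives, which matches the smoothness argument above.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def zigzag_indices(n):
    """(row, col) pairs in JPEG-style zigzag order, DC coefficient first."""
    order = []
    for d in range(2 * n - 1):
        cells = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        # odd anti-diagonals run top-right to bottom-left, even ones the other way
        order.extend(cells if d % 2 else reversed(cells))
    return order


def cld_channel(avg_colors, num_coeffs=6):
    """Keep the first num_coeffs zigzag-scanned DCT coefficients (unquantized)."""
    coeffs = dct_2d(avg_colors)
    return [coeffs[r][c] for r, c in zigzag_indices(len(avg_colors))[:num_coeffs]]
```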
17 Color Structure Descriptor (CSD): scanning the image with an 8x8 structuring element; counting, for each color, the number of structuring-element positions that contain it; generating a color histogram (HMMD color space, 4 CSQ operating points).

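18 CSD extraction: large images are subsampled before the 8x8 structuring element is applied. For a W×H image the subsampling factor is K = 2^p, with p = max{0, round(0.5·log2(W·H) − 8)}, so that the structuring element spans 8K × 8K pixels of the original image. F = {colQuant, Values[m]}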

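The counting step can be sketched as below, assuming the image has already been quantized to color indices (e.g. in the HMMD space) and ignoring the subsampling of large images; the function name is hypothetical. A color in a thin structure touches many structuring-element positions, so its bin value can be much larger than its plain pixel percentage: in the check, color 1 occupies one column out of nine (about 11% of the pixels) yet appears in half of the window positions.

```python
def color_structure_histogram(indices, num_colors, size=8):
    """indices: 2-D list of quantized color indices (rows of equal length).
    For every position of a size x size structuring element, each color
    present in the window increments its bin once."""
    h, w = len(indices), len(indices[0])
    if h < size or w < size:
        raise ValueError("image smaller than the structuring element")
    hist = [0] * num_colors
    for top in range(h - size + 1):
        for left in range(w - size + 1):
            present = {indices[r][c]
                       for r in range(top, top + size)
                       for c in range(left, left + size)}
            for color in present:
                hist[color] += 1
    positions = (h - size + 1) * (w - size + 1)
    return [count / positions for count in hist]  # normalized bin values
```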
19 CSD scaling

20 Texture descriptors: Homogeneous Texture Descriptor; Non-homogeneous Texture Descriptor (Edge Histogram); Texture Browsing.

21 Homogeneous Texture Descriptor (HTD): partitioning the frequency domain into 30 channels (each modeled by a 2-D Gabor function); computing the energy and energy deviation of each channel; computing the mean and standard deviation of the image intensities → F = {f_DC, f_SD, e_1, …, e_30, d_1, …, d_30}. An efficient implementation: a Radon transform followed by a Fourier transform.

22 HTD extraction: how to obtain a 2-D frequency layout that follows the HVS: 2-D image f(x, y) → Radon transform → 1-D projections P(R, θ) → 1-D Fourier transform F(P(R, θ)) → resulting sampling grid in polar coordinates.

23 HTD extraction: data sampling in the feature channels. A 2-D Gabor function is used to define the Gabor filter bank: it is a Gaussian-weighted sinusoid, it models an individual channel, and each channel filters a specific type of texture.

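To make "Gaussian-weighted sinusoid" concrete, here is a minimal sketch of a real 2-D Gabor kernel in the spatial domain. It is only an illustration: the HTD itself defines its 30 channels on a polar grid in the frequency domain, and the kernel size, σ, frequency and function names used here are assumptions. The check shows that a kernel tuned to intensity variation along x responds strongly to vertical stripes of the matching frequency and barely at all to horizontal stripes.

```python
import math


def gabor_kernel(size, sigma, freq, theta):
    """Real 2-D Gabor kernel of odd width `size`: an isotropic Gaussian envelope
    (std `sigma` pixels) times a cosine carrier with spatial frequency `freq`
    (cycles per pixel) along the direction `theta` (radians)."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            along = x * math.cos(theta) + y * math.sin(theta)  # coordinate along theta
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * freq * along))
        kernel.append(row)
    return kernel


def channel_response(patch, kernel):
    """Inner product of an equally sized image patch with the kernel."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))
```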
24 Radon transform: transforms an image containing lines into a domain of possible line parameters; each line is mapped to a peak point in the resulting image.

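The "line becomes a peak" property can be sketched with a nearest-bin discretization of the Radon transform: every nonzero pixel adds its value to the bin (θ, r) with r = x·cos θ + y·sin θ (for binary images this coincides with the Hough transform). The angular resolution and function names are assumptions for illustration. For a horizontal line y = 3, all pixels fall into one bin near θ = 90°, r = 3.

```python
import math


def radon_accumulate(pixels, num_angles=180):
    """Sum pixel values along lines: each nonzero pixel (x, y) adds its value to
    bin (theta_index, r), r = round(x*cos(theta) + y*sin(theta)), theta in [0, pi)."""
    acc = {}
    for y, row in enumerate(pixels):
        for x, value in enumerate(row):
            if value == 0:
                continue
            for t in range(num_angles):
                theta = math.pi * t / num_angles
                r = round(x * math.cos(theta) + y * math.sin(theta))
                acc[(t, r)] = acc.get((t, r), 0) + value
    return acc


def strongest_line(pixels, num_angles=180):
    """Return (theta in degrees, r) of the accumulator peak, i.e. the dominant line."""
    acc = radon_accumulate(pixels, num_angles)
    (t, r), _ = max(acc.items(), key=lambda item: item[1])
    return 180 * t / num_angles, r
```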
25 HTD properties: one can perform rotation-invariant matching, intensity-invariant matching (f_DC removed from the feature vector) and scale-invariant matching. F = {f_DC, f_SD, e_1, …, e_30, d_1, …, d_30}

26 Texture Browsing Descriptor: uses the same spatial filtering procedure as the HTD (scale- and orientation-selective band-pass filters) and characterizes regularity (periodic to random), coarseness (fine grain to coarse) and directionality (e.g. 30°). The Texture Browsing Descriptor can be used to find a set of candidates with similar perceptual properties; the HTD is then used to obtain a precise similarity-ranked list among the candidate images, e.g. looking for textures that are very regular and oriented at 30°.

27 Edge Histogram Descriptor (EHD): represents the spatial distribution of five types of edges: vertical, horizontal, 45°, 135° and non-directional. The image is divided into 16 (4x4) sub-images and a 5-bin histogram is generated for each, giving F = {BinCounts[k]} with 80 bins. It is scale invariant. Strong edges are retained by thresholding the output of the (Canny) edge operator …

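The slide mentions thresholding an edge operator's output; the sketch below instead uses the five 2x2 edge filters commonly described in the EHD literature, applied to 2x2 pixel blocks. Both the filter coefficients and the threshold of 11 are assumptions made for illustration, not taken from the slide. The sketch assumes a grayscale image given as a list of rows whose height and width are multiples of 8; each of the 16 sub-images contributes 5 bins (edge counts divided by the number of blocks), giving 80 values.

```python
import math

S2 = math.sqrt(2)
# Assumed 2x2 filter coefficients: (top-left, top-right, bottom-left, bottom-right)
FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diag45": (S2, 0, 0, -S2),
    "diag135": (0, S2, -S2, 0),
    "nondirectional": (2, -2, -2, 2),
}
EDGE_TYPES = list(FILTERS)


def block_edge_type(a, b, c, d, threshold):
    """Edge type with the strongest filter response for one 2x2 block,
    or None when even the strongest response is below the threshold."""
    strengths = {name: abs(a * f[0] + b * f[1] + c * f[2] + d * f[3])
                 for name, f in FILTERS.items()}
    best = max(strengths, key=strengths.get)
    return best if strengths[best] >= threshold else None


def edge_histogram(img, threshold=11):
    """80-bin local edge histogram: 4x4 sub-images x 5 edge types."""
    h, w = len(img), len(img[0])
    sh, sw = h // 4, w // 4
    hist = []
    for si in range(4):
        for sj in range(4):
            counts = dict.fromkeys(EDGE_TYPES, 0)
            blocks = 0
            for y in range(si * sh, (si + 1) * sh, 2):
                for x in range(sj * sw, (sj + 1) * sw, 2):
                    blocks += 1
                    kind = block_edge_type(img[y][x], img[y][x + 1],
                                           img[y + 1][x], img[y + 1][x + 1], threshold)
                    if kind is not None:
                        counts[kind] += 1
            hist.extend(counts[t] / blocks for t in EDGE_TYPES)
    return hist
```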
28 EHD extraction: basic (80 bins); extended (150 bins: the 80 basic bins, plus 13 clusters of sub-images for the semi-global histograms, plus a global histogram). Edge map image computed using the "Canny" edge operator.

29 EHD evaluation: cannot be used for object-based image retrieval. If Th_edge is set to 0, the EHD applies to binary edge images (sketch-based retrieval). The extended EHD achieves better results but does not exhibit the rotation-invariance property.

30 Shape descriptors: Region-based Shape Descriptor; Contour-based Shape Descriptor; 2D/3D Shape Descriptor; 3D Shape Descriptor.

31 Region-based Descriptor (RBD): expresses the pixel distribution within a 2-D object region, using a complex 2-D Angular Radial Transform (ART) with angular order m = 0, …, 11 and radial order n = 0, …, 2. F = {MagnitudeOfART[k]}, with k indexing the (n, m) pairs.

32 Region-based Descriptor (2): applicable to figures (a) to (e); distinguishes (i) from (g) and (h); (j), (k) and (l) are similar. Advantages: describes complex shapes with disconnected regions; robust to segmentation noise; small size; fast extraction and matching.

33 Contour-Based Descriptor (CBD): based on the Curvature Scale-Space (CSS) representation.

34 Curvature Scale-Space: finds the curvature zero-crossing points (key points) of the shape's contour; reduces the number of key points step by step by applying Gaussian smoothing; the positions of the key points are expressed relative to the length of the contour curve.

35 CBD extraction: repeated smoothing of the X and Y contour coordinates with the low-pass kernel (0.25, 0.5, 0.25) until the contour becomes convex; the CSS image plots the location x_CSS of the curvature zero-crossing points against the number of filtering passes y_CSS. F = {NofPeaks, GlobalCurv[ecc][circ], PrototypeCurv[ecc][circ], HighestPeakY, peakX[k], peakY[k]}

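The smoothing loop can be sketched as follows, assuming a closed contour sampled as parallel coordinate lists and using the sign of the cross product of successive edge vectors as the discrete curvature sign. It only counts zero crossings per pass and the passes needed to reach convexity; building the full CSS image and extracting the normative peaks are omitted, and all names are hypothetical. In the check, a three-lobed contour r(t) = 1 + 0.3·cos(3t) starts with concave parts and becomes convex after some passes, while a regular octagon is already convex.

```python
def smooth_closed(values):
    """One pass of the (0.25, 0.5, 0.25) low-pass kernel on a closed curve."""
    n = len(values)
    return [0.25 * values[i - 1] + 0.5 * values[i] + 0.25 * values[(i + 1) % n]
            for i in range(n)]


def curvature_signs(xs, ys):
    """Sign of the discrete curvature: cross product of successive edge vectors."""
    n = len(xs)
    signs = []
    for i in range(n):
        ax, ay = xs[i] - xs[i - 1], ys[i] - ys[i - 1]
        bx, by = xs[(i + 1) % n] - xs[i], ys[(i + 1) % n] - ys[i]
        cross = ax * by - ay * bx
        signs.append(1 if cross > 0 else -1 if cross < 0 else 0)
    return signs


def zero_crossings(signs):
    """Indices where the curvature sign differs from the previous nonzero sign."""
    nonzero = [(i, s) for i, s in enumerate(signs) if s != 0]
    return [i for (_, a), (i, b) in zip(nonzero, nonzero[1:] + nonzero[:1]) if a != b]


def css_passes(xs, ys, max_passes=1000):
    """Smooth until no curvature zero crossings remain (convex contour).
    Returns (number of passes, zero-crossing count before each pass)."""
    history = []
    for passes in range(max_passes + 1):
        zc = zero_crossings(curvature_signs(xs, ys))
        history.append(len(zc))
        if not zc:
            return passes, history
        xs, ys = smooth_closed(xs), smooth_closed(ys)
    raise RuntimeError("contour did not become convex")
```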
36 CBD applicability: applicable to (a); distinguishes differences in (b); finds similarities in (c) to (e). Advantages: captures the shape very well; robust to noise, scale and orientation; fast and compact.

37 Comparison (RB/CB descriptors): blue: similar shapes according to the region-based descriptor; yellow: similar shapes according to the contour-based descriptor.

38 How does MPEG-7 compare descriptors? ANMRR (Average Normalized Modified Retrieval Rank): normalized measures were defined that take into account the different sizes of the ground-truth sets and the actual ranks obtained from the retrieval; retrievals that miss items are assigned a penalty. Traditional metric (figure).

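A sketch of the ANMRR computation, assuming the commonly published MPEG-7 definition: for a query q with NG(q) ground-truth items, K(q) = min(4·NG(q), 2·GTM), where GTM is the largest NG over all queries; ground-truth items retrieved beyond rank K, or not at all, receive the penalty rank 1.25·K; the averaged rank is then normalized so that 0 means perfect retrieval and 1 means nothing was found. Function names and the input layout (1-based ranks, `None` for a missed item) are hypothetical.

```python
def nmrr(ranks, ground_truth_size, gtm):
    """NMRR for one query.
    ranks: 1-based retrieval ranks of ground-truth items (None = not retrieved);
    ground_truth_size: NG(q); gtm: largest ground-truth size over all queries."""
    ng = ground_truth_size
    k = min(4 * ng, 2 * gtm)
    # items beyond rank K, or missed, get the penalty rank 1.25 * K
    penalized = [r if r is not None and r <= k else 1.25 * k for r in ranks]
    penalized += [1.25 * k] * (ng - len(ranks))  # ground-truth items not listed at all
    avr = sum(penalized) / ng
    mrr = avr - 0.5 - ng / 2
    return mrr / (1.25 * k - 0.5 - ng / 2)


def anmrr(queries):
    """queries: list of (ranks, ground_truth_size); returns the mean NMRR."""
    gtm = max(ng for _, ng in queries)
    return sum(nmrr(r, ng, gtm) for r, ng in queries) / len(queries)
```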
39 Similarity between features: descriptors are typically multidimensional vectors (of low-level features). Similarity of two images in the vector feature space: the range query (all points within a hyperrectangle aligned with the coordinate axes); the nearest-neighbour or within-distance (α-cut) query (a particular metric in the feature space); dissimilarity between statistical distributions (the same metrics or specific measures).

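The three query types can be sketched over a small in-memory list of descriptor vectors; the function names are hypothetical, Euclidean distance (`math.dist`, Python 3.8+) is an assumed default, and another metric such as the L1 distance can be passed through the `dist` parameter. A real CBIR system would use an index structure instead of a linear scan.

```python
import math


def range_query(db, low, high):
    """Indices of vectors inside the axis-aligned hyperrectangle [low, high]."""
    return [i for i, v in enumerate(db)
            if all(lo <= x <= hi for x, lo, hi in zip(v, low, high))]


def knn_query(db, query, k, dist=math.dist):
    """Indices of the k vectors nearest to `query` under `dist`."""
    return sorted(range(len(db)), key=lambda i: dist(db[i], query))[:k]


def alpha_cut_query(db, query, alpha, dist=math.dist):
    """Indices of all vectors within distance alpha of `query`."""
    return [i for i, v in enumerate(db) if dist(v, query) <= alpha]
```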
40 reDemo/Demo/client/M7TextureDemo.html: an example of a CBIR system using the HTD, performing a range query and an NN query.

41 Criticism of MPEG-7 distance measures: MPEG-7 adopts feature-vector-space distances based on geometric assumptions about the descriptor space, but these quantitative measures (low-level information) do not fit human similarity perception ideally. Researchers from other areas have therefore developed alternative predicate-based models (the descriptors are assumed to contain only binary elements, as opposed to continuous data), which express the existence of properties and thus high-level information. See "Pattern difference": K: number of predicates in the data vectors X_i, X_j; b: the property exists in X_i only; c: the property exists in X_j only.

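A minimal sketch of the predicate-based idea, assuming the common definition of the pattern difference as b·c / K², where a and d (properties present in both vectors, or in neither) complete the count K = a + b + c + d. The function names are hypothetical, and any binarization of continuous descriptor values into predicates (e.g. by thresholding) is left to the caller.

```python
def predicate_counts(xi, xj):
    """a: property in both, b: only in xi, c: only in xj, d: in neither."""
    a = sum(1 for p, q in zip(xi, xj) if p and q)
    b = sum(1 for p, q in zip(xi, xj) if p and not q)
    c = sum(1 for p, q in zip(xi, xj) if q and not p)
    d = sum(1 for p, q in zip(xi, xj) if not p and not q)
    return a, b, c, d


def pattern_difference(xi, xj):
    """Pattern difference b * c / K**2 for binary predicate vectors."""
    a, b, c, d = predicate_counts(xi, xj)
    k = a + b + c + d
    return b * c / k ** 2
```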
42 Vector space distances

43 Distances/similarity measures

44 How to build and deploy an MPEG-7 description: a description = a Description Scheme (structure) + a set of Descriptor Values (instantiations of a Descriptor for a given data set). MPEG-7 Description Tools are a library of standardized Descriptors and Description Schemes. Adopting XML Schema as the basis for the MPEG-7 DDL, with the resulting XML-compliant instances (descriptions in MPEG-7 textual format), eases interoperability by using a common, generic and powerful (and extensible) representation format.

45 How it works: the Description Definition Language is based on XML Schema (flexibility): XML Schema structural language components, XML Schema datatype language components, and MPEG-7-specific extensions (support for vectors, matrices and typed references). A binary version adds efficiency. Formats: text format (XML), BiM (binary) format, or a mix of both.

46 A DDL example (schema and instantiation): "CNN 6 o'clock News", David James, 1999, CNN. The schema permits VideoDoc elements, as well as types derived from VideoDoc, to be used as children of VideoCatalogue, e.g. a NewsDoc instance.

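As a rough illustration of such an instance, the sketch below builds a VideoCatalogue whose VideoDoc child declares the derived type NewsDoc through `xsi:type`, one standard XML Schema mechanism for substituting a derived type. Only VideoCatalogue, VideoDoc, NewsDoc and the field values come from the slide; the child element names (Title, Producer, Date, Broadcaster) are hypothetical, and no schema validation is performed.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)


def build_catalogue():
    """Hypothetical VideoCatalogue instance with one VideoDoc of derived type NewsDoc."""
    catalogue = ET.Element("VideoCatalogue")
    doc = ET.SubElement(catalogue, "VideoDoc", {f"{{{XSI}}}type": "NewsDoc"})
    ET.SubElement(doc, "Title").text = "CNN 6 o'clock News"
    ET.SubElement(doc, "Producer").text = "David James"
    ET.SubElement(doc, "Date").text = "1999"
    ET.SubElement(doc, "Broadcaster").text = "CNN"
    return ET.tostring(catalogue, encoding="unicode")
```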
47 Descriptions enabled by the MPEG-7 tools: Content description tools (perceptual descriptions): the content's spatio-temporal structure; information on low-level features; semantic information related to the reality captured by the content. Content management tools (archival-oriented descriptions): the content's creation/production; information on using the content; information on storing and representing the content. Organization/navigation/access/user interaction tools (additional information for organizing, managing and accessing the content): how objects are related and gathered in collections; summaries/variations/transcoding to support efficient browsing; user interaction information.

48 Type hierarchy for top-level elements

49 … T00:00:00 PT2M … …

50 Which DS to choose? MPEG-7 provides DSs for describing the structure and semantics of AV content, plus content management. Content management information can be attached to individual segments.

51 Viewpoint of the structure: segments

52 Structure description (figure): Video Segment; segment decomposition into Video Segments and Moving Regions; descriptors attached to segments: Time, Color, Motion, Texture, Shape, Mosaic, Annotation; Relation links between segments.

53 Segment decomposition (time, connectivity)

54 Content structural aspects (Segment DS tree): annotate the whole image with a StillRegion; spatial segmentation at different levels; among different regions, SegmentRelationship description tools can be used.

55 Content structural aspects: temporal segments (SegmentRelationship DS graph)

56 Viewpoint of conceptual notions

57 Content semantic aspects (SemanticGraph)

58 Example of a Structure-Semantic Link DS

59 Content abstraction aspects (CoAbstr): hierarchical summary of a video (key frames f0, f00, f01, f02); enables rapid browsing and navigation (a sequential summary is also possible).

60 (CoAbstr) Partitions and decompositions (ViewDecomposition DS): frequency-space graph

61 (CoAbstr) Content variation: Universal Multimedia Access: adapt delivery to network and terminal characteristics.

62 (CoAbstr) A collection (CollectionStructure DS): groups segments, events or objects into collection clusters and specifies properties that are common to the elements. The CollectionStructure DS also describes statistics and models of the attribute values of the elements, such as a mean color histogram for a collection of images, as well as relationships among collection clusters.

63 Reference software: the XM. The XM implements MPEG-7 Descriptors (Ds), MPEG-7 Description Schemes (DSs), coding schemes and the DDL, supporting applications such as extraction, search and retrieval, transcoding, and description filtering.

64 Beyond MPEG-7 version 1 (Ds and DSs in the VXM): ColorTemperature: specifies the perceptual temperature feeling of the illumination color in an image, for browsing and display-preference control purposes (user friendly). Four perceptual temperature browsing categories are provided: hot, warm, moderate and cool; each category is used for browsing images based on its perceptual meaning. It uses the Dominant Color Descriptor. Illumination Invariant Color: wraps the color descriptors; one or more color descriptors processed by the illumination-invariant method can be included in this descriptor. Shape Variation: describes shape variations in terms of a Shape Variation Map and the statistics of the region shape description of each binary shape image in the collection. The Shape Variation Map consists of StaticShapeVariation and DynamicShapeVariation: the former corresponds to 35 quantized ART coefficients of a 2-D histogram of a group of shape images, and the latter to the inverse of that histogram, excluding the background. Media-centric description schemes: three visual description schemes are designed to describe several types of visual content. The StillRegionFeatureType contains several elementary descriptors to describe the characteristics of arbitrarily shaped still regions.

65 Visual CE, current phase: the core experiment explores new technologies for identifying original images and their modified versions (N−1 modified versions), focusing on the accuracy and robustness of identification. Robustness is measured as accuracy (HitRatio = k/N), calculated separately for each level of modification. Modifications: brightness change; size reduction; color to monochrome; JPEG compression with varying quality factors; color reduction; crop; histogram equalization; blur; geometric transformation.

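A sketch of the per-level HitRatio, assuming that k counts the modified queries whose nearest original descriptor (Euclidean distance) is the correct one, out of N queries at that modification level. The nearest-neighbour identification rule, the input layout and the function names are all assumptions for illustration, not the core experiment's actual procedure.

```python
import math


def identify(query, originals):
    """Index of the original descriptor nearest to the query (Euclidean)."""
    return min(range(len(originals)), key=lambda i: math.dist(query, originals[i]))


def hit_ratio_per_level(originals, modified):
    """modified: {level: [(true_index, descriptor), ...]}.
    Returns {level: k / N}, k = correct identifications among N queries."""
    ratios = {}
    for level, queries in modified.items():
        k = sum(identify(desc, originals) == true for true, desc in queries)
        ratios[level] = k / len(queries)
    return ratios
```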
66 Towards an MPEG-7 Query Format: although an interface supporting queries on an MPEG-7 database is not yet supported, requirements have been drafted. Architecture: client application ↔ input/output query formats ↔ MPEG-7 database, plus query management tools. Input query format, e.g.: query by textual description; combinations of query conditions; specification of the structure of the result set. Output query format, e.g.: the structure of the response containing the result set. Query management tools, e.g.: specification of exceptions; relevance feedback.

67 Basic search functionalities may include query by description (the client application provides possible query criteria).

68 Query by example; segment-based search (selecting subparts or an ROI to refine the search criteria).

69 Compositional search: from a globalization page, the user may select a number of interesting (or relevant) images to refine the search criteria.

70 Current state of the MPEG-7 VXM in CBIR: query by modified sketch (segmentation/simplification by assigning a representative color to each segment, then modification); query within an ROI; situation-based clustering (simple clustering / clustering on visual semantic hints); category-based clustering (local-concept lexicon: multiple low-level features of local regions are used in learning and detecting local concepts).

71 Query within ROI: uses the EHD and CLD to describe local properties, e.g. searching photos by matching the background regions only.

72 Situation-based clustering based on visual semantic hints (visual sensation, v.s.): Colorfulness (CoF) hint: degree of v.s. according to the purity of colors; utilizes the ScalableColorDescriptor.

73 Situation-based clustering based on visual semantic hints (2): Color Coherence (CoC) hint: degree of v.s. according to the spatial coherency of colors; utilizes the DominantColorDescriptor.

74 Situation-based clustering based on visual semantic hints (3): Level of Detail (LoD) hint: degree of v.s. for objects appearing more or less detailed; defines a relative compression ratio per pixel based on the JPEG compression the photo has gone through.

75 Situation-based clustering based on visual semantic hints (4): Homogeneous Texture (HoT) hint: degree of v.s. according to the homogeneous texture in the photo; expresses texture regularity using the TextureBrowsingDescriptor. Heterogeneous Texture (HeT) hint: degree of v.s. according to how continuous or strong the boundaries in the photo are; utilizes the EdgeHistogramDescriptor.

76 Category-based clustering: local-concept lexicon: multiple low-level features of local regions are used in learning and detecting local concepts; once the local concepts have been built, confidence values for each sub-region are measured for all local concepts.

77 Related MPEG activities: the functionalities described above are especially useful for developers of the MPEG-A Photo Player, which offers a standardized solution for the carriage of images and associated metadata, to facilitate simple and fully interoperable exchange across different devices and platforms. The set of metadata includes MPEG-7 visual content descriptions as well as acquisition-based metadata (such as date, time and camera settings). This allows compliant devices to support new, content-enhanced functionality, such as intelligent browsing, content-based search or automatic categorization.

78 Summary: the MPEG-7 standard: MPEG-7 visual and content structure description tools (Ds and DSs, using the DDL); MPEG-7 requirements for a query format; current phase of the MPEG-7 VXM (descriptors and CBIR). Multimedia segmentation, understanding and searching, among others, are still a challenge.

79 The end. Most of the pictures, or their basic ideas, are taken from the listed papers and web pages.
