Multimedia Content Description Interface


1 Multimedia Content Description Interface
The MPEG-7 Multimedia Content Description Interface. Anastasia Bolovinou, PhD candidate, Institute of Informatics and Telecommunications, NCSR "Demokritos"

2 Outline
- MPEG-7 motivation and scope
- Visual Descriptors (color, texture, shape)
- MPEG-7 retrieval evaluation criterion
- Similarity measures and MPEG-7 visual descriptors
- Building MPEG-7 Descriptors and Description Schemes with the Description Definition Language
- MPEG-7 VXM current state
- Towards an MPEG-7 Query Format framework (queries and the visual descriptor tools they employ)
- Summary

3 MPEG-7 motivation and design scenarios (possible queries)
Motivation: the proliferation of audio-visual content (news, sports, scientific content, consumer content, digital art galleries, recorded material). MPEG-7 standardizes multimedia metadata descriptions for various types of audiovisual information, to facilitate content-based multimedia retrieval.
Possible queries:
- Music/audio: play a few notes and return similar music or audio
- Images/graphics: draw a sketch and return images with similar graphics
- Text/keywords: find AV material whose subject corresponds to a keyword
- Movement: describe movements and return video clips with the specified temporal and spatial relations
- Scenario: describe actions and return scenarios in which similar actions take place

4 Scope of the Standard
Description production (extraction) -> standard description -> description consumption. Only the standard description is the normative part of the MPEG-7 standard.
MPEG-7 does not specify (non-normative parts of MPEG-7):
- How to extract descriptions (feature extraction, indexing process, annotation and authoring tools, ...)
- How to use descriptions (search engine, filtering tool, retrieval process, browsing device, ...)
- How to measure the similarity between contents
-> The goal is to define the minimum that enables interoperability.

5 Information flow

6 Visual Descriptors
- Color: Dominant Color, Scalable Color, Color Layout, Color Structure, GoF/GoP Color
- Texture: Homogeneous Texture, Texture Browsing, Edge Histogram
- Shape: Region Shape, Contour Shape, 3D Shape
- Localization: Region Locator, Spatio-Temporal Locator
- Motion (for video): Camera Motion, Motion Trajectory, Parametric Motion, Motion Activity
- Other: Face Recognition

7 Color Descriptors
- Dominant Color
- Scalable Color (HSV space)
- Color Structure (HMMD space)
- Color Layout (YCbCr space)
- GroupOfFrames/GroupOfPictures Color
Supported color spaces: R, G, B; Y, Cr, Cb; H, S, V; Monochrome; linear transformation of R, G, B; HMMD.
Constrained color spaces:
-> the Scalable Color Descriptor uses HSV
-> the Color Structure Descriptor uses HMMD

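8 Scalable Color Descriptor (SCD)
A color histogram in HSV color space, encoded by a Haar transform. Feature vector: {NoCoef, NoBD, Coeff[..], CoeffSign[..]} (number of coefficients, number of discarded bitplanes, coefficient magnitudes, coefficient signs)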

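9 SCD extraction: the HSV histogram is quantized to 11 bits/bin, nonlinearly mapped to 4 bits/bin and Haar-transformed; the description can be scaled down to fewer coefficients (#bins < 256) and N bits/bin.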
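The sum/difference passes behind the SCD's Haar encoding can be sketched as below. This is a simplified, unnormalized Haar-style encoding of any power-of-two histogram, assuming the HSV binning and the 11-bit/4-bit quantization have already been done; it does not reproduce the normative bin ordering or coefficient scan. The names `haar_sum_diff` and `truncate` are illustrative.

```python
def haar_sum_diff(hist):
    """Sum/difference (unnormalized Haar) encoding of a histogram.

    Each pass replaces adjacent bin pairs by their sum (low-pass) and
    difference (high-pass); only the sums are processed again on the next
    pass. The length must be a power of two.
    """
    n = len(hist)
    if n == 0 or n & (n - 1):
        raise ValueError("histogram length must be a power of two")
    coeffs = list(hist)
    length = n
    while length > 1:
        half = length // 2
        pairs = [(coeffs[2 * i], coeffs[2 * i + 1]) for i in range(half)]
        coeffs[:half] = [a + b for a, b in pairs]        # low-pass part
        coeffs[half:length] = [a - b for a, b in pairs]  # high-pass part
        length = half
    return coeffs


def truncate(coeffs, count):
    """Keep only the first `count` coefficients (coarsest first)."""
    return coeffs[:count]


print(haar_sum_diff([1, 2, 3, 4]))  # [10, -4, -1, -1]
```

The first coefficient is the total count and the remaining ones are progressively finer differences, so keeping only a prefix gives a coarser description; this is the sense in which the descriptor scales.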

10 GoF/GoP Color Descriptor
Extends the Scalable Color Descriptor to a video segment or a group of pictures: the joint color histogram is then processed as in the SCD (Haar transform encoding).
Histogram aggregation methods for extraction:
- Average: sensitive to outliers (lighting changes, occlusion, text overlays)
- Median: increased computational complexity due to sorting
- Intersection: a different viewpoint, capturing the "least common" color traits
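The three aggregation options can be compared on toy per-frame histograms. A minimal sketch, assuming every frame histogram has the same number of bins; `aggregate` is a hypothetical helper, and the joint histogram it returns would still need the SCD-style Haar encoding.

```python
from statistics import median


def aggregate(histograms, method="average"):
    """Combine per-frame histograms (equal-length lists) bin by bin."""
    bins = list(zip(*histograms))  # one tuple of values per bin, across frames
    if method == "average":
        return [sum(b) / len(b) for b in bins]
    if method == "median":
        return [median(b) for b in bins]
    if method == "intersection":
        return [min(b) for b in bins]  # per-bin minimum over all frames
    raise ValueError(f"unknown aggregation method: {method}")
```

With frames `[4, 0, 2]`, `[4, 1, 2]` and `[4, 9, 2]` (the last one with an outlier in the middle bin, say from a text overlay), the average gives about 3.33 for that bin, the median 1 and the intersection 0.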

11 GoF/GoP Color Descriptor
Application: browsing a large collection of images to find similar images
-> use histogram intersection as the color similarity measure for clustering the collection
-> represent each cluster by a GoP descriptor
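A sketch of this clustering use case, assuming histogram intersection normalized by the total of the second histogram (one common normalization; others exist) and a simple greedy, threshold-based grouping. Both `intersection_similarity` and `greedy_cluster` are hypothetical names, and the threshold is a free parameter, not a value from the standard.

```python
def intersection_similarity(h1, h2):
    """Histogram intersection, normalized by the total count of h2."""
    total = sum(h2)
    if total == 0:
        raise ValueError("h2 must not be empty")
    return sum(min(a, b) for a, b in zip(h1, h2)) / total


def greedy_cluster(histograms, threshold):
    """Put each histogram into the first cluster whose seed (first member)
    is similar enough, otherwise start a new cluster. Returns index lists."""
    clusters = []
    for i, h in enumerate(histograms):
        for cluster in clusters:
            seed = histograms[cluster[0]]
            if intersection_similarity(h, seed) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Each resulting cluster could then be summarized by aggregating its members' histograms into a GoP descriptor.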

12 Dominant Color Descriptor (DCD)
Clusters the colors of a region into a small number of representative (salient) colors.
F = { {ci, pi, vi}, s }, where
ci: representative colors
pi: their percentages in the region
vi: color variances
s: spatial coherency

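13 DCD extraction (based on the generalized Lloyd algorithm)
Each cluster centroid ci is the perceptually weighted mean of the color vectors x(n) assigned to it, where x(n) is the color vector at pixel n and v(n) the perceptual weight of that pixel. The weights are larger in smooth regions, because human visual perception is more sensitive to changes there.
+ Spatial coherency: the average number of connecting pixels of a dominant color, measured with a 3x3 masking window.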
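The weighted centroid update, $c_i = \sum v(n)\,x(n) / \sum v(n)$ over the pixels assigned to cluster $C_i$, can be sketched as a perceptually weighted Lloyd (k-means) loop. This is a simplified sketch: the perceptual weights v(n) are simply passed in (how the standard derives them is not reproduced), the cluster splitting and merging of the generalized Lloyd algorithm is omitted, and the color variances vi and spatial coherency s are not computed. `weighted_lloyd` is a hypothetical name.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two color vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(x, centroids):
    """Index of the centroid closest to color x."""
    return min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))


def weighted_lloyd(pixels, weights, centroids, iterations=10):
    """Perceptually weighted Lloyd iterations.

    pixels: color vectors x(n); weights: perceptual weights v(n);
    centroids: initial representative colors. Returns the final centroids
    c_i and the fraction p_i of pixels assigned to each of them.
    """
    centroids = [tuple(float(v) for v in c) for c in centroids]
    dim = len(centroids[0])
    for _ in range(iterations):
        acc = [[0.0] * dim for _ in centroids]
        wsum = [0.0] * len(centroids)
        for x, v in zip(pixels, weights):
            i = nearest(x, centroids)
            wsum[i] += v
            for d in range(dim):
                acc[i][d] += v * x[d]
        # c_i = sum(v(n) x(n)) / sum(v(n)); an empty cluster keeps its centroid
        centroids = [tuple(a / w for a in acc[i]) if w > 0 else centroids[i]
                     for i, w in enumerate(wsum)]
    counts = [0] * len(centroids)
    for x in pixels:
        counts[nearest(x, centroids)] += 1
    return centroids, [c / len(pixels) for c in counts]
```

Giving a pixel a larger weight pulls its cluster's representative color towards it, which is how smooth regions end up dominating the chosen colors.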

14 http://debut.cis.nctu.edu

15 Color Layout Descriptor (CLD)
Partitions the image into 64 (8x8) blocks, derives the average color of each block (or uses the DCD), then applies an 8x8 DCT and encodes the result. Efficient for sketch-based image retrieval and for content filtering using image indexing.

16 CLD extraction
> The derived average colors are transformed into a series of coefficients by a DCT (spatial-domain data -> frequency-domain data).
> A few low-frequency coefficients are selected by zigzag scanning and quantized to form the CLD (a large quantization step for the AC coefficients, a small one for the DC coefficient).
-> The color space adopted for the CLD is YCbCr.
If the spatial-domain data is smooth (little variation), the energy concentrates in the low-frequency coefficients and the high-frequency coefficients are small, which is why keeping only a few low-frequency coefficients suffices.
F = {CoefPattern, YDCCoef, CbDCCoef, CrDCCoef, YACCoef, CbACCoef, CrACCoef}
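The DCT-plus-zigzag step can be sketched for one 8x8 grid of block averages (one color channel). A minimal sketch, assuming an orthonormal DCT-II and the JPEG-style zigzag order; the quantization and the number of coefficients kept per channel are left out, and the function names are illustrative.

```python
import math

N = 8


def dct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block given as 8 rows of 8 values."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def zigzag_order(n=N):
    """(row, col) pairs in JPEG-style zigzag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals r + c; odd diagonals run with the row index
    # increasing, even diagonals with the row index decreasing.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def first_coefficients(block, count):
    """DCT the block and keep the first `count` coefficients in zigzag order."""
    coeffs = dct_8x8(block)
    return [coeffs[r][c] for r, c in zigzag_order()[:count]]
```

For a perfectly smooth grid (all values 100), only the DC coefficient (800 with this normalization) is non-zero, which illustrates why a few low-frequency coefficients suffice.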

17 Color Structure Descriptor (CSD)
Scans the image with an 8x8 structuring element, counts for each color the number of structuring-element positions that contain it, and thus generates a color histogram (HMMD color space, four color-quantization operating points). It takes into account the colors in the local neighborhood of pixels instead of considering each pixel separately.
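The counting rule can be sketched on an image whose pixels are already quantized color indices (the HMMD quantization is assumed to be done). This sketch slides the 8x8 structuring element over every position fully inside the image and ignores the subsampling of large images and the descriptor's amplitude quantization; `color_structure_histogram` is a hypothetical name.

```python
def color_structure_histogram(index_image, num_colors, element=8):
    """Fraction of structuring-element positions that contain each color.

    index_image: rows of quantized color indices in range(num_colors).
    """
    height, width = len(index_image), len(index_image[0])
    if height < element or width < element:
        raise ValueError("image smaller than the structuring element")
    hist = [0] * num_colors
    positions = 0
    for top in range(height - element + 1):
        for left in range(width - element + 1):
            window_colors = {index_image[y][x]
                             for y in range(top, top + element)
                             for x in range(left, left + element)}
            for color in window_colors:  # each color counted once per position
                hist[color] += 1
            positions += 1
    return [count / positions for count in hist]
```

On a 9x9 image that is all color 0 except one pixel of color 1, the result is `[1.0, 0.25]`: the odd pixel falls inside one of the four element positions, whereas a conventional histogram would give it a weight of only 1/81.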

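18 CSD extraction: F = {colQuant, Values[m]}
If the image is larger than 256x256 pixels, it is subsampled. The subsampling is given by $p = \max\{0, \operatorname{round}(0.5 \log_2(W H) - 8)\}$, $K = 2^p$, $E = 8K$, where W x H is the image size, K the subsampling factor and E the spatial extent of the structuring element.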
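The subsampling rule for large images, as usually stated for the CSD, is $p = \max\{0, \operatorname{round}(0.5 \log_2(W H) - 8)\}$, $K = 2^p$, $E = 8K$. The sketch below assumes round-half-up for the rounding; `csd_subsampling` is an illustrative name.

```python
import math


def csd_subsampling(width, height):
    """Return (p, K, E) for a width x height image (round half up assumed)."""
    p = max(0, math.floor(0.5 * math.log2(width * height) - 8 + 0.5))
    k = 2 ** p   # subsampling factor
    e = 8 * k    # spatial extent of the structuring element
    return p, k, e
```

A 256x256 image gives p = 0 (no subsampling); a 640x480 image gives p = 1, K = 2, E = 16.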

19 CSD scaling

20 Texture Descriptors
- Homogeneous Texture Descriptor
- Non-Homogeneous Texture Descriptor (Edge Histogram)
- Texture Browsing

21 Homogeneous Texture Descriptor (HTD)
Partitions the frequency domain into 30 channels (each modeled by a 2D Gabor function), computes the energy and energy deviation of each channel, and computes the mean and standard deviation of the image intensity.
-> F = {fDC, fSD, e1,…, e30, d1,…, d30}
An efficient implementation: a Radon transform followed by a Fourier transform.
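The channel layout and the resulting feature vector can be made concrete. A sketch under the assumption of the commonly described HTD layout: 5 octave-spaced radial bands with center frequencies $\omega_s = \omega_0 2^{-s}$, $\omega_0 = 3/4$ (normalized frequency), times 6 orientations 30° apart, giving the 30 channels. The Gabor filtering and the energy computation themselves are not shown, and the names are illustrative.

```python
OMEGA_0 = 0.75  # assumed highest radial center frequency (normalized)


def htd_channels(scales=5, orientations=6):
    """(radial center frequency, orientation in degrees) for each channel."""
    channels = []
    for s in range(scales):            # octave-spaced radial bands
        for r in range(orientations):  # 180 / 6 = 30 degrees apart
            channels.append((OMEGA_0 * 2.0 ** -s, 180 * r // orientations))
    return channels


def htd_feature_vector(f_dc, f_sd, energies, deviations):
    """Assemble F = {fDC, fSD, e1..e30, d1..d30}."""
    if len(energies) != 30 or len(deviations) != 30:
        raise ValueError("expected 30 energies and 30 energy deviations")
    return [f_dc, f_sd, *energies, *deviations]
```

The assembled vector has 2 + 30 + 30 = 62 entries; dropping fDC is what the intensity-invariant matching mode amounts to.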

22 HTD extraction: how to get a 2-D frequency layout that follows the HVS
2-D image f(x,y) -> Radon transform -> 1-D projection P(R, θ) -> 1-D Fourier transform F(P(R, θ)) -> resulting sampling grid in polar coordinates

23 HTD extraction: data sampling in the feature channels
-> A 2D Gabor function is deployed to define Gabor filter banks. It is a Gaussian-weighted sinusoid, used to model individual channels; each channel filters a specific type of texture.

24 Radon Transform: transforms images with lines into a domain of possible line parameters; each line is transformed into a peak point in the resulting image.
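For a binary image the Radon transform reduces to counting foreground pixels along each line $\rho = x\cos\theta + y\sin\theta$, so a straight line in the image shows up as a single high bin. A minimal sketch, assuming nearest-integer binning of ρ and a coarse set of angles; it is a discrete, Hough-style approximation rather than the interpolated projection an efficient HTD implementation would use. `radon_accumulate` is a hypothetical name.

```python
import math


def radon_accumulate(points, angles_deg):
    """Count foreground pixels on each line rho = x cos(theta) + y sin(theta).

    points: (x, y) coordinates of the foreground pixels of a binary image.
    Returns {(angle_deg, rho_bin): count}, with rho rounded to an integer.
    """
    acc = {}
    for x, y in points:
        for angle in angles_deg:
            theta = math.radians(angle)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(angle, rho)] = acc.get((angle, rho), 0) + 1
    return acc


line = [(x, 3) for x in range(8)]  # horizontal line y = 3
acc = radon_accumulate(line, range(0, 180, 15))
peak = max(acc, key=acc.get)
print(peak, acc[peak])  # (90, 3) 8
```

All 8 pixels of the line fall into the same bin at θ = 90°, ρ = 3, while at every other angle they spread over several bins.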

25 HTD properties: F = {fDC, fSD, e1,…, e30, d1,…, d30}
One can perform:
- rotation-invariant matching
- intensity-invariant matching (fDC removed from the feature vector)
- scale-invariant matching
...and estimate the AVRR for 7 kinds of data sets -> only a slight increase in AVRR when a double layer was used.

26 Texture Browsing Descriptor
-> Same spatial filtering procedure as the HTD, using scale- and orientation-selective band-pass filters. It characterizes:
- Regularity (periodic to random), e.g. look for textures that are very regular and oriented at 30°
- Coarseness (fine-grained to coarse)
- Directionality (dominant orientations, in 30° steps)
-> The texture browsing descriptor can be used to find a set of candidates with similar perceptual properties; the HTD is then used to get a precise similarity match list among the candidate images.

27 Edge Histogram Descriptor (EHD)
Represents the spatial distribution of five types of edges: vertical, horizontal, 45°, 135°, and non-directional.
- Divides the image into 16 (4x4) blocks
- Generates a 5-bin histogram for each block
- Is scale invariant
- Retains strong edges by thresholding the output of a Canny edge operator
F = {BinCounts[k]}, k = 1, …, 80 (16 blocks × 5 edge types)
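Per-block edge classification is often described with five 2x2 filters applied to the mean intensities of an image block's four sub-blocks, keeping the strongest response if it exceeds a threshold. The sketch below follows that filter-based scheme (it does not use the Canny operator mentioned above); the filter coefficients, the threshold and the helper names are assumptions for illustration.

```python
import math

R2 = math.sqrt(2)
# Coefficients for the (top-left, top-right, bottom-left, bottom-right) sub-blocks.
FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diagonal_45": (R2, 0, 0, -R2),
    "diagonal_135": (0, R2, -R2, 0),
    "non_directional": (2, -2, -2, 2),
}


def classify_block(a0, a1, a2, a3, threshold):
    """Edge type of one image block from its four sub-block mean intensities,
    or None if even the strongest filter response is not above the threshold."""
    strengths = {name: abs(f[0] * a0 + f[1] * a1 + f[2] * a2 + f[3] * a3)
                 for name, f in FILTERS.items()}
    best = max(strengths, key=strengths.get)
    return best if strengths[best] > threshold else None


def local_edge_histogram(blocks, threshold):
    """5-bin histogram (fraction of image blocks per edge type) for one sub-image."""
    counts = dict.fromkeys(FILTERS, 0)
    for block in blocks:
        kind = classify_block(*block, threshold)
        if kind is not None:
            counts[kind] += 1
    return [counts[name] / len(blocks) for name in FILTERS]
```

Concatenating the five bins of each of the 16 sub-images gives the 80 BinCounts; flat blocks produce no edge and contribute to no bin.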

28 EHD extraction
- Basic (80 bins): local histograms
- Extended (150 bins): the 80 basic bins plus a global histogram (5 bins) and 13 semi-global clusters (13 × 5 = 65 bins)
Edge map image computed with the "Canny" edge operator.

29 EHD evaluation
- Cannot be used for object-based image retrieval
- If Th_edge is set to 0, the EHD applies to binary edge images (sketch-based retrieval)
- The extended EHD achieves better results but does not exhibit the rotation-invariance property

30 Shape Descriptors
- Region-based Shape Descriptor
- Contour-based Shape Descriptor
- 2D/3D Shape Descriptor
- 3D Shape Descriptor

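31 Region-based Descriptor (RBD)
Expresses the pixel distribution within a 2-D object region. Employs a complex 2D Angular Radial Transformation (ART).
F = {MagnitudeOfART[k]}, k = 1, …, 35: the magnitudes of the ART coefficients for n = 0, 1, 2 (radial) and m = 0, …, 11 (angular), excluding the normalizing coefficient F00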
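A direct (slow) numerical sketch of the ART coefficients, assuming the usual basis $V_{nm}(\rho,\theta) = \frac{1}{2\pi} e^{jm\theta} R_n(\rho)$ with $R_0 = 1$ and $R_n(\rho) = 2\cos(\pi n \rho)$ for n > 0, 3 radial and 12 angular orders, and an object already centred in a square image whose inscribed circle is the unit disk. Magnitudes are normalized by $|F_{00}|$, which leaves 35 informative values; the standard's quantization is omitted and `art_magnitudes` is a hypothetical name.

```python
import cmath
import math


def art_magnitudes(image, radial=3, angular=12):
    """Normalized ART coefficient magnitudes |F_nm| / |F_00| of a square image.

    Pixel centres are mapped to [-1, 1] x [-1, 1]; pixels outside the unit
    disk are ignored. The integral over rho d(rho) d(theta) is approximated
    by a sum over pixels times the pixel area (dx dy).
    """
    size = len(image)
    half = size / 2
    pixel_area = (1 / half) ** 2
    coeffs = [[0j] * angular for _ in range(radial)]
    for row in range(size):
        for col in range(size):
            value = image[row][col]
            if not value:
                continue
            x = (col + 0.5 - half) / half
            y = (half - row - 0.5) / half
            rho = math.hypot(x, y)
            if rho > 1:
                continue
            theta = math.atan2(y, x)
            for n in range(radial):
                r_n = 1.0 if n == 0 else 2 * math.cos(math.pi * n * rho)
                for m in range(angular):
                    basis = cmath.exp(1j * m * theta) * r_n / (2 * math.pi)
                    coeffs[n][m] += basis.conjugate() * value * pixel_area
    f00 = abs(coeffs[0][0])
    if f00 == 0:
        raise ValueError("empty shape")
    # The descriptor keeps the 35 values other than the (always 1) F_00 entry.
    return [[abs(c) / f00 for c in coeff_row] for coeff_row in coeffs]
```

Because only magnitudes are kept, rotating the object ideally changes only the phases of the coefficients, not these values.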

32 Region-based Descriptor (2)
Applicable to figures (a)–(e); distinguishes (i) from (g) and (h); (j), (k), and (l) are similar.
Advantages:
- Describes complex shapes with disconnected regions
- Robust to segmentation noise
- Small size
- Fast extraction and matching

33 Contour-Based Descriptor (CBD)
It is based on Curvature Scale-Space representation

34 Curvature Scale-Space
- Finds the curvature zero-crossing points of the shape's contour (key points)
- Reduces the number of key points step by step by applying Gaussian smoothing
- Expresses the positions of the key points relative to the length of the contour curve

35 CBD extraction: the X and Y contour coordinates are repeatedly smoothed with the low-pass kernel (0.25, 0.5, 0.25) until the contour becomes convex. The CSS image plots the filtering pass (yCSS) against the location (xCSS) of the curvature zero-crossing points.
F = {NofPeaks, GlobalCurv[ecc][circ], PrototypeCurv[ecc][circ], HighestPeakY, peakX[k], peakY[k]}
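The smoothing loop can be sketched on a closed polygon. This simplified sketch uses the sign of the cross product of consecutive edge vectors as the discrete curvature sign, measures positions by point index divided by the number of points (which approximates normalized arc length only for evenly spaced points), and records one row of zero-crossing positions per filtering pass; extracting the peaks and the prototype/global curvature values is not shown. The names are illustrative.

```python
def smooth(values):
    """One pass of the circular (0.25, 0.5, 0.25) low-pass kernel."""
    n = len(values)
    return [0.25 * values[i - 1] + 0.5 * values[i] + 0.25 * values[(i + 1) % n]
            for i in range(n)]


def curvature_signs(xs, ys):
    """Sign of the turn at each vertex of a closed polygon (+1, -1 or 0)."""
    n = len(xs)
    signs = []
    for i in range(n):
        ax, ay = xs[i] - xs[i - 1], ys[i] - ys[i - 1]
        bx, by = xs[(i + 1) % n] - xs[i], ys[(i + 1) % n] - ys[i]
        cross = ax * by - ay * bx
        signs.append((cross > 0) - (cross < 0))
    return signs


def zero_crossings(signs):
    """Indices i where the curvature sign changes before the next non-zero sign."""
    nonzero = [(i, s) for i, s in enumerate(signs) if s != 0]
    crossings = []
    for k, (i, s) in enumerate(nonzero):
        if s != nonzero[(k + 1) % len(nonzero)][1]:
            crossings.append(i)
    return crossings


def css_image(xs, ys, max_passes=1000):
    """Smooth until convex; return (rows, passes), with one row of
    normalized zero-crossing positions per filtering pass."""
    n = len(xs)
    rows = []
    for step in range(max_passes):
        crossings = zero_crossings(curvature_signs(xs, ys))
        if not crossings:
            return rows, step
        rows.append([i / n for i in crossings])
        xs, ys = smooth(xs), smooth(ys)
    raise RuntimeError("contour did not become convex within max_passes")
```

Each returned row is one horizontal line of the CSS image (the row index is the filtering pass, yCSS; the positions are xCSS); a convex contour yields no rows at all.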

36 CBD applicability: applicable to (a); distinguishes differences in (b); finds similarities in (c)–(e).
Advantages:
- Captures the shape very well
- Robust to noise, scale, and orientation
- Fast and compact

37 Comparison (RB/CB descriptors)
Blue: Similar shapes by Region-Based Yellow: Similar shapes by Contour-Based

38 How does MPEG-7 compare descriptors?
ANMRR (average normalized modified retrieval rank): traditional retrieval metrics were replaced by normalized measures that take into account the different sizes of the ground-truth sets and the actual ranks obtained from the retrieval; retrievals that miss ground-truth items are assigned a penalty.
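A sketch of the ANMRR computation as it is commonly stated for the MPEG-7 core experiments: for a query q with NG(q) ground-truth items, K(q) = min(4 NG(q), 2 GTM), where GTM is the largest ground-truth set over all queries; items ranked beyond K(q), or not retrieved, get the penalty rank 1.25 K(q). These formulas should be checked against the original evaluation documents before use; the function names are illustrative.

```python
def nmrr(ranks, k):
    """Normalized modified retrieval rank for one query.

    ranks: 1-based rank of each ground-truth item in the result list, or
    None if it was not retrieved; len(ranks) is NG(q). k is K(q).
    """
    ng = len(ranks)
    # Items beyond rank K(q), or missing, receive the penalty rank 1.25 K(q).
    penalized = [r if r is not None and r <= k else 1.25 * k for r in ranks]
    avr = sum(penalized) / ng        # average rank
    mrr = avr - 0.5 - ng / 2         # modified retrieval rank
    return mrr / (1.25 * k - 0.5 - ng / 2)


def anmrr(rank_lists):
    """Average NMRR over all queries, with K(q) = min(4 NG(q), 2 GTM)."""
    gtm = max(len(ranks) for ranks in rank_lists)
    scores = [nmrr(ranks, min(4 * len(ranks), 2 * gtm)) for ranks in rank_lists]
    return sum(scores) / len(scores)
```

Lower is better: 0 means every ground-truth item was ranked at the top, 1 means none was found within K(q).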

39 Similarity between features
• Typically, descriptors are multidimensional vectors (of low-level features)
• Similarity of two images in the feature vector space:
– the range query: all points within a hyperrectangle aligned with the coordinate axes
– the nearest-neighbour or within-distance (α-cut) query: uses a particular metric in the feature space
– dissimilarity between statistical distributions: the same metrics or specific measures
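The query types can be sketched over a small in-memory set of feature vectors. A minimal sketch, assuming Euclidean distance for the within-distance and nearest-neighbour queries; MPEG-7 descriptors usually come with their own (non-normative) matching measures, and the function names are hypothetical.

```python
import math


def range_query(db, low, high):
    """Keys whose vectors lie in the axis-aligned box [low, high]."""
    return [key for key, vec in db.items()
            if all(lo <= v <= hi for v, lo, hi in zip(vec, low, high))]


def within_distance(db, query, alpha):
    """Alpha-cut query: keys within Euclidean distance alpha of the query."""
    return [key for key, vec in db.items() if math.dist(vec, query) <= alpha]


def nearest_neighbours(db, query, k):
    """Keys of the k vectors closest to the query (Euclidean distance)."""
    return sorted(db, key=lambda key: math.dist(db[key], query))[:k]
```

The range query needs no metric at all, only per-axis bounds, whereas the other two depend entirely on the chosen distance.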

40 An example of CBIR system using HTD performing range query and NN query

41 Criticism of MPEG-7 distance measures
MPEG-7 adopts feature vector space distances based on geometric assumptions about the descriptor space, but these quantitative measures (low-level information) do not fit human similarity perception ideally.
-> Researchers from other areas have developed alternative predicate-based models, in which descriptors are assumed to contain binary elements (as opposed to continuous data) that express the existence of properties and thus high-level information.
See "Pattern difference": K: number of predicates in the data vectors Xi, Xj; b: number of properties that exist in Xi but not in Xj; c: number of properties that exist in Xj but not in Xi.

42 Vector Space Distances

43 Distances/Similarity measures

44 How to build and deploy an MPEG-7 Description
A description = a Description Scheme (structure), defined in the Description Definition Language (DDL), + a set of Descriptor Values (instantiations of Descriptors for a given data set).
MPEG-7 Description Tools are a library of standardized Descriptors and Description Schemes.
Adopting XML Schema as the basis for the MPEG-7 DDL, with the resulting XML-compliant instances (descriptions in MPEG-7 textual format), eases interoperability by using a common, generic, powerful and extensible representation format.

45 MPEG-7 support for vectors, matrices and typed references
How it works: the Description Definition Language consists of
- XML Schema (flexibility): XML Schema structural language components, XML Schema datatype language components, and MPEG-7-specific extensions
- a binary version (efficiency)
Descriptions can be exchanged in (XML) text format, BiM (binary) format, or a mix of the two.

46 A DDL example (instantiation)
Schema:
<complexType name="VideoDoc">
  <sequence>
    <element name="Title" …/>
    <element name="Producer" …/>
    <element name="Date" …/>
  </sequence>
</complexType>
<complexType name="NewsDoc">
  <complexContent>
    <extension base="VideoDoc">
      <sequence>
        <element name="Broadcaster" … minOccurs="0"/>
        <element name="Time" … minOccurs="0"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
<element name="VideoCatalogue">
  <complexType>
    <sequence>
      <element name="CatalogueEntry" type="VideoDoc" minOccurs="0" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
</element>
This permits VideoDoc elements, as well as elements of types derived from VideoDoc (e.g., NewsDoc), to be used as children of VideoCatalogue.
Instance:
<CatalogueEntry xsi:type="NewsDoc">
  <Title>CNN 6 o'clock News</Title>
  <Producer>David James</Producer>
  <Date>1999</Date>
  <Broadcaster>CNN</Broadcaster>
</CatalogueEntry>
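A consumer of the instance has to honour `xsi:type` to know that a CatalogueEntry is really a NewsDoc. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming the instance is wrapped in a VideoCatalogue root that declares the `xsi` namespace; it reads the data but does not validate it against the schema, which the standard library cannot do. `read_catalogue` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def read_catalogue(xml_text):
    """Return (type name, {child tag: text}) for every CatalogueEntry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("CatalogueEntry"):
        # Without xsi:type the entry has its declared type, VideoDoc.
        type_name = entry.get(f"{{{XSI_NS}}}type", "VideoDoc")
        fields = {child.tag: child.text for child in entry}
        entries.append((type_name, fields))
    return entries


instance = """<VideoCatalogue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <CatalogueEntry xsi:type="NewsDoc">
    <Title>CNN 6 o'clock News</Title>
    <Producer>David James</Producer>
    <Date>1999</Date>
    <Broadcaster>CNN</Broadcaster>
  </CatalogueEntry>
</VideoCatalogue>"""

for type_name, fields in read_catalogue(instance):
    print(type_name, fields)
```

ElementTree exposes the namespaced attribute under the `{namespace}local-name` key, which is why the lookup spells out the XSI namespace URI.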

47 Descriptions enabled by the MPEG-7 tools
- Content management tools -> archival-oriented descriptions: information on the content's creation/production, on using the content, and on storing and representing the content
- Content description tools -> perceptual descriptions: the content's spatio-temporal structure, information on low-level features, and semantic information related to the reality captured by the content
- Organization/navigation/access/user interaction tools: additional information for organizing, managing and accessing the content (how objects are related and gathered in collections; summaries, variations and transcoding to support efficient browsing; user interaction information)

48 Type hierarchy for top-level elements

49 <Mpeg7>
  <Description xsi:type="ContentEntityType">
    <MultimediaContent xsi:type="VideoType">
      <Video id="video_example">
        <MediaInformation>...</MediaInformation>
        <TemporalDecomposition gap="false" overlap="false">
          <VideoSegment id="VS1">
            <MediaTime>
              <MediaTimePoint>T00:00:00</MediaTimePoint>
              <MediaDuration>PT2M</MediaDuration>
            </MediaTime>
            <VisualDescriptor xsi:type="GoFGoPColorType" aggregation="average">
              <ScalableColor numOfCoeff="8" numOfBitplanesDiscarded="0">
                <Coeff> </Coeff>
              </ScalableColor>
            </VisualDescriptor>
          </VideoSegment>
          <!-- …… further VideoSegment elements -->
        </TemporalDecomposition>
      </Video>
    </MultimediaContent>
  </Description>
</Mpeg7>

50 MPEG-7 provides DSs for describing the structure and semantics of AV content, plus content management. Which DS to choose? Content management information can be attached to individual segments.

51 Viewpoint of the structure: Segments

52 Structure description
[Figure: segment decomposition tree. Labels: video segment (time, mosaic, annotation) -> segment decomposition -> video segments (time, color, motion, texture, shape, annotation) -> segment decomposition -> moving regions; relation links between segments]

53 Segment Decomposition
time connectivity

54 Content structural aspects (Segment DS tree)
Annotate the whole image with a StillRegion; apply spatial segmentation at different levels; relations among different regions can be described with the SegmentRelationship description tools.

55 Content structural aspects
(Segment Relationship DS graph) Temporal segments

56 Viewpoint of conceptual notions

57 Content Semantic aspects (SemanticGraph)

58 Example of Structure-Semantic Link DS

59 Content abstraction aspects (CoAbstr)-Hierarchical summary of a video
- > enables rapid browsing, navigation (also sequential summary)

60 (CoAbstr)-Partitions and decompositions (ViewDecomposition DS)
Frequency-space graph

61 (CoAbstr) Content Variation
Universal Multimedia Access: Adapt delivery to network and terminal characteristics

62 CoAbstr – A collection (CollectionStructure DS)
-> Groups segments, events, or objects into collection clusters and specifies properties that are common to the elements. The CollectionStructure DS also describes statistics and models of the attribute values of the elements, such as a mean color histogram for a collection of images, as well as relationships among collection clusters.

63 Reference Software: the XM
The XM implements MPEG-7 Descriptors (Ds), MPEG-7 Description Schemes (DSs), Coding Schemes and the DDL, supporting extraction, search and retrieval, transcoding, and description filtering.

64 Beyond MPEG-7 version 1 (Ds & DSs in VXM)
- ColorTemperature: specifies the perceptual temperature feeling of the illumination color in an image, for browsing and display-preference control (user friendly). Four perceptual temperature browsing categories are provided: hot, warm, moderate, and cool; each category is used for browsing images based on its perceptual meaning. Uses the Dominant Color Descriptor.
- Illumination Invariant Color: wraps the color descriptors; one or more color descriptors processed by the illumination-invariant method can be included in this descriptor.
- Shape Variation: describes shape variations in terms of a Shape Variation Map and the statistics of the region shape description of each binary shape image in the collection. The Shape Variation Map consists of StaticShapeVariation and DynamicShapeVariation: the former corresponds to 35 quantized ART coefficients of a 2-D histogram of the group of shape images, and the latter to the inverse of that histogram, excluding the background.
- Technologies for digital photo management using MPEG-7 visual tools: this amendment includes informative use scenarios of MPEG-7 visual descriptions that enable advanced content-based image retrieval and categorization of digital photo content. These functionalities are especially useful for developers of the MPEG-A Photo Player who want to equip their compliant applications with fully automatic album creation.
- Media-centric description schemes: three visual description schemes are designed to describe several types of visual content. The StillRegionFeatureType contains several elementary descriptors that describe the characteristics of arbitrarily shaped still regions.

65 Visual CE current phase
The core experiment (CE) explores new technologies for identifying original images and their modified versions (N-1 modified versions), focusing on the accuracy and robustness of identification.
> Robustness is measured as the accuracy (HitRatio = k/N), calculated separately for each level of modification.
Modifications: brightness, size reduction, color to monochrome, JPEG compression with varying quality factors, color reduction, crop, histogram equalization, blur, geometric transformation.
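How such a hit ratio could be computed per modification type, under stated assumptions: identification by the nearest original descriptor with Euclidean distance, k read as the number of correctly identified queries and N as the number of queries at that modification level. The descriptors, names and data are hypothetical; the CE's actual identification technologies are not described here.

```python
import math


def identify(query, originals):
    """Id of the original whose descriptor is closest to the query."""
    return min(originals, key=lambda oid: math.dist(originals[oid], query))


def hit_ratio_per_modification(originals, modified):
    """modified: {modification: [(true original id, descriptor), ...]}.

    Returns {modification: k / N}, with k the number of queries identified
    correctly and N the number of queries for that modification.
    """
    ratios = {}
    for modification, queries in modified.items():
        k = sum(identify(vec, originals) == true_id for true_id, vec in queries)
        ratios[modification] = k / len(queries)
    return ratios
```

Reporting one ratio per modification (rather than a single pooled figure) shows which kinds of editing a descriptor is robust to.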

66 Towards MPEG-7 Query Format
Although the interface to support queries on an MPEG-7 database is not yet part of the standard, requirements have been drafted.
Components: client application, input query format, MPEG-7 database, output query format, query management tools.
- Input query format: query by textual description; combinations of query conditions; specification of the structure of the result set
- Output query format: e.g. the structure of the response containing the result set
- Query management tools: e.g. specification of exceptions; relevance feedback

67 Basic search functionalities may include:
Query by Description (the client application provides possible query criteria)

68 Query by example; segment-based search (selecting subparts or an ROI to refine the search criteria)

69 Compositional search: from a globalization page, the user may select a number of interesting (or relevant) images to refine the search criteria

70 Current state of MPEG-7 VXM in CBIR
- Query by modified sketch (segmentation/simplification by assigning a representative color to each segment, then modification)
- Query within ROI
- Situation-based clustering (simple clustering / clustering on visual semantic hints)
- Category-based clustering (local-concept lexicon: multiple low-level features of local regions are used to learn and detect local concepts)

71 Query within ROI uses the EHD and CLD to describe local properties
-> example: photo search by matching the background regions only

72 Situation based clustering based on visual semantic hints (visual sensation-vs)
Colorfulness (CoF) hint: degree of v.s. according to the purity of colors >Utilizes ScalableColorDescriptor

73 Situation based clustering based on visual semantic hints (visual sensation-vs) (2)
Color Coherence (CoC) Hint: degree of v.s. according to spatial coherency of colors > utilizes DominantColorDescriptor

74 Situation based clustering based on visual semantic hints (visual sensation-vs) (3)
Level of Detail (LoD) hint: degree of v.s. for objects appearing more or less detailed > defines a relative compression ratio per pixel, based on the JPEG compression the photo has gone through

75 Situation based clustering based on visual semantic hints (visual sensation-vs) (4)
Homogeneous Texture (HoT) hint: degree of v.s. according to the homogeneous texture in the photo -> expresses texture regularity using the TextureBrowsingDescriptor
Heterogeneous Texture (HeT) hint: degree of v.s. according to how continuous or strong the boundaries in the photo are -> utilizes the EdgeHistogramDescriptor

76 Category-based clustering
Local-concept lexicon: multiple low-level features of local regions are used to learn and detect local concepts; once the local concepts have been built, confidence values for each sub-region are measured for all local concepts.

77 Related MPEG activities
The functionalities described above are especially useful for developers of the MPEG-A Photo Player, which offers a standardized solution for the carriage of images and associated metadata, to facilitate simple and fully interoperable exchange across different devices and platforms.
-> The set of metadata includes MPEG-7 visual content descriptions as well as acquisition-based metadata (such as date, time and camera settings). This allows compliant devices to support new, content-enhanced functionality, such as intelligent browsing, content-based search or automatic categorization.

78 Summary
- MPEG-7 Standard
- MPEG-7 visual and content structure description tools (Ds & DSs using the DDL)
- MPEG-7 requirements on the Query Format
- MPEG-7 VXM current phase (descriptors and CBIR)
- Multimedia segmentation, understanding, and searching, among others, are still a challenge

79 The end. Most of the pictures or their basic ideas are taken from the listed papers and web pages.

