
Slide 1: Visual Information Retrieval
Constantine Kotropoulos
Aristotle University of Thessaloniki, Department of Informatics
Monday, July 8, 2002

Slide 2: Outline
• Fundamentals
• Still image segmentation: comparison of ICM and LVQ techniques
• Shape retrieval based on the Hausdorff distance
• Video summarization: detecting shots, cuts, and fades in video; selection of key frames
• MPEG-7: standard for multimedia applications
• Conclusions

Slide 3: Fundamentals
• About
• Toward visual information retrieval
• Data types associated with images or video
• First generation systems
• Second generation systems
• Content-based interactivity
• Representation of visual content
• Similarity models
• Indexing methods
• Performance evaluation

Slide 4: About
• Visual information retrieval:
  – To retrieve images or image sequences from a database that are relevant to a query.
  – An extension of traditional information retrieval designed to include visual media.
• Needs: tools and interaction paradigms that permit searching for visual data by referring directly to its content.
  – Visual elements (color, texture, shape, spatial relationships) related to perceptual aspects of image content.
  – Higher-level concepts: clues for retrieving images with similar content from a database.
• Multidisciplinary field:
  – Information retrieval
  – Image/video analysis and processing
  – Visual data modeling and representation
  – Pattern recognition
  – Multimedia database organization
  – Computer vision
  – User behavior modeling
  – Multidimensional indexing
  – Human-computer interaction

Slide 5: Toward visual information retrieval
• Databases
  – allow a large amount of alphanumeric data to be stored in a local repository and accessed by content through appropriate query languages.
• Information retrieval systems
  – provide access to unstructured text documents: search engines working in the textual domain, either using keywords or full text.
• The need for visual information retrieval systems became apparent when
  – digital archives were released,
  – distribution of image and video data through large-bandwidth computer networks emerged,
  and it becomes more prominent as we progress to the wireless era.

Slide 6: Query by image content using the NOKIA 9210 Communicator (Iftikhar et al., www.iva.cs.tut.fi/COST211)

Slide 7: Data types associated with images or video
• Content-independent metadata
  – Data not directly related to image/video content (e.g., format, author's name, date, etc.)
• Content-dependent metadata
  – Low/intermediate-level features: color, texture, shape, spatial relationships, motion, etc.
  – Data referring to content semantics (content-descriptive metadata)
• These data types impact the internal organization of the retrieval system.

Slide 8: First generation systems
• Answers to queries such as: find
  – all images of paintings by El Greco,
  – all Byzantine icons dated from the 13th century, etc.
• Content-independent metadata: alphanumeric strings
• Representation schemes: relational models, frame models, object-oriented models
• Content-dependent metadata: annotated keywords or scripts
• Retrieval: search engines working in the textual domain (SQL, full-text retrieval)
• Examples: PICDMS (1984), PICQUERY (1988), etc.
• Drawbacks:
  – It is difficult for text to capture the distinctive properties of visual features.
  – Text is not appropriate for modeling perceptual similarity.
  – Annotation is subjective.

Slide 9: Second generation systems
• Support full retrieval by visual content
  – Conceptual level: keywords
  – Perceptual level: objective measurements at the pixel level
  – Other sensory data (speech, sound) may help (e.g., in video streams).
• Image processing, pattern recognition, and computer vision are an integral part of the architecture and operation.
• Retrieval systems for
  – 2-D still images
  – Video
  – 3-D images and video
  – The WWW

Slide 10: Retrieval systems for 2-D still images (1)
• Content
  – Perceptual properties: color, texture, shape, and spatial relationships
  – Semantic primitives: objects, roles, and scenes
  – Impressions, emotions, and meaning associated with the combination of perceptual features
• Basic retrieval paradigm: for each image, a set of descriptive features is pre-computed.
• Queries by visual examples
  – The user selects the features and ranges of model parameters, and chooses a similarity measure.
  – The system checks the similarity between the visual content of the user's query and the database images.
• Objective: to keep the number of misses as low as possible. What about the number of false alarms?
• Interaction: relevance feedback

Slide 11: Retrieval systems for 2-D still images (2)
Similarity vs. matching
• Matching is a binary partition operator: "Does the observed object correspond to a model or not?" Uncertainties are managed during the process.
• Similarity-based retrieval: to re-order the database of images according to how similar they are to a query example.
  – Ranking, not classification.
  – The user is in the retrieval loop; hence the need for a flexible interface.

Slide 12: Retrieval systems for video (1)
• Video conveys information on multiple planes of communication:
  – how the frames are linked together using editing effects (cuts, fades, dissolves, etc.),
  – what is in the frames (characters, story content, etc.).
• Each type of video (commercials, news, movies, sport) has its own peculiar characteristics.
• Basic terminology
  – Frame: the basic unit of information, usually sampled at 1/25 or 1/30 of a second.
  – Shot: a set of frames between a camera turn-on and a camera turn-off.
  – Clip: a set of frames with some semantic content.
  – Episode: a hierarchy of shots.
  – Scene: a collection of consecutive shots that share simultaneity in space, time, and action (e.g., a dialog scene).
• Video is accessed through browsing and navigation.

Slide 13: Retrieval systems for video (2)

Slide 14: Retrieval systems for 3-D images and video / WWW
• 3-D images and video are available in
  – biomedicine,
  – computer-aided design,
  – geographic maps,
  – painting,
  – the games and entertainment industry (immersive environments).
• Expected to flourish in the current decade.
• Retrieval on the WWW:
  – a distributed problem;
  – need for standardization (MPEG-7);
  – response time is critical (work in the compressed domain, summarization).

Slide 15: Research directions
1. Visual interfaces
2. Standards for content representation
3. Database models
4. Tools for automatic extraction of features from images and video
5. Tools for extraction of semantics
6. Similarity models
7. Effective indexing
8. Web search and retrieval
9. The role of 3-D

Slide 16: Content-based interactivity
• Browsing offers a panoramic view of the visual information space.
• Visualization (example: www.virage.com)

Slide 17: QBIC color layout (http://wwwqbic.almaden.ibm.com/)

Slide 18: Querying by content (1)
For still images:
• To check whether the concepts expressed in a query match the concepts of database images:
  – "find all holy icons with a nativity"
  – "find all holy icons with Saint George" (object categories)
  Treated with free-text or SQL-based retrieval engines (e.g., Google).
• To verify spatial relations between spatial entities:
  – "find all images with a car parked outside a house"
  – topological queries (disjunction, adjacency, containment, overlapping)
  – metric queries (distances, directions, angles)
  Treated with SQL-like spatial query languages.

Slide 19: Querying by content (2)
• To check the similarity of perceptual features (color, texture, edges, corners, and shapes):
  – exact queries: "find all images of President Bush"
  – range queries: "find all images with colors between green and blue"
  – K-nearest-neighbor queries: "find the ten most similar images to the example"
For video:
• Concepts related to video content
• Motion, objects, texture, and color features of video: shot extraction, dominant colors, etc.

Slide 20: Google

Slide 21: Ark of Refugee Heirloom (www.ceti.gr/kivotos)

Slide 22: Querying by visual example
• Suited to express perceptual aspects of low/intermediate-level features of visual content.
• The user provides a prototype image as a reference example.
• Relevance feedback: the user analyzes the responses of the system and indicates, for each item retrieved, the degree of relevance or the exactness of the ranking; the annotated results are fed back into the system to refine the query.
• Types of querying:
  – Iconic (PN): suitable for retrieval based on high-level concepts
  – By painting: employed in color-based retrieval (NETRA)
  – By sketch (PICASSO)
  – By image (NETRA)

Slide 23: PICASSO/PN (http://viplab.dsi.unifi.it/PN/)

Slide 24: NETRA (http://maya.ece.ucsb.edu/Netra/netra.html)

Slide 25: Representation of visual content
• Representation of perceptual features of images and video is a fundamental problem in visual information retrieval.
• Image analysis and pattern recognition algorithms provide the means to extract numeric descriptors.
• Computer vision enables object and motion identification.
• Representation of perceptual features:
  – Color
  – Texture
  – Shape
  – Structure
  – Spatial relationships
  – Motion
• Representation of content semantics:
  – Semantic primitives
  – Semiotics

Slide 26: Representation of perceptual features: color (1)

Slide 27: Representation of perceptual features: color (2)
• Human visual system: the cones are responsible for color perception.
• From a psychological point of view, the perception of color is related to several factors, e.g.:
  – color attributes (brightness, chromaticity, saturation),
  – surrounding colors,
  – color spatial organization,
  – the observer's memory/knowledge/experience.
• Geometric color models (RGB, HSV, Lab, etc.)
• Color histogram: describes the low-level color properties.

Slide 28: Image retrieval by color similarity (1)
• Color spaces
• Histograms; moments of the distribution
• Quantization of the color space
• Similarity measures
  – L1 and L2 norms of the difference between the query histogram H(I_Q) and the histogram of a database image H(I_D), e.g.,
      d_1(I_Q, I_D) = Σ_i | H(I_Q)[i] - H(I_D)[i] |

Slide 29: Image retrieval by color similarity (2)
• Histogram intersection:
      s(I_Q, I_D) = Σ_i min( H(I_Q)[i], H(I_D)[i] ) / Σ_i H(I_D)[i]
• Weighted Euclidean distance (in a common form):
      d(I_Q, I_D) = (h_Q - h_D)^T A (h_Q - h_D),
  where the matrix A encodes the perceptual similarity between pairs of color bins.
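
These color-similarity measures are easy to prototype. The following is a minimal sketch, assuming numpy, uint8 RGB images, and illustrative function names (none of the names are from the talk):

```python
import numpy as np

def color_histogram(image, bins=8):
    """Quantized RGB histogram of an HxWx3 uint8 image, normalized to sum to 1."""
    hist, _ = np.histogramdd(image.reshape(-1, 3),
                             bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    return hist.ravel() / hist.sum()

def l1_distance(h_q, h_d):
    """L1 norm of the histogram difference."""
    return float(np.abs(h_q - h_d).sum())

def histogram_intersection(h_q, h_d):
    """For normalized histograms this lies in [0, 1]; 1 means identical."""
    return float(np.minimum(h_q, h_d).sum())

def weighted_euclidean(h_q, h_d, A):
    """Quadratic-form distance; A is a symmetric bin-similarity matrix."""
    diff = h_q - h_d
    return float(diff @ A @ diff)
```

With A set to the identity matrix, the weighted Euclidean distance reduces to the squared L2 norm of the histogram difference.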

Slide 30: Representation of perceptual features: texture (1)
• Texture: one level of abstraction above pixels.
• Perceptual texture dimensions:
  – Uniformity
  – Density
  – Coarseness
  – Roughness
  – Regularity
  – Linearity
  – Directionality/direction
  – Frequency
  – Phase
(Examples: the Brodatz album.)

Slide 31: Representation of perceptual features: texture (2)
• Statistical methods:
  – Autocorrelation function (coarseness, periodicity)
  – Frequency content [rings, wedges]: coarseness, directionality, isotropic/non-isotropic patterns
  – Moments
  – Directional histograms and related features
  – Run-lengths and related features
  – Co-occurrence matrices
• Structural methods (grammars and production rules)

Slide 32: Representation of perceptual features: shape (1)
• Criteria of a good shape representation:
  – Each shape possesses a unique representation, invariant to translation, rotation, and scaling.
  – Similar shapes should have similar representations.
• Methods to extract shapes and derive features stem from image processing:
  – Chain codes
  – Polygonal approximations
  – Skeletons
  – Boundary descriptors (contour length/diameter, shape numbers)
  – Fourier descriptors
  – Moments

Slide 33: Representation of perceptual features: shape (2). Chain codes; polygonal approximation (I. Pitas).

Slide 34: Representation of perceptual features: shape (3)
Face segmentation: (a) original color image; (b) skin segmentation; (c) connected components; (d) best-fit ellipses.

Slide 35: Representation of perceptual features: structure / spatial relationships
• Structure
  – Provides a Gestalt impression of the shapes in the image: a set of edges and corners.
  – Used to distinguish photographs from drawings.
  – Used to classify scenes: portrait, landscape, indoor.
• Spatial relationships
  – Spatial entities: points, lines, regions, and objects.
  – Relationships:
    – Directional (include a distance/angle measure)
    – Topological (do not include distance, but capture set-theoretical concepts, e.g., disjunction)
  – They are represented symbolically.

Slide 36: Representation of perceptual features: motion
• The main characterizing element in a sequence of frames.
• Related to changes in the relative position of spatial entities or to a camera movement.
• Methods:
  – Detection of temporal changes of gray-level primitives (optical flow)
  – Extraction of a set of sparse characteristic features of the objects, such as corners or salient points, and their tracking in subsequent frames (salient features, Kanade et al.)
• Plays a crucial role in video.

Slide 37: Representation of content semantics: semantic primitives
• Identification of objects, roles, actions, and events as abstractions of visual signs.
• Achieved through recognition and interpretation.
  – Recognition: select a set of low-level local features and apply statistical pattern recognition for object classification.
  – Interpretation is based on reasoning.
• Domain-dependent, e.g., Photobook (www-white.media.mit.edu).
• Retrieval systems including interpretation: facial database systems that compare facial expressions.

Slide 38: Representation of content semantics: semiotics
• A grammar of color usage to formalize effects.
• Association of color hue, saturation, etc. with psychological behaviors.
• Semiotics identifies two distinct steps for the production of meaning:
  – an abstract level, via narrative structures (e.g., camera breaks, colors, editing effects, rhythm, shot angle);
  – a concrete level, via discourse structures: how the narrative elements create a story.

Slide 39: Similarity models
• Pre-attentive: perceived similarity between stimuli
  – Color/texture/shape
  – Models close to human perception
• Attentive:
  – Interpretation
  – Previous knowledge and a form of reasoning
  – Domain-specific retrieval applications (mugshots); need for models and similarity-criteria definition

Slide 40: Metric model (1)
• Distance in a metric psychological space.
• Properties of a distance function d (for all x, y, z):
  – d(x, y) ≥ 0, with d(x, y) = 0 if and only if x = y
  – d(x, y) = d(y, x) (symmetry)
  – d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)
• Commonly used distance functions:
  – Euclidean
  – City-block
  – Minkowski

Slide 41: Metric model (2)
• Inadequacies: shape similarity
• Advantages:
  – similarity judgment of color stimuli
  – consistent with pattern recognition and computer vision
  – suitable for creating indices
• Other similarity models:
  – Virtual metric spaces
  – Tversky's model: a function of two types of features, those common to the two stimuli and those that appear exclusively in one stimulus only.
  – Transformational distances: elastic graph matching
• What about user subjectivity?

Slide 42: The FourEyes approach
• A self-improving database browser and annotator based on user interaction.
• Similarity is presented with groupings.
• The system chooses, in tree hierarchies, those nodes which most efficiently represent the positive examples.
• A set-covering algorithm removes all positive examples already covered.
• The process iterates.

Slide 43: Indexing methods (1)
• Goal: to avoid sequential scanning.
• Retrieved images are ranked in order of similarity to a query.
• Compound measure of similarity between visual features and text attributes.
• Indexing of string attributes.
• Commonly used indexing techniques:
  – Hash tables and signatures
  – Cosine similarity function

Slide 44: Indexing methods (2)
• Triangle-inequality pruning (Barros et al.): the distance d(i, r) of every database item i to a reference item r is pre-computed.
• When the query item q is presented, d(q, r) is computed.
• By the triangle inequality, |d(i, r) - d(q, r)| is a lower bound on d(q, i), so items can be ruled out without computing d(q, i).
• Maintain a threshold l = d(q, r*), where r* is the most similar item found so far.
• Search the items whose d(i, r) is closest to d(q, r) first.
• If an item i with d(q, i) smaller than l is found, item i becomes the current most similar item and l is updated.
• Continue until |d(i, r) - d(q, r)| > l for all remaining items.
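
As an illustration of this pruning scheme, here is a hedged Python sketch. The function names are assumptions, and d(i, r) is computed on the fly here for brevity, whereas a real system would pre-compute it offline:

```python
import numpy as np

def triangle_inequality_search(query, items, reference, dist):
    """Find the item closest to `query` under the metric `dist`,
    pruning candidates with the triangle inequality."""
    d_qr = dist(query, reference)
    d_ir = np.array([dist(i, reference) for i in items])  # offline in practice
    # Lower bound on d(query, i): |d(i, r) - d(q, r)|.
    bounds = np.abs(d_ir - d_qr)
    order = np.argsort(bounds)           # most promising items first
    best, best_d = None, np.inf
    for idx in order:
        if bounds[idx] > best_d:         # all remaining items are farther
            break
        d = dist(query, items[idx])      # full distance only when needed
        if d < best_d:
            best, best_d = items[idx], d
    return best, best_d

# Example with Euclidean distance on feature vectors:
# dist = lambda a, b: float(np.linalg.norm(a - b))
```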

Slide 45: Index structures
• Fixed grids: a non-hierarchical index structure that organizes the space into buckets.
• Grid files: fixed grids with buckets of unequal size.
• K-d trees: binary trees; the value of one of the k features is checked at each node.
• R-trees: partition the feature space into multidimensional rectangles.
• SS-trees: weighted Euclidean distance; suitable for clustering; ellipsoidal clusters.

Slide 46: Performance evaluation

Judgment by evaluator:
                  Relevant                   Not relevant
  Retrieved       A (correctly retrieved)    C (falsely retrieved)
  Not retrieved   B (missed)                 D (correctly rejected)

Slide 47: Wrap-up
• Visual information retrieval is a research topic at the intersection of digital image processing, pattern recognition, and computer vision (fields of our interest/expertise), but also of information retrieval and databases.
• Related to the semantic web.
• A challenging research topic with many unsolved problems:
  – segmentation
  – machine similarity vs. human perception
  – focused searching

Slide 48: Still image segmentation: comparison of ICM and LVQ
• Comparison of
  – Iterated Conditional Modes (ICM)
  – Split-and-merge Learning Vector Quantizer (LVQ)
• Ability to extract meaningful image parts based on the ground truth
• Evaluation of still image segmentation algorithms

Slide 49: Iterated Conditional Modes (ICM)
The ICM method is based on the maximization of the probability density function of the image model given the real image data. In its standard form, the criterion function assigns pixel s the region label x_s that maximizes

    P(x_s | y_s, x_N8(s)) ∝ (1/σ_{x_s}) exp[ -(y_s - m_{x_s})² / (2 σ_{x_s}²) - Σ_C V_C(x) ],

where x_s is the region assignment and y_s is the luminance value of the pixel s; m_i and σ_i are the mean value and the standard deviation of the luminance of region i; C ranges over the cliques of the pixel s; V_C(x) is the potential function of C; and N8(s) is the 8-neighborhood of the pixel s.

Slide 50: How ICM works
• The initial segmentation is obtained using the K-means clustering algorithm; cluster center initialization is based on the image intensity histogram.
• At each iteration, the value of the criterion function (a probability) is calculated for each pixel, and pixels are assigned to the cluster/region with maximum probability.
• Given the new segmentation, the mean intensity value and the variance of each cluster are re-estimated.
• The iterative process stops when no change occurs in the clusters.
• In the resulting segmentation, small regions are merged with their nearest neighbors.
• The output image contains the large regions, each assigned its mean luminance value.
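
A minimal sketch of this loop, assuming a 2-D gray-level numpy array, k classes, and a single Potts-style potential β that rewards agreement with the 8-neighbors. The histogram-quantile initialization and the exact potential are illustrative stand-ins, not the talk's tuned choices:

```python
import numpy as np

def icm_segment(img, k=4, beta=1.5, n_iter=10):
    """ICM segmentation of a 2-D gray-level image into k regions."""
    # Histogram-based init: split intensities at k-1 quantile thresholds.
    labels = np.digitize(img, np.quantile(img, np.linspace(0, 1, k + 1)[1:-1]))
    for _ in range(n_iter):
        # Per-class Gaussian parameters (empty clusters not handled here).
        means = np.array([img[labels == i].mean() for i in range(k)])
        stds = np.array([img[labels == i].std() + 1e-6 for i in range(k)])
        padded = np.pad(labels, 1, mode='edge')
        # agree[i] counts, per pixel, the 8-neighbors currently labeled i.
        agree = np.zeros((k,) + img.shape)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == dx == 0:
                    continue
                shifted = padded[1 + dy: 1 + dy + img.shape[0],
                                 1 + dx: 1 + dx + img.shape[1]]
                for i in range(k):
                    agree[i] += (shifted == i)
        # Log-posterior per class: Gaussian likelihood plus Potts prior.
        logp = (-np.log(stds)[:, None, None]
                - (img[None] - means[:, None, None]) ** 2
                / (2 * stds[:, None, None] ** 2)
                + beta * agree)
        new_labels = logp.argmax(axis=0)
        if np.array_equal(new_labels, labels):   # converged: no changes
            break
        labels = new_labels
    return labels
```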

Slide 51: Image features and parameters of the ICM algorithm
• The ICM algorithm is applied on the luminance component of the image; the input is a gray-level image.
• The parameter of the algorithm is the value of the potential function, which controls the roughness of the segment boundaries.
• The value of the parameter is tuned experimentally.

Slide 52: Segmentation results (ICM)

Slide 53: Learning Vector Quantizer (1)
• A neural network
• Self-organizing
• Competitive learning law
• Unsupervised
• Approximates the data pdf by adjusting the weights of the reference vectors

Slide 54: Learning Vector Quantizer (2)
• Codebook: reference vectors representing their nearest data patterns
• Number of reference vectors:
  – predefined, or
  – determined by split and merge

Slide 55: Learning Vector Quantizer (3)
• Minimal error for data representation (in the standard form): minimize E = Σ_k ||x(k) - w_c(k)||², where w_c is the reference vector nearest to pattern x(k).
• Iterative correction of the reference vectors (winner update): w_c(k+1) = w_c(k) + α(k) [x(k) - w_c(k)], with a learning rate α(k) that decreases over time.

Slide 56: Learning Vector Quantizer (4)
Split-and-merge technique:
• Find the winner reference vector w(k) for pattern x(k).
• If x(k) is not an outlier, proceed as in standard LVQ.
• If x(k) is an outlier, either
  – split the cluster and include x(k) in one of the sub-clusters, or
  – create a new cluster with seed x(k).
A sketch of this update loop follows below.
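
The sketch below shows the split variant of the rule. The outlier test (distance to the winner exceeding a multiple of the running mean winner distance) is an illustrative choice, not the talk's exact criterion:

```python
import numpy as np

def lvq_split(data, init_codebook, alpha=0.05, outlier_factor=3.0):
    """Unsupervised LVQ with a simple split-on-outlier rule."""
    codebook = [w.astype(float) for w in init_codebook]
    mean_dist, n_seen = 0.0, 0
    for x in data:
        dists = [np.linalg.norm(x - w) for w in codebook]
        c = int(np.argmin(dists))                       # winner reference vector
        n_seen += 1
        mean_dist += (dists[c] - mean_dist) / n_seen    # running average distance
        if n_seen > 10 and dists[c] > outlier_factor * mean_dist:
            codebook.append(x.copy())                   # new cluster seeded at x
        else:
            codebook[c] += alpha * (x - codebook[c])    # standard winner update
    return np.array(codebook)
```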

Slide 57: Learning Vector Quantizer (5)

Slide 58: Experimental set-up (1)
• Apply both methods on images provided by the Bridgeman Art Library (BAL).
• Explore the ability of the algorithms to extract meaningful image parts based on the qualitative description of the ground truth.

Slide 59: Paintings from the Bridgeman Art Library
Ground-truth descriptions:
• sky, mountains, people, water (smpw)
• hammerhead cloud, reflection (cr)
• sky, buildings, trees, people, pavement (sbtpp)
• sky, people, hat (sph)
• sky, trees, water, sails (stws)
• horses, sledges, people, snow, sky (hspss)

Slide 60: Experimental set-up (2)
• Define O = {O_1, ..., O_M} as the set of objects given in the qualitative description of the ground truth, where M is the number of objects.
• Define T = {T_1, ..., T_N} as the set of uniquely labeled regions obtained in the segmented image, where N is the number of regions.
• Three cases are possible for the outcome of the segmentation as compared to the ground truth.

Slide 61: Matching
• Case 1, best match (BM): the region of the segmented image has a one-to-one correspondence with the ground-truth object.
• Case 2, reasonable match (RM): the ground-truth object has a one-to-many correspondence with the regions of the segmented image.
• Case 3, mismatch (MM): there is no correspondence between the ground-truth objects and the regions of the segmented image.

Slide 62: Three cases
For the j-th ground-truth object O_j, denoting the case by i and the segmented regions by T, the three cases can be written as:
• BM: O_j corresponds to exactly one region T_n.
• RM: O_j corresponds to several regions {T_n1, ..., T_nk}, k > 1.
• MM: O_j has no corresponding region in T.

Slide 63: Decision
The decision about the presence of the ground-truth object O_j in the segmented image combines the cases above (e.g., O_j is declared present when a best or reasonable match exists). A decision is recorded for each object after visual examination of the segmented image against the definition of the ground truth.

Slide 64: Assessment of results (1). Ground truth: sky, buildings, trees, people, pavement.

Slide 65: Assessment of results (2): LVQ.

Slide 66: Assessment of results (3): ICM.

Slide 67: Assessment of results (4). Ground truth: horses, sledges, people, snow, sky.

Slide 68: Assessment of results (5): LVQ.

Slide 69: Assessment of results (6): ICM.

Slide 70: Assessment of results (7): number of regions.

Slide 71: Assessment of results (8). Ranking: ICM vs. LVQ (BM: best match, RM: reasonable match, MM: mismatch).

Slide 72: Assessment of results (9). Ranking: ICM vs. LVQ (BM: best match, RM: reasonable match, MM: mismatch).

Slide 73: Evaluation of image segmentation algorithms (1) (Cián Shaffrey, Univ. of Cambridge)

Slide 74: Evaluation of image segmentation algorithms (2)
• Evaluation within the semantic space: it is impossible to ask the Average User to provide all possible h.
• Compromise: evaluation in the indexing space, which allows us to access S without explicitly defining σ.
• Average User: to achieve a consensus on h.
• Ask users to evaluate two proposed arrows π to obtain the Average User's response, implicitly characterizing h and σ.

Slide 75: Evaluation of image segmentation algorithms (3)
Unsupervised algorithms:
1. Multiscale Image Segmentation (UCAM-MIS)
2. Blobworld (UC Berkeley-Blobworld)
3. Iterated Conditional Modes (AUTH-ICM)
4. Learning Vector Quantizer (AUTH-LVQ)
5. Double Markov Random Field (TCD-DMRF)
6. Complex Wavelet based Hidden Markov Tree (UCAM-CHMT)

Slide 76: Evaluation of image segmentation algorithms (4)
• Hard measurements
• Soft measurements: the speed of the user's response (time⁻¹), i.e., how much the user prefers one scheme over the other
  – Faster response: the selected scheme provides a better semantic breakdown of the original image.
  – Slower response: reflects the similarity of the two schemes.
Aims:
• To determine whether or not agreement exists in users' decisions
• Do two pairwise rankings lead to consistent total orderings?
• Do hard and soft measurements coincide?

Slide 77: Evaluation of image segmentation algorithms (5) (Cián Shaffrey, Univ. of Cambridge)

Slide 78: Evaluation of image segmentation algorithms (6)

Slide 79: Wrap-up
• ICM
  – continuous, large-sized regions
  – appropriate for homogeneous regions
• LVQ
  – spatially connected, small regions
  – more detailed segmentation
• Both provide good reasonable matches (RM).

Slide 80: Image retrieval based on the Hausdorff distance
• Hausdorff distance definition
• Advantages
• How to speed up the computations
• Experiments

Slide 81: Hausdorff distance definition
    d_H+(A, B) = sup { d(x, B) : x ∈ A }
    d_H-(A, B) = sup { d(y, A) : y ∈ B }
    d(v, W)    = inf { d(v, w) : w ∈ W }
    d_H(A, B)  = max ( d_H+(A, B), d_H-(A, B) )
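
A direct transcription of this definition into Python, assuming point sets stored as numpy arrays of 2-D coordinates. This brute-force version is O(n·m); the speed-up slides below discuss the refinements (contours, Voronoi diagrams, early termination) a real system would use:

```python
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between point sets A (n,2) and B (m,2).

    Returns (d_H, d_H_plus, d_H_minus) per the definitions above.
    """
    # Pairwise Euclidean distances between every point of A and B.
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    d_ab = D.min(axis=1).max()   # d_H+(A, B): worst-served point of A
    d_ba = D.min(axis=0).max()   # d_H-(A, B): worst-served point of B
    return max(d_ab, d_ba), d_ab, d_ba
```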

Slide 82: Hausdorff distance advantages
• d_H(A, B) = 0 ⇔ A = B (A, B: sets representing graphical objects, object contours, etc.)
• Provides information about the parameters of the transformation (complex object recognition).
• Predictable: simple, intuitive interpretation.
• d_H+ and d_H-: useful for partially obscured or erroneously segmented objects.
• Possibility of generalization: replace max by quantiles.
• Possibility of taking any object transformations into consideration.

Slide 83: How to speed up the computations for comparing one pair (1)
A. Replacing objects by their contours
• The HD between the objects may be large although the HD between the contours is small (e.g., disk and ring): possibility of false alarms.
• But contours of similar objects are always similar (small HD): no possibility of omitting similar objects.

Slide 84: How to speed up the computations for comparing one pair (2)
B. Voronoi diagram or distance transform
C. Early scan termination
D. Pruning some parts of the transformation space

Slide 85: How to speed up the computations: reducing the number of models to consider
Idea: pre-compute the matrix of distances between every pair of models. Then:
1. Prune some models (we know in advance they will not match the query).
2. Navigate the database in an optimal search order (possibility of finishing early).

Slide 86: How to speed up the computations
A. Excluding model objects from the search
• Take any model object ref and the distance from the query to the closest model found so far.
• By the triangle inequality, the model closest to the query object may lie only in the colored (annular) area around ref.

Slide 87: How to speed up the computations
B. Pruning with many reference objects

Slide 88: How to speed up the computations
C. Optimal searching order

Slide 89: How to speed up the computations
D. Introducing other criteria (pre-computation)
• Moment invariants:
      M_1 = (M_20 + M_02) / m_00²
      M_2 = (M_20 · M_02 - M_11²) / m_00⁴
  where M_pq denote central moments and m_00 the zeroth-order (area) moment.
• Shape coefficients: the Blair-Bliss coefficient.
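
These invariants are cheap to pre-compute per model. A minimal sketch for a binary shape mask, with illustrative helper names (the central-moment definitions are the usual ones; the slide's dropped definitions are not reproduced verbatim):

```python
import numpy as np

def moment_invariants(mask):
    """M1 and M2 moment invariants of a binary shape mask (2-D bool array)."""
    ys, xs = np.nonzero(mask)
    m00 = len(xs)                        # area (zeroth-order moment)
    xc, yc = xs.mean(), ys.mean()        # centroid
    dx, dy = xs - xc, ys - yc
    M20 = (dx ** 2).sum()                # second-order central moments
    M02 = (dy ** 2).sum()
    M11 = (dx * dy).sum()
    M1 = (M20 + M02) / m00 ** 2
    M2 = (M20 * M02 - M11 ** 2) / m00 ** 4
    return M1, M2
```

Because M1 and M2 are scalar and translation/rotation invariant, they can prune obviously dissimilar models before any Hausdorff distance is evaluated.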

Slide 90: Experiments: the database
Database: 76 islands, represented as *.bmp images.

Slide 91: Experiment 1: map query
Image retrieval, step 1: interactive segmentation of the query object.

Slide 92: Experiment 1: map query
Searching order: only 8 of 76 model objects were checked.
  Loading model 1/76:  "amorgos.bmp"    Hausdorff distance: 0.156709
  Loading model 42/76: "ithaca.bmp"     Hausdorff distance: 0.143915
  Loading model 27/76: "ikaria.bmp"     Hausdorff distance: 0.080666
  Loading model 31/76: "kasos.bmp"      Hausdorff distance: 0.080551
  Loading model 20/76: "sikinos.bmp"    Hausdorff distance: 0.121180
  Loading model 52/76: "alonissos.bmp"  Hausdorff distance: 0.153914
  Loading model 17/76: "rithnos.bmp"    Hausdorff distance: 0.103512
  Loading model 61/76: "skopelos.bmp"   Hausdorff distance: 0.045430

Slide 93: Experiment 1: map query
The model closest to the query object attains the minimum Hausdorff distance.

Slide 94: Experiment 2: mouse-drawing query
Criteria: HD alone, and HD + M_1 + M_2 + W_BB (Blair-Bliss); position chosen for minimum HD.
  Closest:  Santorini   (HD = 0.112, MCD = 1.024; HD = 0.143, MCD = 1.771)
  Second:   Poros
  Furthest: Elafonisos  (max HD = 0.3072, max MCD = 3.4326)

Slide 95: Wrap-up
• The Hausdorff distance is better for shape recognition than feature-based criteria.
• The large computational cost of image retrieval based on HD can be reduced by:
  – decreasing the cost of computation for a pair of objects:
    – replacing objects by their contours,
    – using a Voronoi diagram;
  – off-line database processing: calculating the matrix of distances between model objects;
  – reducing the number of model objects to be compared;
  – optimal searching order;
  – using features as auxiliary similarity criteria.

Slide 96: Video summarization: detecting shots, cuts, and fades in video; selection of key frames

Slide 97: Outline
• Entropy, joint entropy, and mutual information
• Shot cut detection based on mutual information
• Fade detection based on joint entropy
• Key frame selection
• Comparison with other methods
• Wrap-up

Slide 98: Entropy and joint entropy
Entropy of a random variable X (RV), a measure of the information content or the "uncertainty" about X:
    H(X) = - Σ_x p(x) log p(x)
Joint entropy of RVs X and Y:
    H(X, Y) = - Σ_x Σ_y p(x, y) log p(x, y)

Slide 99: Mutual information
    I(X; Y) = H(X) + H(Y) - H(X, Y) = Σ_x Σ_y p(x, y) log [ p(x, y) / (p(x) p(y)) ]
It measures the average reduction in uncertainty about X that results from learning the value of Y, i.e., the amount of information that X conveys about Y.

Slide 100: Algorithm for detecting abrupt cuts (1)
For each pair of successive frames f_t and f_t+1, whose gray levels vary from 0 to N-1:
• Calculate three N×N co-occurrence matrices, one for each chromatic component R, G, and B, whose (i, j) element is the joint probability of observing a pixel having the i-th gray level in f_t and the j-th gray level in f_t+1.
• Calculate the mutual information of the gray levels for the three components R, G, and B independently, and sum them.

Slide 101: Algorithm for detecting abrupt cuts (2)
• Apply a robust estimator of the mean value to the time series of mutual information values, by defining a time window around each time instant t_0.
• An abrupt cut is detected when the mutual information at t_0 falls far below this robust local mean (e.g., below a fixed fraction of it).
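
A compact sketch of this detector, assuming video frames as uint8 HxWx3 numpy arrays. The window size, the use of the local median as the robust estimator, and the drop ratio are illustrative choices, not the talk's tuned values:

```python
import numpy as np

def mutual_information(frame_a, frame_b, n_levels=256):
    """MI of gray levels between two frames of one chromatic component."""
    # (i, j) element: joint probability of level i in frame_a, level j in frame_b.
    joint, _, _ = np.histogram2d(frame_a.ravel(), frame_b.ravel(), bins=n_levels,
                                 range=((0, n_levels), (0, n_levels)))
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of frame_a levels
    py = joint.sum(axis=0, keepdims=True)   # marginal of frame_b levels
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def detect_cuts(frames, window=5, ratio=0.3):
    """Flag a cut where RGB mutual information drops far below the local median."""
    mi = np.array([sum(mutual_information(a[..., c], b[..., c]) for c in range(3))
                   for a, b in zip(frames[:-1], frames[1:])])
    cuts = []
    for t, v in enumerate(mi):
        lo, hi = max(0, t - window), min(len(mi), t + window + 1)
        local = np.median(np.concatenate([mi[lo:t], mi[t + 1:hi]]))
        if v < ratio * local:
            cuts.append(t)          # cut between frames t and t+1
    return cuts, mi
```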

Slide 102: Mutual information pattern (1)
Mutual information pattern from the "star" video sequence, which depicts cuts.

Slide 103: Ground truth

Slide 104: Performance evaluation
• GT denotes the ground truth; Seg denotes the segmented (correct and false) shots found by our methods.
• Recall corresponds to the probability of detection: Recall = A / (A + B), in the notation of the contingency table on slide 46.
• Precision corresponds to the accuracy of the method, taking false detections into account: Precision = A / (A + C).
• Overlap is used for fades.

Slide 105: Test results (1)

Slide 106: Test results (2)

Slide 107: Alternative technique for shot cut detection
• Features that could be used to define a distance measure:
  – successive color frame differences;
  – successive color vector bin-wise HS histogram differences (invariant to brightness changes).
• Fusion of the two differences.
• Shot cut detection by adaptive local thresholding.
A sketch of such a detector follows below.
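
A hedged sketch of this fused detector, assuming RGB and HSV versions of each frame as uint8 numpy arrays with H and S scaled to 0-255. The fusion weight, window size, and k-sigma threshold are illustrative; the talk's exact fusion and thresholding rules are not reproduced here:

```python
import numpy as np

def shot_cuts_by_fusion(frames_rgb, frames_hsv, w=0.5, window=7, k=3.0):
    """Cut detection fusing frame differences with HS-histogram differences."""
    def hs_hist(hsv, bins=16):
        # Bin-wise hue-saturation histogram: brightness (V) is ignored.
        hist, _, _ = np.histogram2d(hsv[..., 0].ravel(), hsv[..., 1].ravel(),
                                    bins=bins, range=((0, 256), (0, 256)))
        return hist.ravel() / hist.sum()

    d = []
    for t in range(len(frames_rgb) - 1):
        frame_diff = np.abs(frames_rgb[t + 1].astype(float)
                            - frames_rgb[t].astype(float)).mean() / 255.0
        hist_diff = 0.5 * np.abs(hs_hist(frames_hsv[t + 1])
                                 - hs_hist(frames_hsv[t])).sum()
        d.append(w * frame_diff + (1 - w) * hist_diff)   # fused distance
    d = np.array(d)
    cuts = []
    for t, v in enumerate(d):
        lo, hi = max(0, t - window), min(len(d), t + window + 1)
        local = np.delete(d[lo:hi], t - lo)              # neighbors, excluding t
        if v > local.mean() + k * local.std():           # adaptive local threshold
            cuts.append(t)
    return cuts
```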

Slide 108: Comparison of abrupt cut detection methods
Results using mutual information vs. results using the combined method.

Slide 109: Fades (1)
If G(x, y, t) is a gray-scale sequence, the chromatic scaling of G(x, y, t) can be modeled as F(x, y, t) = S(t) · G(x, y, t). With linear scaling over a fade of duration D, a fade-out can be modeled as

    F(x, y, t) = (1 - t/D) · G(x, y, t),   t ∈ [0, D],

and a fade-in as

    F(x, y, t) = (t/D) · G(x, y, t),   t ∈ [0, D].

Slide 110: Fades (2)
Part of a video sequence showing a fade-in; part of a video sequence showing a fade-out.

Slide 111: Mutual information pattern (2)
Mutual information pattern from the "basketball" video sequence, showing cuts and a fade.

Slide 112: Algorithm for detecting fades (1)
• For each pair of successive frames f_t and f_t+1, calculate the joint entropy of the basic chromatic components.
• Determine the values of the joint entropy close to zero.
• Detect a fade-out (fade-in): the first (last) zero value defines the end (start) of the fade-out (fade-in).
• Find the start (end) of the fade-out (fade-in). A fade should last at least 2 frames.
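
A minimal sketch of this fade detector, again assuming uint8 RGB frames; the near-zero threshold `eps` is an illustrative choice:

```python
import numpy as np

def joint_entropy(frame_a, frame_b, n_levels=256):
    """Joint entropy of gray levels between two successive frames."""
    joint, _, _ = np.histogram2d(frame_a.ravel(), frame_b.ravel(), bins=n_levels,
                                 range=((0, n_levels), (0, n_levels)))
    p = joint[joint > 0] / joint.sum()
    return float(-(p * np.log(p)).sum())

def detect_fades(frames, eps=0.05, min_len=2):
    """Find fades as runs where inter-frame joint entropy is close to zero.

    Returns (start, end) index pairs into the pairwise-entropy series;
    the borders of each run mark the fade-out end / fade-in start.
    """
    je = np.array([sum(joint_entropy(a[..., c], b[..., c]) for c in range(3))
                   for a, b in zip(frames[:-1], frames[1:])])
    low = je < eps
    fades, t = [], 0
    while t < len(low):
        if low[t]:
            start = t
            while t < len(low) and low[t]:
                t += 1
            if t - start >= min_len:       # a fade lasts at least 2 frames
                fades.append((start, t))
        else:
            t += 1
    return fades, je
```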

Slide 113: Joint entropy pattern (1)
Figure: a fade-out followed by a cut; sample frames 1765, 1770, 1775, 1780, 1785, 1791-1802, 1803, and 1805.

Slide 114: Joint entropy pattern (2)
Figure: threshold and fade, with a cut to a dark frame; sample frames 4420, 4425, 4426, 4430, and 4440.

Slide 115: Comparison of fade detection methods (1)
Results using the joint entropy vs. results using the average frame value.

Slide 116: Comparison of fade detection methods (2)
Results using the joint entropy vs. results using the average frame value.

Slide 117: Algorithm for key frame selection (1)
• Apply a split-and-merge algorithm to the series of mutual information values of the gray levels at successive frames within the shot.
• Choose clusters of large size.
• Select the first frame of each such cluster as a potential key frame.
• Test the similarity of the potential key frames using the mutual information.
A sketch follows below.
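
The following sketch splits the within-shot MI series wherever it jumps, a simple stand-in for the talk's split-and-merge rule; the thresholds are illustrative:

```python
import numpy as np

def select_key_frames(mi_series, shot_start, min_cluster=10, jump=0.2):
    """Pick key frames in a shot by splitting its MI series into clusters."""
    splits = [0]
    for t in range(1, len(mi_series)):
        if abs(mi_series[t] - mi_series[t - 1]) > jump:   # split point
            splits.append(t)
    splits.append(len(mi_series))
    keys = []
    for a, b in zip(splits[:-1], splits[1:]):
        if b - a >= min_cluster:           # keep only clusters of large size
            keys.append(shot_start + a)    # first frame of the cluster
    # A final pass would drop potential key frames that are mutually too
    # similar (high pairwise mutual information); omitted here for brevity.
    return keys
```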

Slide 118: Key frame selection (1)

Slide 119: Key frame selection (2): the "star" sequence.

Slide 120: Key frame selection (3): frames 1690 and 1770.

Slide 121: Key frame selection (4). Key frames selected from different shots: frames 314, 2026, 2904, 4344; two key frames selected from one shot: frames 2607 and 2637.

Slide 122: Wrap-up
• New methods for detecting cuts and fades with high precision have been described.
• Accurate detection of fade borders (starting and ending points) has been achieved.
• Comparisons with other methods demonstrate the accuracy of the proposed techniques.
• Satisfactory results for key frame selection, obtained by clustering the mutual information series, have been reported.

Slide 123: MPEG-7: Standard for Multimedia Information Systems
• Introduction
• Applications
• The standard
• Description elements
• Visual structural elements
• Description schemes
  – for still images
  – for video
• Wrap-up

Slide 124: Introduction
• MPEG-7 annotates:
  – data in MPEG-4 (object-based, interactive representations), MPEG-2, and MPEG-1 (frame-based encoding of waveforms);
  – analog data (e.g., VHS);
  – photo prints;
  – artistic pictures.
• It is not about compression.
• Aim: description of audiovisual content via
  – Descriptors,
  – Description Schemes,
  – a Description Definition Language.

Slide 125: Applications
MPEG-7 provides a generic description of audiovisual and multimedia content for:
• systematic access to audiovisual information sources;
• re-usability of descriptions and annotations;
• management and linking of content, events, and user interaction.
(Jens-Rainer Ohm, HHI)

Slide 126: The standard
MPEG-7 consists of:
• Descriptors (D) with Descriptor Values (DV)
• Description Schemes (DS)
• a Description Definition Language (DDL)
(Jens-Rainer Ohm, HHI)

Slide 127: Description elements
• Structural (can be extracted automatically):
  – signal-based features,
  – regions and segments.
• Semantic/conceptual (mostly manual annotation):
  – objects,
  – scenes,
  – events.
• Metadata (manual or non-signal-based annotation):
  – acquisition and production,
  – high-level content description,
  – intellectual property, usage.

Slide 128: Visual structural elements
• Examples of low-level visual features: color, texture, shape, motion.
• Examples of MPEG-7 visual descriptors:
    Color:    color histogram, dominant color
    Texture:  frequency layout, edge histogram
    Shape:    Zernike moments, curvature peaks
    Motion:   motion trajectory, parametric motion
• Examples of MPEG-7 Visual Description Schemes:
  – Still region
  – Moving region
  – Video segment

Slide 129: Description Schemes
Layouts for description schemes:
• Hierarchical (tree)
• Relational (entity-relationship graph)

Slide 130: Still Region Description Scheme

Slide 131: Video Sequence Description Scheme

Slide 132: Description Definition Language
• Based on the Extensible Markup Language (XML)

Slide 133: Wrap-up
• MPEG-7: a generic description interface for audiovisual and multimedia content.
• MPEG-7 can be used for:
  – search/filtering and manipulation of audiovisual information,
  – multimedia browsing and navigation,
  – data organization, archiving, and authoring,
  – interpretation and understanding of multimedia content.
• A key technology.

Slide 134: Conclusions
• Overview of the fundamentals of visual information retrieval
• Focus on segmentation and its assessment
• Shape retrieval based on the Hausdorff distance
• Video summarization
Acknowledgments: I. Pitas, E. Pranckeviciene, Z. Chernekova, C. Nikou, and P. Rotter.

