Problem Statement A near-duplicate pair consists of two images or videos in which one is close to an exact duplicate of the other, but differs in conditions related to capture, editing, and rendering. Problem: spatial shifts and scale variations.

Tasks and Applications Near Duplicate Retrieval (NDR): –Copyright infringement detection –Query-by-example applications Near Duplicate Detection (NDD): –Linking news stories and grouping them into threads –Filtering out redundant near-duplicate images or videos from the top results of keyword-based web search

Prior Work –Attributed Relational Graph (ARG) matching: ACM Multimedia 2004 –Point set matching: ACM Multimedia 2004 –One-to-one symmetric matching algorithm: T-MM 2007 and ACM Multimedia 2007 –Large-scale near duplicate detection: CIVR 2007

Prior Work: Spatial Pyramid Match Kernel First quantize descriptors (SIFT) into visual words, then perform one pyramid match per word in image coordinate space. Lazebnik, Schmid & Ponce, CVPR 2006
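The per-word pyramid match described above can be sketched as follows. This is a minimal version of the match kernel of Lazebnik et al.; the names `pyramid_match` and `hists` are ours, with `hists[l]` holding one word's occurrence histogram over the 2^l × 2^l grid at level l:

```python
import numpy as np

def pyramid_match(hists1, hists2):
    """Spatial pyramid match kernel for a single visual word (sketch).
    hists*[l] is the flattened histogram of the word's occurrences over
    the 2^l x 2^l spatial grid at level l. Matches found at a coarse
    level but not at a finer one are discounted by a factor 1/2^(L-l)."""
    L = len(hists1) - 1
    # Histogram intersection at each pyramid level
    I = [np.minimum(h1, h2).sum() for h1, h2 in zip(hists1, hists2)]
    # Matches at the finest level count fully
    k = I[L]
    # New matches appearing only at level l get weight 1/2^(L-l)
    for l in range(L):
        k += (I[l] - I[l + 1]) / 2 ** (L - l)
    return k
```

A two-level example: two points in the same half of the image match at both levels, while points in opposite halves match only at level 0 and are discounted.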

Prior Work: Temporal Pyramid Matching for Event Recognition in News Video Information from different levels (Level-0, Level-1, Level-2) is fused; sub-clips are aligned (Level-1 as an example) via an EMD distance matrix between sub-clips and an integer-value alignment, using temporally constrained hierarchical agglomerative clustering. D. Xu & S.-F. Chang, CVPR 2007 and T-PAMI 2008

Earth Mover’s Distance (EMD) Supplier P has a given amount of goods; receiver Q has a given limited capacity; d_ij is the ground distance between bin i of P and bin j of Q. The optimal flow is solved by linear programming. (In the slide’s example, the bins carry equal weights such as 1/m and 1/2m.)
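The linear program behind the EMD can be sketched directly with SciPy. The function and argument names (`emd`, `wp`, `wq`, `D`) are ours, and this is an illustrative solver under the standard EMD formulation, not the implementation used in the paper:

```python
import numpy as np
from scipy.optimize import linprog

def emd(wp, wq, D):
    """Earth Mover's Distance between weight vectors wp (supplier P) and
    wq (receiver Q) with ground-distance matrix D, via linear programming."""
    m, n = len(wp), len(wq)
    c = D.reshape(-1)  # minimize sum_ij f_ij * d_ij over flows f_ij >= 0
    # Supply constraints: sum_j f_ij <= wp[i]; capacity: sum_i f_ij <= wq[j]
    A_ub = np.zeros((m + n, m * n))
    for i in range(m):
        A_ub[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        A_ub[m + j, j::n] = 1.0
    b_ub = np.concatenate([wp, wq])
    # Total flow must equal min(total supply, total capacity)
    total = min(wp.sum(), wq.sum())
    A_eq = np.ones((1, m * n))
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[total],
                  bounds=(0, None), method="highs")
    return res.fun / total  # normalize by the total flow
```

With equal weights on both sides, the optimizer moves all mass along zero-cost flows when the distributions coincide, giving an EMD of zero.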

Spatially Aligned Pyramid Matching Non-overlapped and overlapped partitions at multiple levels: Divide images into non-overlapped blocks. Divide images into overlapped blocks whose size equals a fixed fraction of the original image (in width and height), sampled at a fixed interval, say 1/8 of the image width and height.
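A minimal sketch of the two partition schemes; the function and parameter names (`partition_blocks`, `step_frac`) are ours, and the 1/8 stride follows the slide's example:

```python
import numpy as np

def partition_blocks(img, level, overlapped=False, step_frac=0.125):
    """Partition an image into blocks (sketch of the slide's scheme).
    Non-overlapped: a 2^level x 2^level grid of disjoint blocks.
    Overlapped: blocks of the same size sampled on a dense grid with a
    stride of step_frac (e.g. 1/8) of the image width and height."""
    H, W = img.shape[:2]
    bh, bw = H // 2 ** level, W // 2 ** level
    if overlapped:
        sy, sx = max(1, int(H * step_frac)), max(1, int(W * step_frac))
    else:
        sy, sx = bh, bw  # disjoint tiling: stride equals the block size
    blocks = []
    for y in range(0, H - bh + 1, sy):
        for x in range(0, W - bw + 1, sx):
            blocks.append(img[y:y + bh, x:x + bw])
    return blocks
```

On a 64×64 image, level 2 yields 16 non-overlapped blocks, while the overlapped level-1 partition with an 8-pixel stride yields a 5×5 grid of 25 blocks.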

First Stage Matching Objective: Compute the pairwise distances between every pair of blocks. Solution: We represent each block as a bag of orderless SIFT descriptors and use the EMD to measure the similarity between two sets of descriptors of unequal cardinality. Jianguo Zhang et al., Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study, IJCV, 2007

Second Stage Matching (1) Objective: Align the blocks of a query image x to the corresponding blocks of its near-duplicate image y. SAPM (our work): a block may be matched to a block at a different position and/or scale level, to robustly handle piecewise spatial translations and scale variations. SPM: fixed block-to-block matching.

Second Stage Matching (2) Eq. (3) can be recovered from Eq. (4) (assume R < C) by: 1) adding C − R virtual blocks to image x; 2) setting the corresponding weights for all r satisfying R < r ≤ C. Solution: integer-flow EMD.
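Under an equal-weight assumption, the integer-flow EMD with C − R virtual blocks reduces to a standard assignment problem. The sketch below (our names, zero-cost virtual rows) illustrates the virtual-block trick, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_blocks(D):
    """Integer-flow matching between R query blocks (rows) and C database
    blocks (columns), R <= C. Padding with C - R zero-cost virtual rows
    makes the cost matrix square, so equal-weight integer-flow EMD
    reduces to an assignment problem (illustrative sketch)."""
    R, C = D.shape
    if R < C:
        D = np.vstack([D, np.zeros((C - R, C))])  # virtual blocks in image x
    rows, cols = linear_sum_assignment(D)
    # Keep only the assignments of the real (non-virtual) query blocks
    real = rows < R
    return rows[real], cols[real], D[rows[real], cols[real]].sum()
```

The virtual rows absorb the unmatched database blocks at zero cost, so the returned total reflects only the real query-to-database matches.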

Comparison of SAPM, SPM and TPM Three blocks in the query image (a) and their matched counterparts in the near-duplicate images (b), (c), (d) are highlighted and associated by same-colored outlines.

Third Stage Matching Extension to near-duplicate video identification: one video clip V1 comprises {x(1), x(2), …, x(M)}, where x(i) is the i-th frame and M is the total number of frames of V1; another clip V2 comprises {y(1), y(2), …, y(N)}, where N and y(j) are defined similarly. Solution: temporal matching with EMD again.

Discussion If the query image is divided into non-overlapped blocks (e.g., L2-N) and the corresponding database images are divided into overlapped blocks (e.g., L2-O) at the same level, spatial shifts and some degree of scale change are handled. A broad range of scale variations is covered by matching the query image and the database images at different levels. Ideally, SAPM can deal with any spatial shift and scale variation by using denser scales and spatial spacings.

Near Duplicate Retrieval and Detection NDR: we directly fuse the distances from different levels. NDD: Generalized Neighborhood Component Analysis (GNCA). We use p = {0, 1, 2, 3, 4} to index the partitions: level-0 non-overlapped (L0-N), level-1 non-overlapped (L1-N), level-1 overlapped (L1-O), level-2 non-overlapped (L2-N), and level-2 overlapped (L2-O).
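The direct fusion of per-level distances for NDR can be sketched as a weighted sum over the five partitions; uniform weights are our assumption, since the slide does not specify them:

```python
def fuse_levels(dists, weights=None):
    """Fuse per-level distances into a single NDR score (sketch).
    dists holds one distance per partition p in {L0-N, L1-N, L1-O,
    L2-N, L2-O}; uniform weights are an assumed default."""
    if weights is None:
        weights = [1.0 / len(dists)] * len(dists)
    return sum(w * d for w, d in zip(weights, dists))
```

Database images are then ranked by this fused distance to the query.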

Experiments: Three Datasets Columbia Near Duplicate Image Database: TRECVID 2003 corpus. New image dataset: TRECVID 2005 and 2006 corpus –150 near-duplicate pairs (300 images) and 300 non-duplicate images New video dataset: TRECVID 2005 and 2006 corpus –50 near-duplicate pairs (100 videos) and 200 non-duplicate videos The images are collected from real broadcast news (rather than edits of the same image made by the authors).

Comparison of SAPM with SPM and TPM for Image NDR (results shown for the Columbia Database and the New Image Dataset)

SAPM and GNCA for Image NDD Performance measure: Equal Error Rate (EER). SAPM+NCA and SAPM+GNCA: 20 positive and 80 negative samples were used to train the projection matrices in NCA and GNCA; another 40 positive and 160 negative samples were used for SVM training. SPM, TPM and SAPM: all training samples (60 positive and 240 negative) were used for SVM training. Test samples: 90 positive and 4840 negative.

Comparison of SAPM+TM, SPM+TM and TPM+TM for Video NDR 1: single-level L0-N → L0-N; 2: single-level L1-N → L1-N (or L1-O); 3: multi-level. Two weighting schemes in temporal matching: normalized weight (NW) and unit weight (UW).

Conclusion We presented a multi-level spatial matching framework for image and video near-duplicate identification. GNCA outperforms NCA for near-duplicate detection.