Mid-level Visual Element Discovery as Discriminative Mode Seeking Harley Montgomery 11/15/13.



Main Idea
What exactly are discriminative features, and how can we find them automatically?
Discriminative features are local maxima of the ratio between the feature distributions of positive and negative examples, $p^+(x)/p^-(x)$.

Mean Shift

Mean Shift – 1st Issue
HOG distances vary significantly across feature space, so different bandwidths are needed in different regions.

Mean Shift – 2nd Issue
We actually have labeled data, and we want to find maxima of $p^+(x)/p^-(x)$.

Mean Shift
Mean shift using a flat kernel and bandwidth b converges to maxima of a KDE that uses a triangular kernel, $K(x, w) = \max(b - d(x, w), 0)$.
[Figure panels: b = 1, b = 0.1, b = 0.5]
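As a rough illustration of the flat-kernel update described on this slide, the sketch below repeatedly moves a point to the mean of its neighbors within the bandwidth. The function names, the Euclidean distance, and the stopping rule are assumptions made for the example; the slides do not specify an implementation.

```python
import numpy as np

def flat_mean_shift(points, start, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift: jump to the mean of all points within `bandwidth`."""
    w = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        dists = np.linalg.norm(points - w, axis=1)
        inliers = points[dists <= bandwidth]
        if len(inliers) == 0:
            break  # no neighbors, nothing to move toward
        new_w = inliers.mean(axis=0)
        if np.linalg.norm(new_w - w) < tol:
            w = new_w
            break
        w = new_w
    return w

def triangular_kde(points, w, bandwidth):
    """Unnormalized KDE with the triangular kernel max(b - d, 0)."""
    dists = np.linalg.norm(points - np.asarray(w, dtype=float), axis=1)
    return np.maximum(bandwidth - dists, 0.0).sum()
```

With two well-separated clusters, a start near one cluster ends close to that cluster's center, where the triangular-kernel density is higher than at the starting point.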

Mean Shift Reformulation
Take the ratio of the KDEs for positive and negative patches, and use an adaptive bandwidth.
Make the denominator constant so that the bandwidth adapts.
Use normalized correlation rather than the triangular kernel.
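The three steps above can be sketched in code under some assumptions. Here a candidate element is a direction `w`, patch features are rows of unit-length vectors, the kernel is `max(w . x - b, 0)`, and the threshold `b` is chosen by bisection so that the negative-patch sum equals a constant `beta`. The names `solve_threshold`, `discriminative_density`, and `beta` are hypothetical, and this is only one way to realize "constant denominator", not necessarily the authors' procedure.

```python
import numpy as np

def solve_threshold(w, negatives, beta, iters=60):
    """Bisection for b such that sum(max(w . x - b, 0)) over negatives is about beta."""
    scores = negatives @ w
    # The sum is decreasing in b: at least beta at `lo`, and zero at `hi`.
    lo, hi = scores.min() - beta, scores.max()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.maximum(scores - mid, 0.0).sum() > beta:
            lo = mid  # too much negative mass, raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discriminative_density(w, positives, negatives, beta=1.0):
    """Positive-patch sum once the negative-patch sum is pinned to beta."""
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)  # normalized correlation assumes unit-length w
    b = solve_threshold(w, negatives, beta)
    return np.maximum(positives @ w - b, 0.0).sum(), b
```

Because the denominator is fixed, comparing candidates by the numerator alone is equivalent to comparing the ratio, and the threshold `b` plays the role of a bandwidth that tightens wherever negatives are dense.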

Inter-Element Communication
In practice, we do m different runs starting from m initializations.
We can phrase this as a joint optimization, where $\alpha_{i,j}$ controls how much patch i contributes to run j.
First, cluster the paths based on their inlying patches, then add competition between paths in different clusters.
Intuition: very similar paths will still go to the same mode, but other paths will be repelled.
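The equation for this slide is not in the transcript. One plausible way to write the joint objective is given below. All notation other than $\alpha_{i,j}$ is an assumption: $w_j$ and $b_j$ are the template and threshold of run j, $x_i^+$ and $x_i^-$ are positive and negative patch features, $n^+$ and $n^-$ are their counts, and $\beta$ is the constant that the negative-patch sum is held to.

```latex
\max_{\{w_j,\, b_j\}} \; \sum_{j=1}^{m} \sum_{i=1}^{n^+} \alpha_{i,j}\, \max\!\left(w_j^\top x_i^+ - b_j,\; 0\right)
\quad \text{s.t.} \quad
\sum_{i=1}^{n^-} \max\!\left(w_j^\top x_i^- - b_j,\; 0\right) = \beta \quad \forall j
```

In this form, setting every $\alpha_{i,j} = 1$ recovers m independent runs, while lowering $\alpha_{i,j}$ for patches already claimed by a different cluster supplies the competition described above.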

[Figure labels: Cluster 1, Cluster 2]
Elements near the Cluster 1 paths will be downweighted heavily for the Cluster 2 path and vice versa, preventing Cluster 2 from drifting toward the more dominant mode.
No competition occurs between the two Cluster 1 paths.
In practice, a per-pixel quantity is calculated and averaged over each patch.

[Figure panels: no inter-element communication; with inter-element communication]

Purity Coverage Plot
Given a trained element, run patch detection on a hold-out set with some threshold.
Purity: % of detections that come from positive images.
Coverage: % of pixels in positive images covered by the union of all patches.
Given many elements, set each threshold so that all have the same purity, then pick N elements greedily to maximize total coverage.
Ideally, the resulting elements will be both discriminative and representative.
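The selection procedure above can be sketched as follows. The data layout is an assumption made for the example: each element has an array of detection scores on the hold-out set, a boolean array marking which detections come from positive images, and a set of covered pixel ids in positive images. The function names are hypothetical, and ties between equal scores are ignored.

```python
import numpy as np

def threshold_for_purity(scores, from_positive, target_purity):
    """Lowest score threshold whose detections still reach the target purity."""
    order = np.argsort(-scores)                  # best detections first
    hits = np.cumsum(from_positive[order])       # positives among the top k
    purity = hits / np.arange(1, len(scores) + 1)
    ok = np.nonzero(purity >= target_purity)[0]
    if len(ok) == 0:
        return None                              # target purity is unreachable
    return scores[order][ok[-1]]

def greedy_select(covered_pixels, n):
    """Pick up to n elements, each time adding the one that covers most new pixels."""
    chosen, union = [], set()
    remaining = list(range(len(covered_pixels)))
    for _ in range(min(n, len(covered_pixels))):
        best = max(remaining, key=lambda j: len(covered_pixels[j] - union))
        chosen.append(best)
        union |= covered_pixels[best]
        remaining.remove(best)
    return chosen, len(union)
```

Sweeping `target_purity` and recording the resulting coverage for a fixed N would trace out one curve of a purity-coverage plot.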

Purity Coverage Plot
Discriminative mode seeking finds better elements than previous methods.

Classification
Used the MIT Scene 67 dataset and learned 200 elements per class using discriminative mode seeking and PC plots, for 67 x 200 = 13,400 elements in total.
Then computed BoP feature vectors with the elements and trained 67 one-vs-all linear SVMs for classification.
The BoP features use a 2-level spatial pyramid (1x1, 2x2) and keep the top detection of each element in each region, giving 13,400 x 5 = 67,000 feature dimensions.
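The feature dimension quoted above follows from max-pooling each element's detector response over every pyramid region. A minimal sketch, assuming the responses for one image are already available as an array of shape (elements, height, width); the name `bop_feature` and this input layout are illustrative assumptions, not the authors' code:

```python
import numpy as np

def bop_feature(score_maps, levels=(1, 2)):
    """Top detection score of each element in each spatial-pyramid region."""
    n, h, w = score_maps.shape
    feats = []
    for g in levels:                              # g x g grid at this level
        ys = np.linspace(0, h, g + 1).astype(int)
        xs = np.linspace(0, w, g + 1).astype(int)
        for i in range(g):
            for j in range(g):
                region = score_maps[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                feats.append(region.reshape(n, -1).max(axis=1))
    return np.concatenate(feats)                  # length n * (1 + 4) by default
```

With 13,400 elements and the default two levels, the output has 13,400 x 5 = 67,000 entries, one per element and region.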

Conclusion
Defined discriminative elements as maxima of the ratio between the positive and negative distributions.
Adapted the mean shift algorithm to find these maxima.
Introduced PC plots to choose the best elements out of many.
Achieved state-of-the-art accuracy on the MIT Scene 67 dataset.