Reading Between The Lines: Object Localization Using Implicit Cues from Image Tags. Sung Ju Hwang and Kristen Grauman, University of Texas at Austin. Presented by Jingnan Li and Ievgeniia Gutenko.



Example tag lists for two images: "Baby, Infant, Kid, Child, Headphones, Red, Cute, Laughing, Boy" and "Dog, Grass, Blue, Sky, Puppy, River, Stream, Sun, Colorado, Nikon".

Weakly labeled images: each training image comes with a list of tags (e.g., "Lamp, Chair, Painting, Table"; "Lamp, Chair, Baby, Table"; "Chair, Bicycle, Person") but no bounding boxes.

Object detection approaches
- Sliding window object detector.
- Reduce the number of windows scanned: cascades, branch-and-bound.
- Improve accuracy by priming the detector based on context: inter-object co-occurrence (occlusions), spatial relationships (on, to the left of, to the right of).

Object detection approaches
- Sliding-window, appearance-based object detectors need to reduce the number of windows scanned.
- For speed: prioritize the search windows within the image based on a distribution learned from the tags.
- For accuracy: combine models based on both the tags and the image appearance.

Motivation
Idea: what can be predicted about an image, before even looking at it, from its tags alone? Two tag lists may both suggest that a mug appears in the image, but because taggers tend to name first whatever "catches the eye", the order and composition of the tags can narrow the area the object detector has to search.

Implicit Tag Feature Definitions
What implicit features can be obtained from tags?
- Relative prominence of each object, based on its order in the list.
- Scale cues implied by unnamed objects.
- The rough layout and proximity between objects, based on the sequence in which tags are given.

Implicit Tag Feature Definitions
- Word presence and absence: a bag-of-words representation. For a vocabulary of N total possible words, w_i denotes the number of times that tag-word i occurs in the image's associated list of keywords. For most tag lists, this vector consists of only binary entries saying whether each tag has been named or not.
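A minimal sketch of the bag-of-words feature (the vocabulary and tag list here are illustrative, not from the paper's datasets):

```python
# Bag-of-words tag feature W(I) = [w_1, ..., w_N] over a fixed vocabulary.
def bag_of_words(tags, vocabulary):
    """Count how often each vocabulary word appears in the tag list."""
    counts = {word: 0 for word in vocabulary}
    for tag in tags:
        if tag in counts:
            counts[tag] += 1
    return [counts[word] for word in vocabulary]

vocab = ["mug", "table", "lamp", "chair"]
print(bag_of_words(["chair", "mug", "table"], vocab))  # [1, 1, 0, 1]
```

Since taggers rarely repeat a word, the resulting vector is usually binary, as the slide notes.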

Implicit Tag Feature Definitions
- Tag rank: captures the prominence of each object, since certain things will be named before others. r_i denotes the percentile rank of word i's position, relative to the ranks observed for that word in the training data (over the entire vocabulary). Some objects have context-independent "noticeability" (such as a baby or a fire truck) and are often named first regardless of their scale or position.
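A sketch of the percentile-rank computation, under the assumption that training positions for each word are collected beforehand (the example positions are hypothetical):

```python
from bisect import bisect_right

# Tag-rank feature r_i: percentile of a word's position in the current
# tag list, relative to positions observed for that word in training.
def percentile_rank(position, training_positions):
    """Fraction of training positions for this word that are <= position."""
    ranks = sorted(training_positions)
    return bisect_right(ranks, position) / len(ranks)

# Hypothetical training data: "car" usually appears 1st-3rd in tag lists.
seen_positions = [1, 1, 2, 2, 3, 5]
print(percentile_rank(2, seen_positions))  # 4/6, i.e. about 0.67
```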

Implicit Tag Feature Definitions
- Mutual tag proximity: a tagger tends to name prominent objects first, then move his or her eyes to other objects nearby. p_ij denotes the (signed) rank difference between tag-words i and j in the given image's list; the entry is 0 when the pair is not present.
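This feature can be sketched as a matrix of signed position differences (vocabulary and tags again illustrative):

```python
# Mutual tag proximity p_ij: signed difference between the list positions
# of tag-words i and j; 0 when either word is absent from the tag list.
def tag_proximity(tags, vocabulary):
    position = {tag: k + 1 for k, tag in enumerate(tags)}
    n = len(vocabulary)
    p = [[0] * n for _ in range(n)]
    for i, wi in enumerate(vocabulary):
        for j, wj in enumerate(vocabulary):
            if wi in position and wj in position and i != j:
                p[i][j] = position[wi] - position[wj]
    return p

vocab = ["mug", "table", "lamp"]
p = tag_proximity(["table", "mug"], vocab)
print(p[0][1])  # mug named 2nd, table named 1st: 2 - 1 = 1
```

A small positive p_ij thus says word i was named shortly after word j, hinting the objects are near each other.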

Modeling the localization distributions
- Relate the tag-based features defined above to object detection (or a combination of detectors).
- Model the conditional probability density that a window at position and scale (x, y, s) contains an object of the target category, given only the image's tag features.

Modeling the localization distributions
- Use a mixture-of-Gaussians model whose parameters are obtained from a trained Mixture Density Network (MDN).
- Training: images with tags and bounding boxes. Classification: given a novel image with no bounding boxes (tagged e.g. "Computer, Bicycle, Chair"), the MDN provides the mixture model representing the most likely locations for the target object.
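The mixture output can be sketched as follows; the mixing weights, means, and variances here are made-up stand-ins for what an MDN would emit for a given tag list (the paper's MDN conditions them on the tag features):

```python
import math
import random

# A mixture of isotropic Gaussians over window location/scale (x, y, s).
def gmm_density(x, weights, means, sigmas):
    """Evaluate the Gaussian mixture density at point x = (cx, cy, s)."""
    total = 0.0
    for w, mu, sigma in zip(weights, means, sigmas):
        sq = sum((a - b) ** 2 for a, b in zip(x, mu))
        norm = (2 * math.pi * sigma ** 2) ** (len(x) / 2)
        total += w * math.exp(-sq / (2 * sigma ** 2)) / norm
    return total

def gmm_sample(weights, means, sigmas):
    """Draw one candidate window: pick a component, then sample a point."""
    k = random.choices(range(len(weights)), weights=weights)[0]
    return [random.gauss(m, sigmas[k]) for m in means[k]]

weights = [0.7, 0.3]                         # mixing coefficients
means = [[0.3, 0.6, 0.2], [0.7, 0.5, 0.4]]   # (x, y, scale), normalized
sigmas = [0.05, 0.10]
window = gmm_sample(weights, means, sigmas)
print(window, gmm_density(window, weights, means, sigmas))
```

Sampling repeatedly from this mixture is exactly how candidate windows like the "top 30 most likely places" on the next slide would be generated.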

The top 30 most likely places for a car, sampled according to the modeled distribution based only on the images' tags.

Modulating or Priming the detector
- Use the tag-based distribution from the previous step in two ways:
- Modulating: combine its predictions with those of an appearance-based object detector (appearance cues A: HOG, or a part-based deformable part model).
- Priming: use the model to rank sub-windows and run the detector on the most probable locations only.
- The detector's decision value is mapped to a probability.
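A common way to map a raw decision value to a probability is a logistic (sigmoid) function; the slope and offset below are illustrative, not the calibration actually learned for the detectors in the paper:

```python
import math

# Map a detector's raw decision value to a probability in (0, 1).
def decision_to_prob(score, a=1.0, b=0.0):
    """Logistic mapping with hypothetical slope a and offset b."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

print(decision_to_prob(0.0))  # 0.5: an uninformative score
print(decision_to_prob(2.0))  # about 0.88: a confident detection
```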

Modulating the detector
- Balance the appearance-based and tag-based predictions, using all tag cues (word presence, tag rank, mutual proximity).
- Learn the combination weights w from detection scores on true detections and on a number of randomly sampled windows from the background.
- A Gist descriptor can be added to compare against global scene visual context.
- Goal: improve accuracy.
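One plausible form of this fusion is a weighted combination of the per-cue probabilities in log space; the weights, cue names, and scores below are hypothetical, only meant to show how agreeing tag cues boost a window over one the tags contradict:

```python
import math

# Combine per-cue probabilities P(O | cue) for one candidate window
# with learned weights w (made-up values here).
def modulated_score(weights, probs):
    return sum(w * math.log(p) for w, p in zip(weights, probs))

# Hypothetical cues: appearance, word presence, tag rank, tag proximity.
w = [0.5, 0.2, 0.2, 0.1]
window_a = [0.9, 0.8, 0.7, 0.6]  # window agreeing with all cues
window_b = [0.9, 0.2, 0.3, 0.4]  # same appearance, but tags disagree
print(modulated_score(w, window_a) > modulated_score(w, window_b))  # True
```

In practice w would be fit, as the slide says, from scores on true detections versus randomly sampled background windows.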

Priming the detector
- Prioritize the search windows according to the tag-based distribution.
- Assumes the object is present, so only the localization parameters (x, y, s) have to be estimated.
- Stop the search when a confident detection is found (confidence > 0.5).
- Goal: improve efficiency.
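The priming loop can be sketched as follows; the prior and detector below are toy stand-ins, not the paper's actual models:

```python
# Visit windows in order of the tag-based prior and stop at the first
# confident detection, so far fewer windows are scanned on average.
def primed_search(windows, tag_prior, detector, threshold=0.5):
    ordered = sorted(windows, key=tag_prior, reverse=True)
    for scanned, window in enumerate(ordered, start=1):
        if detector(window) > threshold:
            return window, scanned  # early exit
    return None, len(ordered)

windows = [(x, y, s) for x in range(4) for y in range(4) for s in (1, 2)]
prior = lambda w: 1.0 - abs(w[0] - 2) * 0.2 - abs(w[1] - 1) * 0.2
detector = lambda w: 0.9 if w[:2] == (2, 1) else 0.1
best, scanned = primed_search(windows, prior, detector)
print(best, scanned)  # finds a (2, 1, s) window after scanning very few
```

Because the tag prior concentrates on the right region, the loop terminates long before exhausting the 32 candidate windows.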

Results
- Datasets:
- LabelMe: use the HOG detector.
- PASCAL VOC: use the part-based detector.
(Table note: the last three columns show the ranges of positions/scales present in the images, averaged per class, as a percentage of image size.)

LabelMe Dataset
Priming object search (increasing speed): for a detection rate of 0.6, the proposed method considers only a third of the windows scanned by the sliding-window approach.
Modulating the detector (increasing accuracy): the proposed features yield noticeable accuracy improvements over the raw detector.

Example detections on LabelMe Each image shows the best detection found. Scores denote overlap ratio with ground truth. The detectors modulated according to the visual or tag-based context are more accurate.

PASCAL Dataset
- Priming object search (increasing speed): adopt the Latent SVM (LSVM) part-based windowed detector; the primed search is even faster here than the HOG detector's was on LabelMe.
- Modulating the detector (increasing accuracy): augmenting the LSVM detector with the tag features noticeably improves accuracy, increasing the average precision by 9.2% overall.

Example detections on PASCAL VOC
- Red dotted boxes: the most confident detections according to the raw detector (LSVM).
- Green solid boxes: the most confident detections when modulated by our method (LSVM + tags).
- The first two rows show good results; the third row shows failure cases.

Conclusions
- A novel approach that uses the information "between the lines" of image tags.
- Utilizing this implicit tag information makes object search both faster and more accurate.
- The method complements, and can even exceed, the performance of methods using visual cues alone.
- Shows potential for learning the tendencies of real taggers.

Thank you!