Object Bank. Presenter: Liu Changyu. Advisor: Prof. Alex Hauptmann. Interest: Multimedia Analysis. April 4th, 2013.



CMU - Language Technologies Institute

Contents
Introduction
Model
Algorithm
Experiment
Conclusion

1. Research Question
1) Understanding the meanings and contents of images remains one of the most challenging problems in machine intelligence and statistical learning.
2) Present low-level image features suffice for a variety of visual recognition tasks, but they are still not enough, especially for visual tasks that carry semantic meaning, so efficient high-level image features are often needed.
Introduction

2. What's Object Bank?
The Object Bank representation is a novel image representation for high-level visual tasks, which encodes semantic and spatial information about the objects within an image. In Object Bank, an image is represented as a collection of scale-invariant response maps of a large number of pre-trained generic object detectors.
Introduction

3. Why use it?
Fig. 1 illustrates the gradient-based GIST features and the texture-based spatial pyramid representation of two different scenes (foresty mountain vs. city street). Such schemes often fail to offer sufficient discriminative power, as one can see from the very similar image statistics in the examples in this figure.
Introduction
Fig. 1: (Best viewed in color and with magnification.) Comparison of the OB representation with GIST and SIFT-SPM for mountain vs. city street.

4. What is it used for?
The main goals in using Object Bank are:
1) Optimize the Object Bank detection code.
2) Extend Object Bank to incorporate more objects.
Introduction


Fig. 2: Object Bank Model
Model --- Object Bank
A large number of object detectors are first applied to an input image at multiple scales. For each object at each scale, a three-level spatial pyramid representation of the resulting object filter map is used; the maximum response for each object in each grid cell is then computed, resulting in a feature vector whose length equals the number of objects for each grid. A concatenation of the features in all grids leads to the OB descriptor for the image.


According to paper [1], the Object Bank algorithm is as follows:
1) Let X ∈ R^(N×J) represent the design matrix built on the J-dimensional Object Bank representation of N images.
2) Let y ∈ {−1, 1}^N denote the binary classification labels of the N samples.
3) This leads to the following learning problem:

min_β L(X, y; β) + λ R(β)   (1)

where L(·) is some non-negative, convex loss and R(·) is a regularizer that avoids overfitting.
Algorithm
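As a concrete instance of problem (1), the loss L can be taken to be the logistic loss and the regularizer R an L2 penalty. A minimal numpy gradient-descent sketch under those assumptions (illustrative only; the paper also studies sparsity-inducing regularizers, which this sketch does not cover):

```python
import numpy as np

# min_beta  L(X, y; beta) + lam * R(beta), with logistic loss and R = L2.
def train(X, y, lam=0.1, lr=0.1, steps=500):
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # predicted probabilities
        grad = X.T @ (p - y) / n + lam * beta    # loss gradient + L2 term
        beta -= lr * grad
    return beta

# Toy data: label depends only on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float)
beta = train(X, y)
acc = np.mean((X @ beta > 0) == (y == 1))
```

In practice one would use an off-the-shelf solver (e.g. regularized logistic regression in scikit-learn) rather than hand-rolled gradient descent.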


We want to extend the original Object Bank approach and run the related experiments with the following steps:
1) List and number the objects needed in our experiment, e.g.:
Object names: clock, goggles, spectacles, knife, key, keyboard, desktop computer, computer, 1074-dog, printer, faucet, ...
Experiment

2) Download the related bounding boxes from ImageNet.
3) Resize the original image. The image is resized as follows: first get the image dimensions (a, b); the scaling ratio is then computed as Ratio = 400/min(a, b). Therefore, the smaller axis of the image is converted to 400 pixels. Fig. 3 illustrates this.
Experiment
Fig. 3: Resizing step
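The resizing rule above can be sketched as a small helper (hypothetical function name, not the original Object Bank code): the ratio makes the smaller dimension exactly 400 pixels while preserving the aspect ratio.

```python
# Compute the resized dimensions so that min(a, b) becomes `target` pixels.
def resize_dims(a, b, target=400):
    ratio = target / min(a, b)
    return round(a * ratio), round(b * ratio)

print(resize_dims(600, 800))  # smaller axis 600 -> 400, so (400, 533)
```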

4) Get HOG features at different scales. After this rescaling, HOG features are computed at different scales of the image. Although HOG features are obtained for more scales, only six of these scales are used. These ratios are applied to the image already resized in the previous step, starting with ratio 1 (the image obtained from the previous step).
Experiment

After resizing the images, the HOG features are calculated for every image; these HOG features are used to obtain the response for every object. Fig. 4 shows an example of the HOG features calculated for one image.
Experiment
Fig. 4: Example of HOG features
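A minimal numpy-only sketch of HOG-style features, i.e. gradient-orientation histograms per cell. The real pipeline uses the full HOG of Dalal and Triggs (with block normalization); the cell size and bin count here are illustrative assumptions.

```python
import numpy as np

# Per-cell histogram of gradient orientations, weighted by gradient magnitude.
def hog_like(image, cell=8, bins=9):
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    h, w = image.shape
    cy, cx = h // cell, w // cell
    feats = np.zeros((cy, cx, bins))
    for i in range(cy):
        for j in range(cx):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats[i, j] = hist
    return feats

f = hog_like(np.random.rand(64, 64))
print(f.shape)  # (8, 8, 9)
```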

5) Get the response for each object. After obtaining the HOG features, we apply an object-specific filter to these features. Each root filter has two components, and each component works at a different scale. As a result, we have 12 detection scales: we obtained 6 scales in the previous steps, and every filter works at 2 scales, so 6 × 2 = 12. These filter responses are stored in matrices following the same layout as the HOG features in image (d) of the previous figure. Thus, for the HOG features obtained at each ratio we have two filter responses, i.e. 12 response maps in total.
Experiment
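Applying a filter to the HOG features amounts to sliding the filter over the feature map and taking a dot product at each position (valid cross-correlation). A sketch with illustrative shapes; the real root filters come from the trained part-based detectors of [4]:

```python
import numpy as np

# Slide `root_filter` over `hog_map` and record the dot product at each
# position, producing a 2-D response map.
def filter_response(hog_map, root_filter):
    fh, fw, _ = root_filter.shape
    H, W, _ = hog_map.shape
    out = np.empty((H - fh + 1, W - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(hog_map[i:i+fh, j:j+fw] * root_filter)
    return out

resp = filter_response(np.random.rand(20, 20, 9), np.random.rand(6, 6, 9))
print(resp.shape)  # (15, 15)
```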

6) Get the spatial pyramids. We have 3 spatial pyramid levels, which are applied to each of the 12 responses for one object. The value for each cell is the maximum filter response inside that cell. For instance, at the second level the filter response is split using a 2×2 grid, and the maximum response inside every cell is picked. This gives 21 (1 + 2×2 + 4×4) values for each of the 12 filter responses of one object, resulting in a 12 × 21 = 252-dimensional vector per object.
Experiment
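The three-level max pooling described above can be sketched as follows (grid sizes 1, 2, 4 per the slide; the helper name is hypothetical):

```python
import numpy as np

# Divide the response map into 1x1, 2x2 and 4x4 grids and keep the maximum
# response in each cell: 1 + 4 + 16 = 21 values per response map.
def pyramid_max(resp, levels=(1, 2, 4)):
    H, W = resp.shape
    vals = []
    for g in levels:
        for i in range(g):
            for j in range(g):
                cell = resp[i*H//g:(i+1)*H//g, j*W//g:(j+1)*W//g]
                vals.append(cell.max())
    return np.array(vals)

v = pyramid_max(np.random.rand(16, 16))
print(v.size)  # 21; with 12 response maps per object: 12 * 21 = 252
```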

7) Get the feature vector for one object. We now describe the layout of the 252-dimensional feature vector for one object. It starts with the different scales (recall that our "original" image is the one obtained in the first step).
Experiment

The chunk for each scale is divided into two pieces, because the root filter used in Object Bank has two components that work at different scales. So the 42 dimensions for each scale are split into two pieces of 21 dimensions.
Experiment
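Assuming the layout just described (the 6 scales as outermost blocks of 42, then the two filter components of 21 each, then the 21 pyramid cells), the position of each value in the 252-dimensional vector can be sketched with a hypothetical index helper:

```python
# Offset of scale s (0..5), filter component c (0..1), pyramid cell p (0..20)
# in the 252-dim per-object vector, under the block layout described above.
def offset(s, c, p):
    return s * 42 + c * 21 + p

print(offset(0, 0, 0))   # 0   (first value of the first scale)
print(offset(5, 1, 20))  # 251 (last value of the last scale)
```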

Finally, this is the layout of the 21 dimensions.
Experiment


1) It is a feasible method to use. The authors ran several experiments demonstrating that the Object Bank representation, which carries rich semantic-level image information, is more powerful for scene classification tasks than several other popular methods.
2) We can use and extend this approach according to the actual situation to run the remaining experiments in the near future.
Conclusion

Reference
[1] Li-Jia Li, Hao Su, Eric P. Xing and Li Fei-Fei. Object Bank: A High-Level Image Representation for Scene Classification and Semantic Feature Sparsification. Proceedings of the Neural Information Processing Systems (NIPS), 2010.
[2] Li-Jia Li, Hao Su, Yongwhan Lim and Li Fei-Fei. Objects as Attributes for Scene Classification. Proceedings of the 12th European Conference on Computer Vision (ECCV), 1st International Workshop on Parts and Attributes, 2012.
[3] Sreemanananth Sadanand and Jason J. Corso. Action Bank: A High-Level Representation of Activity in Video. CVPR, 2012.
[4] Pedro Felzenszwalb et al. A Discriminatively Trained, Multiscale, Deformable Part Model. CVPR, 2008.

Thank you!