Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 1 Computer Vision: Histograms of Oriented.

Slides:



Advertisements
Similar presentations
Distinctive Image Features from Scale-Invariant Keypoints
Advertisements

Human Detection Phanindra Varma. Detection -- Overview  Human detection in static images is based on the HOG (Histogram of Oriented Gradients) encoding.
Histograms of Oriented Gradients for Human Detection
The SIFT (Scale Invariant Feature Transform) Detector and Descriptor
TP14 - Local features: detection and description Computer Vision, FCUP, 2014 Miguel Coimbra Slides by Prof. Kristen Grauman.
Lecture 31: Modern object recognition
Object Recognition using Invariant Local Features Applications l Mobile robots, driver assistance l Cell phone location or object recognition l Panoramas,
Many slides based on P. FelzenszwalbP. Felzenszwalb General object detection with deformable part-based models.
Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs CVPR 2005 Another Descriptor.
More sliding window detection: Discriminative part-based models Many slides based on P. FelzenszwalbP. Felzenszwalb.
Patch Descriptors CSE P 576 Larry Zitnick
1 Image Recognition - I. Global appearance patterns Slides by K. Grauman, B. Leibe.
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
Generic Object Detection using Feature Maps Oscar Danielsson Stefan Carlsson
Ensemble Tracking Shai Avidan IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE February 2007.
Distinctive Image Feature from Scale-Invariant KeyPoints
Feature extraction: Corners and blobs
Object Detection using Histograms of Oriented Gradients
Scale Invariant Feature Transform (SIFT)
1 Invariant Local Feature for Object Recognition Presented by Wyman 2/05/2006.
Spatial Pyramid Pooling in Deep Convolutional
The SIFT (Scale Invariant Feature Transform) Detector and Descriptor
CS4670: Computer Vision Kavita Bala Lecture 8: Scale invariance.
Lecture 6: Feature matching and alignment CS4670: Computer Vision Noah Snavely.
Scale-Invariant Feature Transform (SIFT) Jinxiang Chai.
Generic object detection with deformable part-based models
Computer vision.
Local invariant features Cordelia Schmid INRIA, Grenoble.
Lecture 4: Feature matching CS4670 / 5670: Computer Vision Noah Snavely.
Window-based models for generic object detection Mei-Chen Yeh 04/24/2012.
Visual Object Recognition
Object Detection with Discriminatively Trained Part Based Models
Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.
Pedestrian Detection and Localization
Deformable Part Models (DPM) Felzenswalb, Girshick, McAllester & Ramanan (2010) Slides drawn from a tutorial By R. Girshick AP 12% 27% 36% 45% 49% 2005.
Recognition II Ali Farhadi. We have talked about Nearest Neighbor Naïve Bayes Logistic Regression Boosting.
CSCE 643 Computer Vision: Extractions of Image Features Jinxiang Chai.
Lecture 7: Features Part 2 CS4670/5670: Computer Vision Noah Snavely.
Visual Categorization With Bags of Keypoints Original Authors: G. Csurka, C.R. Dance, L. Fan, J. Willamowski, C. Bray ECCV Workshop on Statistical Learning.
COMP322/S2000/L171 Robot Vision System Major Phases in Robot Vision Systems: A. Data (image) acquisition –Illumination, i.e. lighting consideration –Lenses,
Histograms of Oriented Gradients for Human Detection(HOG)
Sean M. Ficht.  Problem Definition  Previous Work  Methods & Theory  Results.
CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015.
Object Detection Overview Viola-Jones Dalal-Triggs Deformable models Deep learning.
Visual Computing Computer Vision 2 INFO410 & INFO350 S2 2015
A Tutorial on using SIFT Presented by Jimmy Huff (Slightly modified by Josiah Yoder for Winter )
Local features: detection and description
776 Computer Vision Jan-Michael Frahm Spring 2012.
More sliding window detection: Discriminative part-based models
Object Recognition Tutorial Beatrice van Eden - Part time PhD Student at the University of the Witwatersrand. - Fulltime employee of the Council for Scientific.
WLD: A Robust Local Image Descriptor Jie Chen, Shiguang Shan, Chu He, Guoying Zhao, Matti Pietikäinen, Xilin Chen, Wen Gao 报告人:蒲薇榄.
Evaluation of Gender Classification Methods with Automatically Detected and Aligned Faces Speaker: Po-Kai Shen Advisor: Tsai-Rong Chang Date: 2010/6/14.
Cascade for Fast Detection
CS262: Computer Vision Lect 09: SIFT Descriptors
Lecture 13: Feature Descriptors and Matching
Lecture 07 13/12/2011 Shai Avidan הבהרה: החומר המחייב הוא החומר הנלמד בכיתה ולא זה המופיע / לא מופיע במצגת.
TP12 - Local features: detection and description
Lit part of blue dress and shadowed part of white dress are the same color
Local features: detection and description May 11th, 2017
Feature description and matching
Object detection as supervised classification
Introduction of Pedestrian Detection
A Tutorial on HOG Human Detection
CAP 5415 Computer Vision Fall 2012 Dr. Mubarak Shah Lecture-5
From a presentation by Jimmy Huff Modified by Josiah Yoder
Brief Review of Recognition + Context
The SIFT (Scale Invariant Feature Transform) Detector and Descriptor
An HOG-LBP Human Detector with Partial Occlusion Handling
Feature descriptors and matching
Presented by Xu Miao April 20, 2005
Presentation transcript:

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 1 Computer Vision: Histograms of Oriented Gradients Dr. Edgar Seemann

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 2 Discriminative vs. generative models  Generative:  + possibly interpretable  + models the object class/can draw samples  - model variability unimportant to classification task  - often hard to build good model with few parameters  Discriminative:  + appealing when infeasible to model data itself  + currently often excel in practice  - often can’t provide uncertainty in predictions  - non-interpretable 2 K. Grauman, B. Leibe

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 3 Global vs. Part-Based  We distinguish global people detectors and part- based detectors  Global approaches:  A single feature description for the complete person  Part-Based Approaches:  Individual feature descriptors for body parts / local parts

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 4 Advantages and Disadvantages  Part-Based  May be better able to deal with moving body parts  May be able to handle occlusion, overlaps  Requires more complex reasoning  Global approaches  Typically simple, i.e. we train a discriminative classifier on top of the feature descriptions  Work well for small resolutions  Typically does detection via classification, i.e. uses a binary classifier

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 5 Detection via classification: Main idea Car/non-car Classifier Yes, car.No, not a car. Slide credit: K. Grauman, B. Leibe Basic component: a binary classifier

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 6 Detection via classification: Main idea Car/non-car Classifier Slide credit: K. Grauman, B. Leibe If object may be in a cluttered scene, slide a window around looking for it.

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 7 Gradient Histograms

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 8 Gradient Histograms  Have become extremely popular and successful in the vision community  Avoid hard decisions compared to edge based features  Examples:  SIFT (Scale-Invariant Image Transform)  GLOH (Gradient Location and Orientation Histogram)  HOG (Histogram of Oriented Gradients)

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 9 Computing gradients  One sided:  Two sided:  Filter masks in x-direction  One sided:  Two sided:  Gradient:  Magnitude:  Orientation: 01 1

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 10 Histograms  Gradient histograms measure the orientations and strengths of image gradients within an image region

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 11 Example: SIFT descriptor  The most popular gradient-based descriptor  Typically used in combination with an interest point detector  Region rescaled to a grid of 16x16 pixels  4x4 regions = 16 histograms (concatenated)  Histograms: 8 orientation bins, gradients weighted by gradient magnitude  Final descriptor has 128 dimensions and is normalized to compensate for illumination differences

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 12 Application: AutoPano-Sift Sift matches Blended image Other applications: - Recognition of previously seen objects (e.g. in robotics)

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 13 Histograms of Oriented Gradients  Gradient-based feature descriptor developed for people detection  Authors: Dalal&Triggs (INRIA Grenoble, F)  Global descriptor for the complete body  Very high-dimensional  Typically ~4000 dimensions

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 14 HOG Very promising results on challenging data sets Phases 1. Learning Phase 2. Detection Phase

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 15 Detector: Learning Phase 1.Learning Set of cropped images containing pedestrians in normal environment Global descriptor rather than local features Using linear SVM

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 16 Detector: Detection Phase 2.Detection Sliding window over each scale Simple SVM prediction

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 17 Descriptor 1.Compute gradients on an image region of 64x128 pixels 2.Compute histograms on ‘cells’ of typically 8x8 pixels (i.e. 8x16 cells) 3.Normalize histograms within overlapping blocks of cells (typically 2x2 cells, i.e. 7x15 blocks) 4.Concatenate histograms

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 18 Gradients  Convolution with [-1 0 1] filters  No smoothing  Compute gradient magnitude+direction  Per pixel: color channel with greatest magnitude - > final gradient

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 19 Cell histograms  9 bins for gradient orientations (0-180 degrees)  Filled with magnitudes  Interpolated trilinearly:  Bilinearly into spatial cells  Linearly into orientation bins

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 20 Linear and Bilinear interpolation for subsampling Linear: Bilinear:

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 21 Histogram interpolation example  θ=85 degrees  Distance to bin centers  Bin 70 -> 15 degrees  Bin 90 -> 5 degress  Ratios: 5/20=1/4, 15/20=3/4  Distance to bin centers  Left: 2, Right: 6  Top: 2, Bottom: 6  Ratio Left-Right: 6/8, 2/8  Ratio Top-Bottom: 6/8, 2/8  Ratios:  6/8*6/8 = 36/64 = 9/16  6/8*2/8 = 12/64 = 3/16  2/8*6/8 = 12/64 = 3/16  2/8*2/8 = 4/64 = 1/16

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 22 Blocks  Overlapping blocks of 2x2 cells  Cell histograms are concatenated and then normalized  Note that each cell several occurrences with different normalization in final descriptor  Normalization  Different norms possible (L2, L2hys etc.)  We add a normalization epsilon to avoid division by zero

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 23 Blocks  Gradient magnitudes are weighted according to a Gaussian spatial window  Distant gradients contribute less to the histogram

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 24 Final Descriptor  Concatenation of Blocks  Visualization:

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 25 Engineering  Developing a feature descriptor requires a lot of engineering  Testing of parameters (e.g. size of cells, blocks, number of cells in a block, size of overlap)  Normalization schemes (e.g. L1, L2-Norms etc., gamma correction, pixel intensity normalization)  An extensive evaluation of different choices was performed, when the descriptor was proposed  It’s not only the idea, but also the engineering effort

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 26 Training Set  More than 2000 positive & 2000 negative training images (96x160px)  Carefully aligned and resized  Wide variety of backgrounds

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 27 Model learning  Simple linear SVM on top of the HOG Features  Fast (one inner product per evaluation window)  Hyper plane normal vector: with y i in {0,1} and x i the support vectors  Decision:  Slightly better results can be achieved by using a SVM with a Gaussian kernel  But considerable increase in computation time

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 28 Result on INRIA database  Test Set contains 287 images  Resolution ~640x480  589 persons  Avg. size: 288 pixels

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 29 Demo