Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 1 Computer Vision: Histograms of Oriented.

Slides:

Advertisements

Similar presentations

Distinctive Image Features from Scale-Invariant Keypoints

Advertisements

Human Detection Phanindra Varma. Detection -- Overview  Human detection in static images is based on the HOG (Histogram of Oriented Gradients) encoding.

Histograms of Oriented Gradients for Human Detection

The SIFT (Scale Invariant Feature Transform) Detector and Descriptor

TP14 - Local features: detection and description Computer Vision, FCUP, 2014 Miguel Coimbra Slides by Prof. Kristen Grauman.

Lecture 31: Modern object recognition

Object Recognition using Invariant Local Features Applications l Mobile robots, driver assistance l Cell phone location or object recognition l Panoramas,

Many slides based on P. FelzenszwalbP. Felzenszwalb General object detection with deformable part-based models.

Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs CVPR 2005 Another Descriptor.

More sliding window detection: Discriminative part-based models Many slides based on P. FelzenszwalbP. Felzenszwalb.

Patch Descriptors CSE P 576 Larry Zitnick

1 Image Recognition - I. Global appearance patterns Slides by K. Grauman, B. Leibe.

1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.

Generic Object Detection using Feature Maps Oscar Danielsson Stefan Carlsson

Ensemble Tracking Shai Avidan IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE February 2007.

Distinctive Image Feature from Scale-Invariant KeyPoints

Feature extraction: Corners and blobs

Object Detection using Histograms of Oriented Gradients

Scale Invariant Feature Transform (SIFT)

1 Invariant Local Feature for Object Recognition Presented by Wyman 2/05/2006.

Spatial Pyramid Pooling in Deep Convolutional

The SIFT (Scale Invariant Feature Transform) Detector and Descriptor

CS4670: Computer Vision Kavita Bala Lecture 8: Scale invariance.

Lecture 6: Feature matching and alignment CS4670: Computer Vision Noah Snavely.

Scale-Invariant Feature Transform (SIFT) Jinxiang Chai.

Generic object detection with deformable part-based models

Computer vision.

Local invariant features Cordelia Schmid INRIA, Grenoble.

Lecture 4: Feature matching CS4670 / 5670: Computer Vision Noah Snavely.

Window-based models for generic object detection Mei-Chen Yeh 04/24/2012.

Visual Object Recognition

Object Detection with Discriminatively Trained Part Based Models

Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.

Pedestrian Detection and Localization

Deformable Part Models (DPM) Felzenswalb, Girshick, McAllester & Ramanan (2010) Slides drawn from a tutorial By R. Girshick AP 12% 27% 36% 45% 49% 2005.

Recognition II Ali Farhadi. We have talked about Nearest Neighbor Naïve Bayes Logistic Regression Boosting.

CSCE 643 Computer Vision: Extractions of Image Features Jinxiang Chai.

Lecture 7: Features Part 2 CS4670/5670: Computer Vision Noah Snavely.

Visual Categorization With Bags of Keypoints Original Authors: G. Csurka, C.R. Dance, L. Fan, J. Willamowski, C. Bray ECCV Workshop on Statistical Learning.

COMP322/S2000/L171 Robot Vision System Major Phases in Robot Vision Systems: A. Data (image) acquisition –Illumination, i.e. lighting consideration –Lenses,

Histograms of Oriented Gradients for Human Detection(HOG)

Sean M. Ficht.  Problem Definition  Previous Work  Methods & Theory  Results.

CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015.

Object Detection Overview Viola-Jones Dalal-Triggs Deformable models Deep learning.

Visual Computing Computer Vision 2 INFO410 & INFO350 S2 2015

A Tutorial on using SIFT Presented by Jimmy Huff (Slightly modified by Josiah Yoder for Winter )

Local features: detection and description

776 Computer Vision Jan-Michael Frahm Spring 2012.

More sliding window detection: Discriminative part-based models

Object Recognition Tutorial Beatrice van Eden - Part time PhD Student at the University of the Witwatersrand. - Fulltime employee of the Council for Scientific.

WLD: A Robust Local Image Descriptor Jie Chen, Shiguang Shan, Chu He, Guoying Zhao, Matti Pietikäinen, Xilin Chen, Wen Gao 报告人：蒲薇榄.

Evaluation of Gender Classification Methods with Automatically Detected and Aligned Faces Speaker: Po-Kai Shen Advisor: Tsai-Rong Chang Date: 2010/6/14.

Cascade for Fast Detection

CS262: Computer Vision Lect 09: SIFT Descriptors

Lecture 13: Feature Descriptors and Matching

Lecture 07 13/12/2011 Shai Avidan הבהרה: החומר המחייב הוא החומר הנלמד בכיתה ולא זה המופיע / לא מופיע במצגת.

TP12 - Local features: detection and description

Lit part of blue dress and shadowed part of white dress are the same color

Local features: detection and description May 11th, 2017

Feature description and matching

Object detection as supervised classification

Introduction of Pedestrian Detection

A Tutorial on HOG Human Detection

CAP 5415 Computer Vision Fall 2012 Dr. Mubarak Shah Lecture-5

From a presentation by Jimmy Huff Modified by Josiah Yoder

Brief Review of Recognition + Context

The SIFT (Scale Invariant Feature Transform) Detector and Descriptor

An HOG-LBP Human Detector with Partial Occlusion Handling

Feature descriptors and matching

Presented by Xu Miao April 20, 2005

Presentation transcript:

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 1 Computer Vision: Histograms of Oriented Gradients Dr. Edgar Seemann

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 2 Discriminative vs. generative models  Generative:  + possibly interpretable  + models the object class/can draw samples  - model variability unimportant to classification task  - often hard to build good model with few parameters  Discriminative:  + appealing when infeasible to model data itself  + currently often excel in practice  - often can’t provide uncertainty in predictions  - non-interpretable 2 K. Grauman, B. Leibe

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 3 Global vs. Part-Based  We distinguish global people detectors and part- based detectors  Global approaches:  A single feature description for the complete person  Part-Based Approaches:  Individual feature descriptors for body parts / local parts

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 4 Advantages and Disadvantages  Part-Based  May be better able to deal with moving body parts  May be able to handle occlusion, overlaps  Requires more complex reasoning  Global approaches  Typically simple, i.e. we train a discriminative classifier on top of the feature descriptions  Work well for small resolutions  Typically does detection via classification, i.e. uses a binary classifier

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 5 Detection via classification: Main idea Car/non-car Classifier Yes, car.No, not a car. Slide credit: K. Grauman, B. Leibe Basic component: a binary classifier

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 6 Detection via classification: Main idea Car/non-car Classifier Slide credit: K. Grauman, B. Leibe If object may be in a cluttered scene, slide a window around looking for it.

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 7 Gradient Histograms

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 8 Gradient Histograms  Have become extremely popular and successful in the vision community  Avoid hard decisions compared to edge based features  Examples:  SIFT (Scale-Invariant Image Transform)  GLOH (Gradient Location and Orientation Histogram)  HOG (Histogram of Oriented Gradients)

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 9 Computing gradients  One sided:  Two sided:  Filter masks in x-direction  One sided:  Two sided:  Gradient:  Magnitude:  Orientation: 01 1

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 10 Histograms  Gradient histograms measure the orientations and strengths of image gradients within an image region

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 11 Example: SIFT descriptor  The most popular gradient-based descriptor  Typically used in combination with an interest point detector  Region rescaled to a grid of 16x16 pixels  4x4 regions = 16 histograms (concatenated)  Histograms: 8 orientation bins, gradients weighted by gradient magnitude  Final descriptor has 128 dimensions and is normalized to compensate for illumination differences

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 12 Application: AutoPano-Sift Sift matches Blended image Other applications: - Recognition of previously seen objects (e.g. in robotics)

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 13 Histograms of Oriented Gradients  Gradient-based feature descriptor developed for people detection  Authors: Dalal&Triggs (INRIA Grenoble, F)  Global descriptor for the complete body  Very high-dimensional  Typically ~4000 dimensions

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 14 HOG Very promising results on challenging data sets Phases 1. Learning Phase 2. Detection Phase

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 15 Detector: Learning Phase 1.Learning Set of cropped images containing pedestrians in normal environment Global descriptor rather than local features Using linear SVM

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 16 Detector: Detection Phase 2.Detection Sliding window over each scale Simple SVM prediction

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 17 Descriptor 1.Compute gradients on an image region of 64x128 pixels 2.Compute histograms on ‘cells’ of typically 8x8 pixels (i.e. 8x16 cells) 3.Normalize histograms within overlapping blocks of cells (typically 2x2 cells, i.e. 7x15 blocks) 4.Concatenate histograms

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 18 Gradients  Convolution with [-1 0 1] filters  No smoothing  Compute gradient magnitude+direction  Per pixel: color channel with greatest magnitude - > final gradient

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 19 Cell histograms  9 bins for gradient orientations (0-180 degrees)  Filled with magnitudes  Interpolated trilinearly:  Bilinearly into spatial cells  Linearly into orientation bins

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 20 Linear and Bilinear interpolation for subsampling Linear: Bilinear:

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 21 Histogram interpolation example  θ=85 degrees  Distance to bin centers  Bin 70 -> 15 degrees  Bin 90 -> 5 degress  Ratios: 5/20=1/4, 15/20=3/4  Distance to bin centers  Left: 2, Right: 6  Top: 2, Bottom: 6  Ratio Left-Right: 6/8, 2/8  Ratio Top-Bottom: 6/8, 2/8  Ratios:  6/8*6/8 = 36/64 = 9/16  6/8*2/8 = 12/64 = 3/16  2/8*6/8 = 12/64 = 3/16  2/8*2/8 = 4/64 = 1/16

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 22 Blocks  Overlapping blocks of 2x2 cells  Cell histograms are concatenated and then normalized  Note that each cell several occurrences with different normalization in final descriptor  Normalization  Different norms possible (L2, L2hys etc.)  We add a normalization epsilon to avoid division by zero

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 23 Blocks  Gradient magnitudes are weighted according to a Gaussian spatial window  Distant gradients contribute less to the histogram

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 24 Final Descriptor  Concatenation of Blocks  Visualization:

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 25 Engineering  Developing a feature descriptor requires a lot of engineering  Testing of parameters (e.g. size of cells, blocks, number of cells in a block, size of overlap)  Normalization schemes (e.g. L1, L2-Norms etc., gamma correction, pixel intensity normalization)  An extensive evaluation of different choices was performed, when the descriptor was proposed  It’s not only the idea, but also the engineering effort

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 26 Training Set  More than 2000 positive & 2000 negative training images (96x160px)  Carefully aligned and resized  Wide variety of backgrounds

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 27 Model learning  Simple linear SVM on top of the HOG Features  Fast (one inner product per evaluation window)  Hyper plane normal vector: with y i in {0,1} and x i the support vectors  Decision:  Slightly better results can be achieved by using a SVM with a Gaussian kernel  But considerable increase in computation time

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 28 Result on INRIA database  Test Set contains 287 images  Resolution ~640x480  589 persons  Avg. size: 288 pixels

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 29 Demo