1 Crash Course on Machine Learning, Part IV. Several slides from Derek Hoiem and Ben Taskar.

2 What you need to know
– Dual SVM formulation and how it is derived
– The kernel trick; deriving the polynomial kernel
– Common kernels
– Kernelized logistic regression
– SVMs vs. kernel regression
– SVMs vs. logistic regression

3 Example: Dalal-Triggs pedestrian detector
1. Extract a fixed-size (64x128 pixel) window at each position and scale
2. Compute HOG (histogram of oriented gradients) features within each window
3. Score the window with a linear SVM classifier
4. Perform non-maximum suppression to remove overlapping detections with lower scores
Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005
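
Below is a minimal sketch of this pipeline in Python. The HOG extractor (compute_hog), the trained weights (svm_w, svm_b), the stride, and the scale set are all assumptions for illustration; non-maximum suppression (step 4) is left out.

```python
# A minimal sketch of the Dalal-Triggs pipeline under stated assumptions:
# compute_hog() (producing the 3780-dim descriptor) and a trained linear
# SVM (svm_w, svm_b) are given; NMS is omitted.
import numpy as np
from skimage.transform import resize  # any image-resize routine works here

def detect_pedestrians(image, svm_w, svm_b, compute_hog,
                       win_h=128, win_w=64, stride=8,
                       scales=(1.0, 1.25, 1.5625)):
    """Score a 64x128 window at each position/scale; return scored boxes."""
    detections = []
    for s in scales:
        img = resize(image, (int(image.shape[0] / s), int(image.shape[1] / s)))
        for y in range(0, img.shape[0] - win_h + 1, stride):
            for x in range(0, img.shape[1] - win_w + 1, stride):
                feat = compute_hog(img[y:y + win_h, x:x + win_w])
                score = float(svm_w @ feat + svm_b)   # linear SVM score
                if score > 0:                          # detection threshold
                    detections.append((score, (x * s, y * s, win_w * s, win_h * s)))
    return detections  # run non-maximum suppression on these (step 4)
```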

4 Slides by Pete Barnum; Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

5 Color spaces tested: RGB, LAB, and grayscale. RGB and LAB give slightly better performance than grayscale.

6 Gradient filters tested: uncentered, centered, cubic-corrected, diagonal, and Sobel. The simple centered filter [-1, 0, 1] outperforms the others. Slides by Pete Barnum; Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

7 Histogram of gradient orientations
– Votes weighted by gradient magnitude
– Bilinear interpolation between cells
– Orientation: 9 bins (for unsigned angles, 0–180 degrees)
– Histograms computed in 8x8 pixel cells
Slides by Pete Barnum; Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05
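
A minimal numpy sketch of the cell histograms described above, assuming a grayscale window; a real implementation also bilinearly interpolates each vote between neighboring cells and orientation bins, which is omitted here.

```python
# Magnitude-weighted orientation histograms in 8x8 cells (no interpolation).
import numpy as np

def cell_histograms(img, cell=8, nbins=9):
    gy, gx = np.gradient(img.astype(float))         # image gradients
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned angle in [0, 180)
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    bin_idx = np.minimum((ang / (180.0 / nbins)).astype(int), nbins - 1)
    hist = np.zeros((n_cy, n_cx, nbins))
    for i in range(n_cy):
        for j in range(n_cx):
            b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            hist[i, j] = np.bincount(b, weights=m, minlength=nbins)
    return hist   # (cells_y, cells_x, 9) magnitude-weighted votes
```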

8 Normalize each histogram with respect to surrounding cells. Slides by Pete Barnum; Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

9 [Figure: the window's HOG descriptor assembled into one feature vector X] Slides by Pete Barnum; Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05. Feature count: 15 x 7 block positions x 9 orientations x 4 normalizations by neighboring cells = 3780 features.
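
The feature count can be checked from the window geometry; the variable names below are illustrative:

```python
# 64x128 window, 8x8 cells, 2x2-cell blocks: each cell is normalized once
# per block containing it (up to 4 times), giving the 3780-dim descriptor.
cells_x, cells_y = 64 // 8, 128 // 8            # 8 x 16 cells
blocks_x, blocks_y = cells_x - 1, cells_y - 1   # 7 x 15 overlapping blocks
orientations, cells_per_block = 9, 4
assert blocks_x * blocks_y * cells_per_block * orientations == 3780
```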

10 [Figure: visualization of the positive (pos w) and negative (neg w) components of the learned SVM weight vector] Slides by Pete Barnum; Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

11 [Figure: pedestrian] Slides by Pete Barnum; Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection, CVPR05

12 Detection examples

13 Viola-Jones sliding window detector. Fast detection through two mechanisms:
– Quickly eliminate unlikely windows
– Use features that are fast to compute
Viola and Jones, Rapid Object Detection using a Boosted Cascade of Simple Features, 2001

14 Cascade for Fast Detection. Examples enter Stage 1: H1(x) > t1? No: reject. Yes: on to Stage 2: H2(x) > t2? ... Stage N: HN(x) > tN? Yes: pass.
– Choose each stage's threshold for a low false-negative rate
– Fast classifiers early in the cascade
– Slow classifiers later, but most examples never get there
(A sketch of cascade evaluation follows below.)
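
A minimal sketch of cascade evaluation, assuming each stage H_i is a trained boosted classifier with threshold t_i:

```python
# Each stage rejects as early as possible; most windows never reach the
# slow, accurate stages at the end of the cascade.
def cascade_classify(x, stages):
    """stages: list of (H, t) pairs, ordered cheap -> expensive."""
    for H, t in stages:
        if H(x) <= t:
            return False   # reject as soon as any stage fails
    return True            # passed every stage: report a detection
```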

15 Features that are fast to compute: "Haar-like features"
– Differences of sums of pixel intensities (regions weighted +1 and -1)
– Thousands of them, computed at various positions and scales within the detection window
– Two-rectangle features, three-rectangle features, etc. (see the sketch below)
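
A minimal sketch of the integral-image trick that makes these features fast: any rectangle sum takes four array lookups, so a two-rectangle feature costs a handful of additions. Function names are illustrative:

```python
import numpy as np

def integral_image(img):
    ii = img.astype(float).cumsum(0).cumsum(1)
    return np.pad(ii, ((1, 0), (1, 0)))   # zero row/col simplifies indexing

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] via four lookups in the integral image."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, y, x, h, w):
    """Difference of sums of two horizontally adjacent rectangles."""
    return rect_sum(ii, y, x, h, w) - rect_sum(ii, y, x + w, h, w)
```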

16 Feature selection with AdaBoost
– Create a large pool of features (about 180,000)
– Select features that are discriminative and work well together
– "Weak learner" = feature + threshold
– Choose the weak learner that minimizes error on the weighted training set
– Reweight the training examples (see the sketch below)
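
A minimal sketch of one boosting round, assuming feature values are precomputed into a matrix F; names and details are illustrative, not the authors' code:

```python
import numpy as np

def best_weak_learner(F, y, w):
    """F: (n, d) feature values; y: labels in {-1,+1}; w: example weights.
    Returns the (feature, threshold, polarity) with least weighted error."""
    best_err, best = np.inf, None
    for j in range(F.shape[1]):
        for t in np.unique(F[:, j]):
            for p in (+1, -1):
                pred = np.where(F[:, j] > t, p, -p)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, t, p)
    return best, best_err

def reweight(w, y, pred, err):
    """Upweight misclassified examples (the 'Reweight' step above)."""
    alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
    w = w * np.exp(-alpha * y * pred)
    return w / w.sum(), alpha
```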

17 Top 2 selected features

18 Viola-Jones results: evaluated on the MIT+CMU face dataset; speed = 15 frames per second (in 2001).

19 What about pose estimation?

20 What about interactions?

21 3D modeling

22 Object context. From Divvala et al., CVPR 2009.

23 Integration
– Feature level
– Margin based: max-margin structure learning
– Probabilistic: graphical models

25 Feature Passing: compute features from one estimated scene property to help estimate another. [Diagram: Image -> X features -> estimate X; Image -> Y features -> estimate Y, with each estimate feeding features to the other]

26 Feature passing example: use features computed from "geometric context" confidence images to improve object detection (Hoiem et al., ICCV 2005). Features: the average confidence within each region (the object window, the region below it, and the region above it), as sketched below.
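
A minimal sketch of such a feature, assuming conf is a per-pixel confidence map in [0, 1]:

```python
import numpy as np

def avg_confidence(conf, y, x, h, w):
    """Mean geometric-context confidence inside the h x w region at (y, x);
    the same call with shifted y covers the regions above and below."""
    return float(conf[y:y + h, x:x + w].mean())
```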

27 Scene Understanding. [Figure: binary output matrix indicating which objects and relations are present] Recognition using Visual Phrases, CVPR 2011

28 Feature Design: spatial relations such as above, beside, and below. Recognition using Visual Phrases, CVPR 2011

29 Feature Passing: pros and cons
– Simple training and inference
– Very flexible in modeling interactions
– Not modular: if we get a new method for the first estimates, we may need to retrain

30 Integration
– Feature passing
– Margin based: max-margin structure learning
– Probabilistic: graphical models

31 Structured Prediction: prediction of complex outputs. Structured outputs are multivariate, correlated, and constrained. A novel, general way to solve many learning problems.

32 Structure. [Figure: binary output matrix from Recognition using Visual Phrases, CVPR 2011]

33 Handwriting Recognition: x = a sequence of character images, y = the string "brace". Sequential structure.

34 Object Segmentation: x = an image, y = a per-pixel labeling. Spatial structure.

35 Scene Parsing: recursive structure.

36 Bipartite Matching: x = an English sentence and its French translation, y = a word alignment between them. Combinatorial structure.
English: "What is the anticipated cost of collecting fees under the new proposal?"
French: "En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?"

37 Local Prediction: classify each position using local information only. This ignores correlations and constraints! For example, "brace" is misread as "breac".

38 Local Prediction. [Figure: noisy per-pixel labeling with building, tree, shrub, and ground labels]

39 Structured Prediction: use local information and exploit correlations, recovering the correct word "brace".

40 Structured Prediction. [Figure: coherent labeling with building, tree, shrub, and ground labels]

41 Structured Models. Mild assumptions: the scoring function is a linear combination, $s(x,y) = \mathbf{w}^\top \mathbf{f}(x,y)$, which decomposes as a sum of part scores, $\mathbf{w}^\top \mathbf{f}(x,y) = \sum_p \mathbf{w}^\top \mathbf{f}_p(x, y_p)$; prediction searches the space of feasible outputs, $y^* = \arg\max_{y \in \mathcal{Y}(x)} \mathbf{w}^\top \mathbf{f}(x,y)$.
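
For sequential structure, this decomposition makes exact argmax inference a dynamic program. A minimal sketch, assuming per-position emission scores and pairwise transition scores (both linear in w):

```python
import numpy as np

def viterbi(emission, transition):
    """emission: (T, K) per-position scores; transition: (K, K) pair scores.
    Returns the highest-scoring label sequence y* = argmax_y score(y)."""
    T, K = emission.shape
    dp = emission[0].copy()                 # best score ending in each label
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = dp[:, None] + transition + emission[t][None, :]   # (K, K)
        back[t] = cand.argmax(0)            # best predecessor per label
        dp = cand.max(0)
    y = [int(dp.argmax())]                  # backtrack from the best end label
    for t in range(T - 1, 0, -1):
        y.append(int(back[t][y[-1]]))
    return y[::-1]
```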

42 Supervised Structured Prediction. Learning: from data, estimate w for the model. Prediction: solve the argmax (for example, weighted matching; generally, combinatorial optimization). Estimation approaches: likelihood (can be intractable), margin, or local (ignores structure).

43 Local Estimation: treat edges as independent decisions; estimate w locally, use it globally
– E.g., naive Bayes, SVM, logistic regression; cf. [Matusov et al., 03] for matchings
– Simple and cheap
– Not well-calibrated for the matching model
– Ignores correlations and constraints

44 Conditional Likelihood Estimation: estimate w jointly by maximizing $P_w(y \mid x) = \frac{\exp(\mathbf{w}^\top \mathbf{f}(x,y))}{\sum_{y'} \exp(\mathbf{w}^\top \mathbf{f}(x,y'))}$. The denominator (partition function) is #P-complete to compute [Valiant 79; Jerrum & Sinclair 93]: a tractable model, but intractable learning. We need a tractable learning method, which motivates margin-based estimation.
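
A toy illustration of the intractability, assuming a black-box score_fn(y) = w·f(x, y): brute-force normalization enumerates every output.

```python
# For length-T sequences with K labels the normalizer has K**T terms
# (and for matchings counting is #P-complete, per the slide).
import itertools
import numpy as np

def log_partition_naive(score_fn, T, K):
    """Brute-force log( sum_y exp(score(y)) ) over all K**T sequences."""
    scores = [score_fn(y) for y in itertools.product(range(K), repeat=T)]
    return float(np.logaddexp.reduce(scores))
```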

45 Structured large margin estimation. We want the true output to score higher than every other output: score("brace") > score("aaaaa"), score("brace") > score("aaaab"), ..., score("brace") > score("zzzzz"). Equivalently, $\mathbf{w}^\top \mathbf{f}(x, y^*) > \mathbf{w}^\top \mathbf{f}(x, y)$ for all $y \neq y^*$, and not just higher but higher by a lot (a margin).

46 Structured Loss: Hamming distance to the true output "brace"
b c a r e -> 2
b r o r e -> 2
b r o c e -> 1
b r a c e -> 0
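
A minimal sketch of this Hamming loss, checked against the examples above:

```python
def hamming(y_true, y_pred):
    """Count positions where the prediction disagrees with the truth."""
    return sum(a != b for a, b in zip(y_true, y_pred))

assert hamming("brace", "bcare") == 2
assert hamming("brace", "broce") == 1
assert hamming("brace", "brace") == 0
```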

47 Large margin estimation. Given training examples $(x^i, y^i)$, we want to maximize the margin $\gamma$ subject to $\mathbf{w}^\top \mathbf{f}(x^i, y^i) \geq \mathbf{w}^\top \mathbf{f}(x^i, y) + \gamma$ for all $y$. Mistake-weighted margin: scale the required margin by $\ell(y^i, y)$, the number of mistakes in $y$: $\mathbf{w}^\top \mathbf{f}(x^i, y^i) \geq \mathbf{w}^\top \mathbf{f}(x^i, y) + \gamma\,\ell(y^i, y)$. *Collins 02; Altun et al. 03; Taskar 03

48 Large margin estimation. Eliminate $\gamma$ by fixing the scale of $\mathbf{w}$, and add slacks $\xi_i$ for the inseparable case (hinge loss): $\min \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \xi_i$ subject to $\mathbf{w}^\top \mathbf{f}(x^i, y^i) \geq \mathbf{w}^\top \mathbf{f}(x^i, y) + \ell(y^i, y) - \xi_i$ for all $i, y$.
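
A minimal sketch of the resulting per-example slack (the structured hinge loss), assuming helper functions feat_fn, loss_fn, and a loss-augmented argmax_fn; all names are illustrative:

```python
# argmax_fn(w, x, y_true) must solve max_y [w.f(x,y) + loss(y_true, y)],
# e.g. Viterbi with the per-position loss added to the emission scores.
import numpy as np

def structured_hinge(w, feat_fn, loss_fn, argmax_fn, x, y_true):
    y_hat = argmax_fn(w, x, y_true)              # loss-augmented inference
    violation = (w @ feat_fn(x, y_hat) + loss_fn(y_true, y_hat)
                 - w @ feat_fn(x, y_true))
    return max(0.0, violation)                   # slack xi for this example
```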

49 Large margin estimation: how to handle the exponentially many constraints?
– Brute-force enumeration
– Min-max formulation: "plug in" a linear program for inference

50 Min-max formulation. Replace the exponentially many margin constraints with a single constraint per example: $\mathbf{w}^\top \mathbf{f}(x, y^*) \geq \max_y [\mathbf{w}^\top \mathbf{f}(x, y) + \ell(y^*, y)]$. With a structured (Hamming) loss that decomposes like the score, this inner maximization is loss-augmented inference and can be written as an LP. Key step: inference turns from a discrete optimization into a continuous one.

51 Matching Inference LP. For word alignment, introduce a variable $z_{jk}$ for each candidate edge (English word j, French word k), with degree constraints limiting how many edges each word can join; the score and a Hamming-like loss both decompose over the $z_{jk}$. (Example pair: "What is the anticipated cost of collecting fees under the new proposal?" / "En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?")
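
As an illustration (not the deck's exact formulation), bipartite matching with degree-one constraints can be solved exactly: the LP relaxation of the matching polytope is integral, so scipy's assignment solver finds the same optimum the LP would.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

score = np.array([[3.0, 1.0, 0.5],     # score[j, k] = w.f for aligning
                  [0.2, 2.0, 0.3],     # word j to word k (toy numbers)
                  [0.1, 0.4, 4.0]])
rows, cols = linear_sum_assignment(score, maximize=True)
print(list(zip(rows, cols)))           # best matching: (0,0), (1,1), (2,2)
```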

52 LP Duality. Linear programming duality:
– Variables in the primal correspond to constraints in the dual
– Constraints in the primal correspond to variables in the dual
– The optimal values are the same (when both feasible regions are bounded)
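
A small numeric check of this duality with scipy (problem data chosen arbitrarily): the primal min cᵀx s.t. Ax ≥ b, x ≥ 0 and its dual max bᵀy s.t. Aᵀy ≤ c, y ≥ 0 attain the same optimal value.

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0], [1.0, 2.0]])
b = np.array([4.0, 6.0])

primal = linprog(c, A_ub=-A, b_ub=-b)     # min c.x  s.t. A x >= b, x >= 0
dual = linprog(-b, A_ub=A.T, b_ub=c)      # max b.y  s.t. A'y <= c, y >= 0
assert np.isclose(primal.fun, -dual.fun)  # both optima equal 10.0 here
```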

53 Min-max formulation via LP duality: dualizing the inner inference LP turns the min-max problem into a single joint convex minimization over w and the dual variables.

54 Min-max formulation summary *Taskar et al 04

