“Joint Optimization of Cascaded Classifiers for Computer Aided Detection” by M. Dundar and J. Bi. Presentation by Andrey Kolobov and Brandon Lucia.



Talk Outline
– Paper summary
– Problem statement
– Overview of cascade classifiers and current approaches for training them
– High-level idea of the AND-OR framework
– Gory math behind the AND-OR framework
– Experimental results
– Discussion

Paper summary
– Proposes a procedure for offline joint learning of cascade classifiers
– The resulting classifier is tested on polyp detection from computed tomography (CT) images
– The resulting cascade classifier is more accurate than cascade AdaBoost, on par with SVM, and faster than either

Problem statement
Polyp detection in a CT image
Methodology:
– Identify candidate structures (subwindows)
– Compute features of candidate structures
– Classify candidates

Cascade classifiers

A digression to previous work...
Why use cascades in the first place? We motivate their use with Paul Viola and Michael Jones' 2004 work on detecting faces. (Also, this is more vision-related.)

Viola & Jones Face Detection Work
Used cascaded classifiers to detect faces
To show why cascades are useful, they compared one big (200-feature) classifier against a cascade of smaller feature classifiers
– Training data: 5000 faces plus non-faces
– Stage n is trained on the faces plus the false positives (FP) of stage n-1
– The monolithic classifier is trained on the union of all sets used to train the cascade stages

Viola & Jones Face Detection Work
The monolithic classifier and the cascade are similar in accuracy
The cascade is ~10 times faster
– It eliminates false positives early, so later stages never evaluate them

Viola & Jones Face Detection Work
In a small experiment, even the big classifier works
– The Viola & Jones paper claims that the big classifier only works with ~10,000 (“maybe ~100,000”) negative examples
– The cascaded version sifts through hundreds of millions of negatives, since many are pared off at each stage

Viola & Jones Face Detection Work
In a bigger experiment, they show that their 38-classifier cascade is 600 times faster than previous work (Schneiderman & Kanade, 2000)
– Take that, previous work.
The point: cascades eliminate candidates early on, so later stages with higher complexity have to evaluate fewer candidates.

Cascade classifiers
Advantages over monolithic classifiers: faster learning and shorter computation time
Key insight: a small number of features can reject a large number of false candidates
To train stage $C_i$, use the set $T_i$ of examples that passed the previous stages
A low false negative rate is critical at all stages
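To make the early-rejection idea concrete, here is a minimal Python sketch of cascade evaluation. The three stage functions and the candidate values are made up for illustration and are not taken from the paper or from Viola & Jones; the sketch only counts how many stages each candidate touches.

```python
from typing import Callable, List, Sequence, Tuple

# A stage returns True if the candidate passes on to the next stage.
Stage = Callable[[Sequence[float]], bool]


def run_cascade(stages: List[Stage], candidate: Sequence[float]) -> Tuple[bool, int]:
    """Return (accepted, number_of_stages_evaluated) for one candidate."""
    for n, stage in enumerate(stages, start=1):
        if not stage(candidate):
            return False, n  # rejected early; later stages never run
    return True, len(stages)


# Hypothetical stages: the cheapest test first, costlier tests later.
stages: List[Stage] = [
    lambda x: x[0] > 0.0,
    lambda x: x[0] + x[1] > 1.0,
    lambda x: x[0] * x[1] > 0.5,
]

candidates = [(-1.0, 3.0), (0.5, 0.2), (1.0, 0.4), (1.0, 1.0)]
results = [run_cascade(stages, c) for c in candidates]
total_evaluations = sum(n for _, n in results)

print(results)            # one (accepted, stages_evaluated) pair per candidate
print(total_evaluations)  # 9 stage evaluations instead of 4 * 3 = 12
```

In this toy run only one candidate reaches the end with an accept, and the cascade spends 9 stage evaluations where running every stage on every candidate would spend 12.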

Example: cascade training with AdaBoost
Given: a set $H = \{h_1, \dots, h_M\}$ of single-feature classifiers
Goal: construct a cascade $C_1, \dots, C_N$ such that each $C_i$ is a weighted sum of an $m_i$-subset of $H$, with $m_i \ll M$
– Train $C_i$ with AdaBoost to select an $m_i$-subset, using the examples that passed through the previous cascade stages
– The optimal $m_i$ and $N$ are very hard to find, so they are set empirically
– Each stage is trained to achieve some target false positive rate $FP_i$ and/or false negative rate $FN_i$

Drawbacks of modern cascade training
– Greedy: the classifier at stage i is optimal for that stage, but not globally
– Drawback of AdaBoost cascade training: the computational complexity of features is ignored, leading to inefficiency in early stages

Proposition: AND-OR training
– Each stage is trained for optimal system performance, not stage performance
– All examples are used for training every stage
– The stage classifier complexity increases down the cascade
– Parameters of different stage classifiers may be adjusted throughout the training process, depending on how the other classifiers are doing
Motivation: a negative example is classified correctly iff it is rejected by at least one stage (OR); a positive one must pass all stages (AND)
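The AND/OR motivation can be written as a small predicate. This is a sketch only: the function name and the list-of-booleans representation of per-stage decisions are assumptions made for illustration.

```python
from typing import List


def classified_correctly(stage_passes: List[bool], is_positive: bool) -> bool:
    """stage_passes[k] is True if stage k lets the example through."""
    if is_positive:
        # AND: a positive example must pass every stage.
        return all(stage_passes)
    # OR: a negative example only needs one stage to reject it.
    return any(not passed for passed in stage_passes)


print(classified_correctly([True, False, True], is_positive=False))  # True
print(classified_correctly([True, True, True], is_positive=False))   # False
print(classified_correctly([True, True, True], is_positive=True))    # True
print(classified_correctly([True, False, True], is_positive=True))   # False
```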

Proposition: AND-OR training

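Review of Hyperplane Classifiers w/ Hinge Loss
Hinge loss for example $i$: $(1 - \alpha^T y_i x_i)_+ = \max(0,\, 1 - \alpha^T y_i x_i)$
– If the hinge loss is 0, the example is classified correctly (with margin at least 1).
– If the hinge loss is > 0, the example is misclassified or falls inside the margin.
Goal: minimize the hinge loss.
– $J(\alpha) = \Phi(\alpha) + \sum_{i=1}^{L} w_i (1 - \alpha^T y_i x_i)_+$ *
– Equivalently: $\min_{\alpha, E}\; \Phi(\alpha) + \sum_{i=1}^{L} w_i E_i$ subject to $E_i \ge 1 - \alpha^T y_i x_i$, $E_i \ge 0$
* There are $L$ pairs $\{x_i, y_i\}$ of training data.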
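As a worked example of the objective J(α), the sketch below evaluates the weighted hinge loss on three hand-picked points. The regularizer Φ(α) is assumed here to be reg * ||α||^2, which the slides do not specify, and the data values are illustrative only.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def hinge(alpha: Vector, x: Vector, y: int) -> float:
    """(1 - alpha^T y x)_+ for one labelled example, y in {+1, -1}."""
    return max(0.0, 1.0 - y * dot(alpha, x))


def objective(alpha: Vector, xs: List[Vector], ys: List[int],
              ws: List[float], reg: float = 0.0) -> float:
    """J(alpha) = Phi(alpha) + sum_i w_i * hinge_i, with Phi assumed to be reg * ||alpha||^2."""
    penalty = reg * dot(alpha, alpha)
    return penalty + sum(w * hinge(alpha, x, y) for x, y, w in zip(xs, ys, ws))


alpha = [1.0, -1.0]
xs = [[2.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
ys = [1, 1, -1]
ws = [1.0, 1.0, 1.0]

losses = [hinge(alpha, x, y) for x, y in zip(xs, ys)]
value = objective(alpha, xs, ys, ws, reg=0.5)

print(losses)  # [0.0, 0.5, 2.0]: outside the margin, inside the margin, wrong side
print(value)   # 3.5 = 0.5 * ||alpha||^2 + (0.0 + 0.5 + 2.0)
```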

Sequential Cascaded Classifiers
So all of that pertains to one classifier on its own.
Cascaded version, with $K$ classifiers (each $x_i$ is split into $K$ non-overlapping subvectors $x_{i1}, \dots, x_{iK}$)
– Subvectors are ordered by computational complexity
For $\alpha_k$: $J(\alpha_k) = \Phi(\alpha_k) + \sum_{i \in T_{k-1}} w_i (1 - \alpha_k^T y_i x_{ik})_+$
– $T_{k-1}$ is the set of “yes”es from the previous classifiers
– Cut as many “no”s as possible, and leave the rest for the remaining $\alpha$'s
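A rough sketch of the sequential (greedy) scheme follows: stage k is trained only on the examples that survived stages 1..k-1. Several things here are assumptions rather than the paper's method: plain subgradient descent stands in for an LP/QP solver, Φ is taken to be reg * ||α||^2, all weights w_i are equal, an example "passes" a stage when its score is positive, and the tiny dataset is invented.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def train_stage(xs: List[Vector], ys: List[int],
                steps: int = 200, lr: float = 0.1, reg: float = 0.01) -> Vector:
    """Subgradient descent on reg * ||alpha||^2 + mean hinge loss (solver stand-in)."""
    alpha = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = [2.0 * reg * a for a in alpha]
        for x, y in zip(xs, ys):
            if 1.0 - y * dot(alpha, x) > 0.0:  # example violates the margin
                for j, xj in enumerate(x):
                    grad[j] -= y * xj / len(xs)
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return alpha


def train_sequential_cascade(blocks: List[List[Vector]],
                             ys: List[int]) -> Tuple[List[Vector], List[int]]:
    """blocks[i][k] is subvector k of example i; returns (alphas, final survivors)."""
    survivors = list(range(len(ys)))  # T_0: every example
    alphas: List[Vector] = []
    for k in range(len(blocks[0])):
        if not survivors:
            break
        xs_k = [blocks[i][k] for i in survivors]
        ys_k = [ys[i] for i in survivors]
        alpha_k = train_stage(xs_k, ys_k)
        alphas.append(alpha_k)
        # T_k: examples that this stage says "yes" to (hypothetical pass rule: score > 0).
        survivors = [i for i in survivors if dot(alpha_k, blocks[i][k]) > 0.0]
    return alphas, survivors


# Invented data: two positives, one easy negative, one negative only stage 2 can reject.
blocks = [
    [[2.0], [1.0]],
    [[1.0], [2.0]],
    [[-2.0], [1.0]],
    [[1.0], [-2.0]],
]
ys = [1, 1, -1, -1]

alphas, survivors = train_sequential_cascade(blocks, ys)
print(survivors)  # [0, 1]: only the two positives pass both stages
```

Note that stage 2 never sees example 2, which stage 1 already rejected; that is the greedy behavior the slides contrast with joint training.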


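AND-OR Cascaded Classifiers
Everything at once, with the same starting point for each stage. This is the training cost function:
$J(\alpha_1, \dots, \alpha_K) = \sum_{k=1}^{K} \Phi_k(\alpha_k) + \sum_{i \in F} \prod_{k=1}^{K} (1 - \alpha_k^T y_i x_{ik})_+ + \sum_{i \in T} \max\big(0,\, 1 - \alpha_1^T y_i x_{i1},\, \dots,\, 1 - \alpha_K^T y_i x_{iK}\big)$
– Regularization terms: $\sum_{k} \Phi_k(\alpha_k)$
– “OR”: the sum over $F$ (negative examples); the product vanishes as soon as one stage rejects the example
– “AND”: the sum over $T$ (positive examples); the max vanishes only if every stage passes the example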
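The cost function can be evaluated directly on a toy example. This sketch assumes that the negative-example ("OR") term is a product of the per-stage hinge losses and the positive-example ("AND") term is a max over them, and that Φ_k is reg * ||α_k||^2; the weights and data are invented for illustration.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def and_or_cost(alphas: List[Vector], blocks: List[List[Vector]],
                ys: List[int], reg: float = 0.0) -> float:
    """blocks[i][k] is subvector k of example i; ys[i] is +1 (in T) or -1 (in F)."""
    cost = reg * sum(dot(a, a) for a in alphas)  # assumed regularizer
    for x_blocks, y in zip(blocks, ys):
        terms = [1.0 - y * dot(a, xk) for a, xk in zip(alphas, x_blocks)]
        if y > 0:
            # AND: zero only if every stage passes the positive with margin.
            cost += max(0.0, *terms)
        else:
            # OR: zero as soon as one stage rejects the negative with margin.
            product = 1.0
            for t in terms:
                product *= max(0.0, t)
            cost += product
    return cost


alphas = [[1.0], [1.0]]
blocks = [
    [[2.0], [3.0]],   # positive, passes both stages      -> contributes 0.0
    [[2.0], [0.5]],   # positive, weak at stage 2         -> contributes 0.5
    [[-1.0], [4.0]],  # negative, rejected by stage 1     -> contributes 0.0
    [[1.0], [2.0]],   # negative, passes both stages      -> contributes 2 * 3 = 6.0
]
ys = [1, 1, -1, -1]

total = and_or_cost(alphas, blocks, ys)
print(total)  # 6.5
```

The third example shows the OR behavior: stage 2 scores that negative very badly, but the product is still zero because stage 1 already rejects it.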

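Optimizing Cascaded Classifiers
New minimization (similar to the first one)
For each $k = 1, \dots, K$, fix all $\alpha_j$ with $j \ne k$ and solve:
Minimize $\Phi_k(\alpha_k) + \sum_{i \in F} w_i E_i + \sum_{i \in T} E_i$
subject to
– $E_i \ge 1 - \alpha_k^T y_i x_{ik}$ and $E_i \ge 0$
– for $i \in T$: $E_i \ge \max\big(0,\, 1 - \alpha_1^T y_i x_{i1}, \dots, 1 - \alpha_{k-1}^T y_i x_{i,k-1},\, 1 - \alpha_{k+1}^T y_i x_{i,k+1}, \dots, 1 - \alpha_K^T y_i x_{iK}\big)$
Here $w_i$ collects the fixed hinge terms of the other stages for negative example $i$.
Everything except $\alpha_k$ is fixed, so this is easy (linear programming / quadratic programming solver)
This subproblem is convex.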

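Cyclic Optimization Algorithm
0. Initialize every $\alpha_k$ to $\alpha_k^0$ using the initial training dataset.
1. For each $k$: fix all $\alpha_i$, $i \ne k$, and minimize the subproblem on the previous slide to obtain $\alpha_k^c$.
2. Compute $J^c = J(\alpha_1, \dots, \alpha_k, \dots, \alpha_K)$ using $\alpha_k^c$ instead of $\alpha_k^{c-1}$.
3. If $J^c$ improves on $J^{c-1}$ by less than a threshold, or we've run long enough, stop; otherwise, go back to step 1.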
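The cyclic loop can be sketched as block coordinate descent. This is not the authors' solver: the convex subproblem is approximated with a few subgradient steps instead of an LP/QP solve, an update is kept only if it lowers J, all α_k start from zero rather than from an initial training run, Φ_k is assumed to be reg * ||α_k||^2, and the negative-example term is assumed to be a product of stage hinge losses. The dataset is invented.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def cost(alphas: List[Vector], blocks: List[List[Vector]], ys: List[int], reg: float) -> float:
    """AND-OR cost: max over stages for positives, product over stages for negatives."""
    total = reg * sum(dot(a, a) for a in alphas)
    for x_blocks, y in zip(blocks, ys):
        e = [1.0 - y * dot(a, xb) for a, xb in zip(alphas, x_blocks)]
        if y > 0:
            total += max(0.0, *e)
        else:
            product = 1.0
            for ek in e:
                product *= max(0.0, ek)
            total += product
    return total


def stage_subgradient(k: int, alphas: List[Vector], blocks: List[List[Vector]],
                      ys: List[int], reg: float) -> Vector:
    """A subgradient of the cost with respect to alpha_k, all other stages fixed."""
    grad = [2.0 * reg * a for a in alphas[k]]
    for x_blocks, y in zip(blocks, ys):
        e = [1.0 - y * dot(a, xb) for a, xb in zip(alphas, x_blocks)]
        if e[k] <= 0.0:
            continue  # stage k's hinge term is inactive for this example
        if y > 0:
            weight = 1.0 if e[k] >= max(e) else 0.0  # only the max term matters
        else:
            weight = 1.0  # w_i: product of the other stages' hinge terms
            for j, ej in enumerate(e):
                if j != k:
                    weight *= max(0.0, ej)
        for d, xd in enumerate(x_blocks[k]):
            grad[d] -= weight * y * xd
    return grad


def cyclic_optimize(alphas: List[Vector], blocks: List[List[Vector]], ys: List[int],
                    reg: float = 0.01, tol: float = 1e-6, max_cycles: int = 50,
                    inner_steps: int = 25, lr: float = 0.05) -> Tuple[List[Vector], List[float]]:
    alphas = [list(a) for a in alphas]
    history = [cost(alphas, blocks, ys, reg)]  # J^0
    for _ in range(max_cycles):
        for k in range(len(alphas)):  # step 1: one stage at a time
            best, best_cost = alphas[k], cost(alphas, blocks, ys, reg)
            for _ in range(inner_steps):
                g = stage_subgradient(k, alphas, blocks, ys, reg)
                alphas[k] = [a - lr * gi for a, gi in zip(alphas[k], g)]
                c = cost(alphas, blocks, ys, reg)
                if c < best_cost:
                    best, best_cost = alphas[k], c
            alphas[k] = best  # never accept a worse alpha_k
        history.append(cost(alphas, blocks, ys, reg))  # step 2: J^c
        if history[-2] - history[-1] < tol:  # step 3: improvement too small
            break
    return alphas, history


blocks = [[[2.0], [1.0]], [[1.0], [2.0]], [[-2.0], [1.0]], [[1.0], [-2.0]]]
ys = [1, 1, -1, -1]
final_alphas, history = cyclic_optimize([[0.0], [0.0]], blocks, ys)
print(history[0], history[-1])  # the cost starts at 4.0 and never increases
```

Because each stage update is accepted only when it lowers J, the recorded costs form a non-increasing sequence bounded below by zero, which mirrors the argument used for convergence.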

Convergence Analysis
They show that the algorithm is globally convergent to the set of sub-optimal solutions
– Using magic, and a theorem by two people named Fiorot and Huard.
Idea behind the proof: since we fix all variables but one and minimize, the sequence of cost values is decreasing with a lower bound of zero, so we converge to a local minimum or to 0.
We threshold and limit iterations, so we could hit a flat spot or just fall short.

Evaluation
Application: polyp detection in computed tomography images
– Important because polyps are an early stage of cancer
Training set: 338 volumes (images), 88 polyps, false positives per volume
Test set: 396 volumes, 106 polyps, false positives per volume
Goal: reduce false positives per volume (FP/vol) to 0-5

Evaluation (cont’d)
46 features per candidate object
Comparison of an AdaBoost-trained cascade, a single-stage SVM, and an AND-OR trained cascade
– Features were split into 3 sets, in increasing order of complexity
– A 3-stage AND-OR classifier was built
– The AdaBoost classifier was built in 3 phases, where phase n used feature sets 1 through n for training

Evaluation (cont’d)
Sensitivity thresholds for AdaBoost were 0 missed polyps for phase 1, 2 for phase 2, and 1 for phase 3
The number of stages in each phase was picked to satisfy these thresholds.

Results

Results (cont’d)
– Cascade AdaBoost CPU secs
– Cascade AND-OR – 81 CPU secs
– SVM – 286 CPU secs
Cascade AND-OR is as accurate as SVM and 30% better than cascade AdaBoost
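For a sense of scale, the two CPU times that survive in the transcript can be turned into a speedup figure. This is simple arithmetic on the reported 81 and 286 CPU seconds; the variable names are made up.

```python
and_or_secs = 81   # Cascade AND-OR, as reported on the slide
svm_secs = 286     # single-stage SVM, as reported on the slide

speedup = svm_secs / and_or_secs           # how many times faster AND-OR is
time_saved = 1.0 - and_or_secs / svm_secs  # fraction of SVM run time avoided

print(round(speedup, 2))     # 3.53
print(round(time_saved, 3))  # 0.717
```

So on these numbers the AND-OR cascade runs roughly 3.5 times faster than the SVM, saving about 72% of its CPU time.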

Discussion
– Other methods of solving the optimization
– How to assign cascade parameters
– How are feature sub-vectors sorted: heuristic? What is “ordered by computational complexity”?
– The speed increase was the more prominent result, but it is hardly mentioned next to the discussion of false positives.
– Generally better to use parallel cascades?