
1 “Joint Optimization of Cascaded Classifiers for Computer Aided Detection” by M. Dundar and J. Bi. Presented by Andrey Kolobov and Brandon Lucia.

2 Talk Outline: paper summary; problem statement; overview of cascade classifiers and current approaches for training them; high-level idea of the AND-OR framework; the gory math behind the AND-OR framework; experimental results; discussion.

3 Paper summary. Proposes a procedure for offline joint learning of cascaded classifiers. The resulting classifier is tested on polyp detection in computed tomography images. The resulting cascade classifier is more accurate than a cascade trained with AdaBoost, on par with an SVM, and faster than either.

4 Problem statement: polyp detection in a CT image. Methodology: identify candidate structures (subwindows), compute features of the candidate structures, and classify the candidates.

5 Cascade classifiers

6 A digression to previous work... Why use cascades in the first place? We motivate their use with Paul Viola and Michael Jones' 2004 work on detecting faces. (Also, this is more vision-related.)

7 Viola & Jones Face Detection Work. Used cascaded classifiers to detect faces. To show why cascades are useful, they evaluated one big (200-feature) classifier against a cascade of ten 20-feature classifiers, trained on 5,000 faces and 10,000 non-faces. Stage n was trained on the faces plus the false positives of stage n-1; the monolithic classifier was trained on the union of all the sets used to train the cascade stages.

8 Viola & Jones Face Detection Work. The monolithic classifier and the cascade are similar in accuracy, but the cascade is roughly 10 times faster: false positives are eliminated early, so later stages never have to consider them.

9 Viola & Jones Face Detection Work. In a small experiment, even the big classifier works. This paper claims that the big classifier only works with ~10,000 (“maybe ~100,000”) negative examples, whereas the cascaded version sifts through hundreds of millions of negatives, since many are pared off at each stage.

10 Viola & Jones Face Detection Work. In a bigger experiment, they show that their 38-stage cascade approach is 600 times faster than previous work (Schneiderman & Kanade, 2000). Take that, previous work. The point: cascades eliminate candidates early on, so later stages with higher complexity have to evaluate fewer candidates.

11 Cascade classifiers. Advantages over monolithic classifiers: faster learning and lower computation time. Key insight: a small number of features can reject a large number of false candidates. To train stage C_i, use the set T_i of examples that passed the previous stages. A low false negative rate is critical at all stages.
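To make the control flow concrete, here is a minimal, hypothetical sketch (not from the paper) of how a trained cascade classifies a single candidate; the stage models and feature extractors are assumed.

```python
# Hypothetical cascade prediction: each stage may reject a candidate early;
# only candidates that pass every stage are labeled positive.
def cascade_predict(candidate, stages):
    """stages: list of (extract_features, classify) pairs, ordered by feature cost."""
    for extract_features, classify in stages:
        score = classify(extract_features(candidate))
        if score < 0:      # rejected by this stage: stop immediately, label negative
            return -1
    return +1              # survived all stages: label positive
```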

12 Example: cascade training with AdaBoost. Given: a set H = {h_1, ..., h_M} of single-feature classifiers. Goal: construct a cascade C_1, ..., C_N such that C_i is a weighted sum of an m_i-subset of H, with m_i << M. Train C_i with AdaBoost, which selects the m_i-subset using the examples that passed through the previous cascade stages. The optimal m_i and N are very hard to find, so they are set empirically. Each stage is trained to achieve a target false positive rate FP_i and/or false negative rate FN_i.
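A minimal sketch of this greedy stage-wise loop, assuming scikit-learn's AdaBoostClassifier (whose default base learner is a depth-1 decision stump, i.e., a single-feature classifier); the stage sizes and the rule for passing examples forward are assumptions, not the paper's exact recipe.

```python
# Hypothetical greedy cascade training with AdaBoost: each stage is trained only
# on the examples that the previous stages accepted.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def train_adaboost_cascade(X, y, stage_sizes):
    """X: (L, d) candidate features, y in {0, 1}; stage_sizes: [m_1, ..., m_N]."""
    stages = []
    keep = np.ones(len(y), dtype=bool)              # examples still in play
    for m_i in stage_sizes:
        clf = AdaBoostClassifier(n_estimators=m_i)  # weighted sum of m_i stumps
        clf.fit(X[keep], y[keep])
        stages.append(clf)
        keep &= (clf.predict(X) == 1)               # only accepted examples reach the next stage
    return stages
```

In practice each stage's decision threshold would also be tuned to hit the target FP_i / FN_i rates before filtering the training set.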

13 Drawbacks of modern cascade training. It is greedy: the classifier at stage i is optimal for that stage, but not globally. A further drawback of AdaBoost cascade training: the computational complexity of features is ignored, leading to inefficiency in the early stages.

14 Proposition: AND-OR training. Each stage is trained for optimal system performance, not stage performance. All examples are used for training every stage. The stage classifier complexity increases down the cascade. Parameters of the different stage classifiers may be adjusted throughout the training process, depending on how the other classifiers are doing. Motivation: a negative example is classified correctly iff it is rejected by at least one stage (OR); a positive one must pass all stages (AND).
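A tiny illustration (not from the paper) of the AND/OR logic on per-stage decisions; stage_accepts is assumed to hold one boolean per stage.

```python
# AND/OR correctness of the cascade decision for a single candidate.
def cascade_correct(stage_accepts, is_positive):
    """stage_accepts: list of booleans, True if that stage accepts the candidate."""
    if is_positive:
        return all(stage_accepts)        # AND: a positive must pass every stage
    return not all(stage_accepts)        # OR: a negative is correct if any stage rejects it
```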

15 Proposition: AND-OR training

16 Review of Hyperplane Classifiers w/ Hinge Loss. For a training example (x_i, y_i), the hinge loss under weight vector α is (1 − y_i α^T x_i)_+. If the hinge loss is 0, the example is classified correctly (with margin); if it is > 0, the example is misclassified or falls inside the margin. Goal: minimize the regularized hinge loss over the L training pairs {x_i, y_i}: J(α) = Φ(α) + Σ_{i=1..L} w_i (1 − y_i α^T x_i)_+. Equivalently, minimize J(α) = Φ(α) + Σ_{i=1..L} w_i E_i subject to E_i ≥ 1 − y_i α^T x_i and E_i ≥ 0.
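A minimal numpy sketch of this weighted hinge-loss objective; the quadratic regularizer Φ(α) = reg · ||α||² and the per-example weights w are assumptions.

```python
# Weighted hinge-loss objective J(alpha) for a single hyperplane classifier.
import numpy as np

def hinge_objective(alpha, X, y, w, reg=1e-3):
    """X: (L, d) features, y: (L,) labels in {-1, +1}, w: (L,) example weights."""
    margins = 1.0 - y * (X @ alpha)      # 1 - y_i * alpha^T x_i
    hinge = np.maximum(0.0, margins)     # (.)_+ : zero for examples classified with margin
    phi = reg * np.dot(alpha, alpha)     # assumed quadratic regularizer Phi(alpha)
    return phi + np.sum(w * hinge)
```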

17 Sequential Cascaded Classifiers. So all of that pertains to one classifier on its own. In the cascaded version with K classifiers, each x_i is split into K non-overlapping subvectors, ordered by computational complexity. Stage k is trained only on T_{k-1}, the set of "yes"es from the previous classifiers: J(α_k) = Φ(α_k) + Σ_{i ∈ T_{k-1}} w_i (1 − y_i α_k^T x_{ik})_+. Each stage cuts as many "no"s as possible and leaves the rest for the remaining α's.
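A numpy sketch of this per-stage (greedy) objective, assuming a boolean mask for the surviving set T_{k-1} and a quadratic regularizer.

```python
# Stage-k objective of the sequential cascade: only survivors of stages 1..k-1 count.
import numpy as np

def stage_objective(alpha_k, X_k, y, w, survivors, reg=1e-3):
    """X_k: (L, d_k) stage-k subvectors; survivors: boolean mask for T_{k-1}."""
    margins = 1.0 - y[survivors] * (X_k[survivors] @ alpha_k)
    phi = reg * np.dot(alpha_k, alpha_k)     # assumed regularizer Phi(alpha_k)
    return phi + np.sum(w[survivors] * np.maximum(0.0, margins))
```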

18 AND-OR Cascaded Classifiers. Everything is trained at once, with the same starting set of examples for each stage: J(α_1, ..., α_K) = Σ_{k=1..K} Φ_k(α_k) + Σ_{k=1..K} Σ_{i ∈ F} w_i (1 − y_i α_k^T x_{ik})_+ + Σ_{i ∈ T} max(0, 1 − y_i α_1^T x_{i1}, ..., 1 − y_i α_K^T x_{iK}).

19 AND-OR Cascaded Classifiers. J(α_1, ..., α_K) = Σ_{k=1..K} Φ_k(α_k) + Σ_{k=1..K} Σ_{i ∈ F} w_i (1 − y_i α_k^T x_{ik})_+ + Σ_{i ∈ T} max(0, 1 − y_i α_1^T x_{i1}, ..., 1 − y_i α_K^T x_{iK}). The Φ_k(α_k) are the regularization terms; the hinge sum over the false candidates F is the "OR" part; the max over stages for each true candidate in T is the "AND" part. This is the training cost function.
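A numpy sketch of this joint cost; the feature subvectors per stage, the weights w, and the quadratic regularizers are assumptions about details the slides do not spell out.

```python
# Joint AND-OR cost J(alpha_1, ..., alpha_K) over all examples at once.
import numpy as np

def and_or_objective(alphas, X_sub, y, w, is_true_candidate, reg=1e-3):
    """alphas: list of K stage weight vectors; X_sub: list of K arrays, X_sub[k] is (L, d_k);
    is_true_candidate: boolean mask selecting the set T (positives) versus F (negatives)."""
    K = len(alphas)
    # Per-stage hinge losses, shape (K, L).
    hinges = np.stack([np.maximum(0.0, 1.0 - y * (X_sub[k] @ alphas[k])) for k in range(K)])
    phi = sum(reg * np.dot(a, a) for a in alphas)                             # regularization
    or_term = np.sum(w[~is_true_candidate] * hinges[:, ~is_true_candidate])  # F: summed over stages
    and_term = np.sum(hinges[:, is_true_candidate].max(axis=0))              # T: max over stages
    return phi + or_term + and_term
```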

20 Optimizing Cascaded Classifiers. New minimization (similar to the first one): for each k in 1..K, fix all α_j with j ≠ k and minimize Φ_k(α_k) + Σ_{i ∈ F} w_i E_i + Σ_{i ∈ T} E_i subject to E_i ≥ 1 − y_i α_k^T x_{ik} and E_i ≥ 0, and, for i ∈ T, E_i ≥ max(0, 1 − y_i α_1^T x_{i1}, ..., 1 − y_i α_{k-1}^T x_{i,k-1}, 1 − y_i α_{k+1}^T x_{i,k+1}, ..., 1 − y_i α_K^T x_{iK}). Everything but α_k is fixed, so this is easy to solve with a linear or quadratic programming solver. This subproblem is convex.
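A hedged cvxpy sketch of one such convex subproblem; the quadratic regularizer (which makes it a QP) and the argument names are assumptions, not the paper's exact setup.

```python
# Solve the stage-k subproblem with all other stages' weights held fixed.
import numpy as np
import cvxpy as cp

def solve_stage_subproblem(X_k, y, w, T_idx, fixed_hinge_T, reg=1e-3):
    """X_k: (L, d_k) stage-k subvectors; T_idx: integer indices of true candidates;
    fixed_hinge_T: max(0, hinges of all *other* stages) for each i in T (constants)."""
    L, d_k = X_k.shape
    alpha_k = cp.Variable(d_k)
    E = cp.Variable(L)                                   # slack variables E_i
    hinge_k = 1.0 - cp.multiply(y, X_k @ alpha_k)
    constraints = [E >= 0, E >= hinge_k]                 # holds for every example
    constraints.append(E[T_idx] >= fixed_hinge_T)        # extra AND bound for i in T
    cost_w = np.where(np.isin(np.arange(L), T_idx), 1.0, w)   # weight w_i on F, 1 on T
    objective = reg * cp.sum_squares(alpha_k) + cp.sum(cp.multiply(cost_w, E))
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return alpha_k.value
```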

21 Cyclic Optimization Algorithm. 0. Initialize each α_k to α_k^0 using the initial training dataset. 1. For each k: fix all α_j, j ≠ k, and minimize the equation on the previous slide. 2. Compute J(α_1, ..., α_k, ..., α_K) using the new α_k^c instead of α_k^{c-1}. 3. If J^c has stopped improving by enough, or we have run long enough, stop; otherwise go back to step 1.
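A minimal driver for this cyclic (coordinate-descent) loop, reusing the and_or_objective and solve_stage_subproblem sketches above; the tolerance and iteration cap are assumed hyperparameters.

```python
# Cyclic optimization: repeatedly re-solve each stage with the others held fixed.
import numpy as np

def cyclic_optimize(alphas, X_sub, y, w, is_true_candidate, tol=1e-4, max_cycles=50):
    T_idx = np.where(is_true_candidate)[0]
    prev_J = and_or_objective(alphas, X_sub, y, w, is_true_candidate)
    for _ in range(max_cycles):
        for k in range(len(alphas)):
            # Hinge of every *other* stage at its current (fixed) weights, for i in T.
            others = [np.maximum(0.0, 1.0 - y[T_idx] * (X_sub[j][T_idx] @ alphas[j]))
                      for j in range(len(alphas)) if j != k]
            fixed_hinge_T = np.max(np.stack(others), axis=0)
            alphas[k] = solve_stage_subproblem(X_sub[k], y, w, T_idx, fixed_hinge_T)
        J = and_or_objective(alphas, X_sub, y, w, is_true_candidate)
        if prev_J - J < tol:        # improvement below threshold: stop
            break
        prev_J = J
    return alphas
```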

22 Convergence Analysis. They show that the procedure is globally convergent to the set of (possibly sub-optimal) solutions, using magic and a theorem by two people named Fiorot and Huard. Idea behind the proof: since we fix all variables but one and minimize, the sequence of objective values is decreasing and bounded below by zero, so we converge to a local minimum or to 0. We threshold and limit iterations, so we could hit a flat spot or just fall short.

23 Evaluation. Application: polyp detection in computed tomography images; important because polyps are an early stage of cancer. Training set: 338 volumes (images), 88 polyps, 137.3 false positives per volume. Test set: 396 volumes, 106 polyps, 139.4 false positives per volume. Goal: reduce false positives per volume (FP/vol) to 0-5.

24 Evaluation (cont'd). 46 features per candidate object. Comparison of an AdaBoost-trained cascade, a single-stage SVM, and an AND-OR-trained cascade. The features were split into 3 sets in increasing order of complexity, and a 3-stage AND-OR classifier was built. The AdaBoost classifier was built in 3 phases, where phase n used feature sets 1 through n for training.

25 Evaluation (cont'd). Sensitivity thresholds for AdaBoost were 0 missed polyps for phase 1, 2 for phase 2, and 1 for phase 3. The number of stages in each phase was picked to satisfy these thresholds.

26 Results

27 Results (cont'd). Runtime: cascade AdaBoost, 118 CPU seconds; cascade AND-OR, 81 CPU seconds; SVM, 286 CPU seconds. Cascade AND-OR is as accurate as the SVM and 30% better than cascade AdaBoost.

28 Discussion. Other methods of solving the optimization? How are the cascade parameters assigned: how are the feature sub-vectors sorted (by a heuristic?), and what exactly does "ordered by computational complexity" mean? The speed increase was the more prominent result, but it is hardly mentioned next to the discussion of false positives. Is it generally better to use parallel cascades?

