Published by Annice Henderson. Modified over 8 years ago.
1 Part II: Practical Implementations.
2 Modeling the Classes: Stochastic Discrimination
3 Algorithm for Training an SD Classifier
- Generate a projectable weak model
- Evaluate the model w.r.t. the training set; check enrichment
- Check uniformity w.r.t. the existing collection
- Add to the discriminant
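The loop above can be sketched end to end. Everything concrete here is an assumption for illustration, not from the slides: a toy 2D data set, axis-aligned rectangles as the projectable weak models, and a simple positive-enrichment test. The uniformity check against the existing collection is omitted in this sketch.

```python
import random

random.seed(0)

# Hypothetical 2D toy data: class 1 lies left of x = 5, class 2 to the right.
train = [((random.uniform(0, 5), random.uniform(0, 10)), 1) for _ in range(50)] \
      + [((random.uniform(5, 10), random.uniform(0, 10)), 2) for _ in range(50)]

def random_rectangle():
    """Projectable weak model: an axis-aligned rectangle, so nearby
    points receive similar coverage."""
    x0, y0 = random.uniform(0, 10), random.uniform(0, 10)
    return (x0, min(x0 + random.uniform(2, 8), 10),
            y0, min(y0 + random.uniform(2, 8), 10))

def covers(m, p):
    x0, x1, y0, y1 = m
    return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

def enrichment(m):
    """Difference in training-set coverage between the two classes."""
    f = {1: 0.0, 2: 0.0}
    for p, c in train:
        f[c] += covers(m, p)
    return f[1] / 50 - f[2] / 50

# Generate weak models; retain only those enriched for class 1.
models = []
while len(models) < 500:
    m = random_rectangle()
    if enrichment(m) > 0:
        models.append(m)

def Y(p):
    """Discriminant: fraction of retained models covering point p."""
    return sum(covers(m, p) for m in models) / len(models)
```

With enriched models, Y is systematically larger for points in the class-1 region than in the class-2 region, which is what the later slides exploit.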
4 Dealing with Data Geometry: SD in Practice
5 2D Example Adapted from [Kleinberg, PAMI, May 2000]
6 An “r=1/2” random subset in the feature space that covers ½ of all the points
7 Watch how many such subsets cover a particular point, say, (2,17)
8 The point is in 0/1 models: Y = 0/1 = 0.0
It’s in 1/2 models: Y = 1/2 = 0.5
It’s in 2/3 models: Y = 2/3 ≈ 0.67
It’s in 3/4 models: Y = 3/4 = 0.75
It’s in 4/5 models: Y = 4/5 = 0.8
It’s in 5/6 models: Y = 5/6 ≈ 0.83
9 It’s in 5/7 models: Y = 5/7 ≈ 0.71
It’s in 6/8 models: Y = 6/8 = 0.75
It’s in 7/9 models: Y = 7/9 ≈ 0.78
It’s in 8/10 models: Y = 8/10 = 0.8
It’s in 8/11 models: Y = 8/11 ≈ 0.73
It’s in 8/12 models: Y = 8/12 ≈ 0.67
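The running fraction on these slides can be simulated directly: each "r = 1/2" random subset covers any fixed point, such as (2,17), with probability 1/2, so by the law of large numbers the fraction Y settles near 0.5 as more subsets are drawn. The simulation below is a minimal sketch of that process.

```python
import random

random.seed(1)

covered = 0
history = []
for n in range(1, 5001):
    covered += random.random() < 0.5   # is the point in this subset?
    history.append(covered / n)        # running fraction Y after n subsets

# Early values of `history` swing widely (like 0.0, 0.5, 0.67, ... on the
# slides); the tail hovers close to 0.5.
```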
10 Fraction of “r=1/2” random subsets covering point (2,17) as more such subsets are generated
11 Fractions of “r=1/2” random subsets covering several selected points as more such subsets are generated
12 Distribution of model coverage for all points in space, with 100 models
13 Distribution of model coverage for all points in space, with 200 models
14 Distribution of model coverage for all points in space, with 300 models
15 Distribution of model coverage for all points in space, with 400 models
16 Distribution of model coverage for all points in space, with 500 models
17 Distribution of model coverage for all points in space, with 1000 models
18 Distribution of model coverage for all points in space, with 2000 models
19 Distribution of model coverage for all points in space, with 5000 models
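The narrowing histograms on slides 12–19 follow from binomial concentration: with N subsets of coverage probability 1/2, a point's coverage fraction has standard deviation 0.5/sqrt(N). A small sketch (point and model counts are illustrative choices) reproduces that shrinking spread.

```python
import random

random.seed(2)

def coverage_std(n_points, n_models):
    """Spread of coverage fractions across points, each covered by any
    given "r = 1/2" subset independently with probability 1/2."""
    fracs = [sum(random.random() < 0.5 for _ in range(n_models)) / n_models
             for _ in range(n_points)]
    mean = sum(fracs) / n_points
    return (sum((f - mean) ** 2 for f in fracs) / n_points) ** 0.5

std_100 = coverage_std(200, 100)     # roughly 0.5 / sqrt(100)  = 0.05
std_5000 = coverage_std(200, 5000)   # roughly 0.5 / sqrt(5000) ≈ 0.007
```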
20 Introducing enrichment: for any discrimination to happen, the models must have some difference in coverage for different classes.
21 Enforcing enrichment (adding in a bias): require each subset to cover more points of one class than of the other. [Figures: class distribution; a biased (enriched) weak model]
22 Distribution of model coverage for points in each class, with 100 enriched weak models
23 Distribution of model coverage for points in each class, with 200 enriched weak models
24 Distribution of model coverage for points in each class, with 300 enriched weak models
25 Distribution of model coverage for points in each class, with 400 enriched weak models
26 Distribution of model coverage for points in each class, with 500 enriched weak models
27 Distribution of model coverage for points in each class, with 1000 enriched weak models
28 Distribution of model coverage for points in each class, with 2000 enriched weak models
29 Distribution of model coverage for points in each class, with 5000 enriched weak models
30 Error rate decreases as the number of models increases. Decision rule: if Y < 0.5 then class 2, else class 1.
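The decreasing error rate can be sketched with an abstraction of enrichment. The specific coverage probabilities (0.55 for a class-1 point, 0.45 for a class-2 point) are assumed enrichment levels for illustration, not numbers from the slides; the decision rule is the one stated above.

```python
import random

random.seed(3)

def error_rate(n_models, trials=400):
    """Empirical error of the rule "class 2 if Y < 0.5 else class 1",
    where Y is the fraction of enriched models covering the point."""
    errors = 0
    for _ in range(trials):
        true_class = random.choice([1, 2])
        p = 0.55 if true_class == 1 else 0.45   # assumed enrichment
        y = sum(random.random() < p for _ in range(n_models)) / n_models
        pred = 2 if y < 0.5 else 1
        errors += pred != true_class
    return errors / trials

err_small = error_rate(10)     # few models: Y is noisy, many errors
err_large = error_rate(1000)   # many models: Y concentrates, few errors
```

The gap between the class means stays fixed while the noise in Y shrinks like 1/sqrt(N), so the error rate falls as models accumulate.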
31 Sparse Training Data: incomplete knowledge about class distributions. [Figures: Training Set | Test Set]
32 Distribution of model coverage for points in each class, with 100 enriched weak models (Training Set | Test Set)
33 Distribution of model coverage for points in each class, with 200 enriched weak models (Training Set | Test Set)
34 Distribution of model coverage for points in each class, with 300 enriched weak models (Training Set | Test Set)
35 Distribution of model coverage for points in each class, with 400 enriched weak models (Training Set | Test Set)
36 Distribution of model coverage for points in each class, with 500 enriched weak models (Training Set | Test Set)
37 Distribution of model coverage for points in each class, with 1000 enriched weak models (Training Set | Test Set)
38 Distribution of model coverage for points in each class, with 2000 enriched weak models (Training Set | Test Set)
39 Distribution of model coverage for points in each class, with 5000 enriched weak models (Training Set | Test Set). No discrimination!
40 Models of this type, when enriched for the training set, are not necessarily enriched for the test set. [Figures: Training Set | Test Set; a random model with 50% coverage of space]
41 Introducing projectability: maintain local continuity of class interpretations. Neighboring points of the same class should share similar model coverage.
42 Allow some local continuity in model membership, so that the interpretation of a training point can generalize to its immediate neighborhood. [Figures: class distribution; a projectable model]
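One way to picture a projectable model is a union of balls: membership varies smoothly in space, so a training point and its immediate neighbors get nearly the same coverage. The ball count and radius below are illustrative choices, not parameters from the slides.

```python
import random

random.seed(4)

def ball_model(n_balls=20, radius=1.5):
    """Projectable weak model: the union of a few random balls."""
    centers = [(random.uniform(0, 10), random.uniform(0, 10))
               for _ in range(n_balls)]
    def covers(p):
        return any((p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= radius ** 2
                   for cx, cy in centers)
    return covers

models = [ball_model() for _ in range(300)]

def Y(p):
    """Fraction of models covering point p."""
    return sum(m(p) for m in models) / len(models)

# Because each model's membership is locally continuous, Y changes little
# between a point and its near neighbors -- unlike the arbitrary point
# subsets of the earlier slides, whose coverage carries no information
# about unseen neighboring points.
```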
43 Distribution of model coverage for points in each class, with 100 enriched, projectable weak models (Training Set | Test Set)
44 Distribution of model coverage for points in each class, with 300 enriched, projectable weak models (Training Set | Test Set)
45 Distribution of model coverage for points in each class, with 400 enriched, projectable weak models (Training Set | Test Set)
46 Distribution of model coverage for points in each class, with 500 enriched, projectable weak models (Training Set | Test Set)
47 Distribution of model coverage for points in each class, with 1000 enriched, projectable weak models (Training Set | Test Set)
48 Distribution of model coverage for points in each class, with 2000 enriched, projectable weak models (Training Set | Test Set)
49 Distribution of model coverage for points in each class, with 5000 enriched, projectable weak models (Training Set | Test Set)
50 Promoting uniformity: all points in the same class should have equal likelihood of being covered by a model of each particular rating. Retain models that cover points currently under-covered by the existing collection.
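A minimal sketch of that retention rule, with assumed specifics (random 2D points, circular candidate models, and an acceptance test that the covered points be no better covered, on average, than the overall mean so far):

```python
import random

random.seed(5)

points = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(100)]
counts = [0] * len(points)   # how often each point has been covered so far

def promotes_uniformity(covered):
    """Accept a model only if the points it covers are currently
    under-covered relative to the overall mean."""
    if not covered:
        return False
    overall = sum(counts) / len(counts)
    local = sum(counts[i] for i in covered) / len(covered)
    return local <= overall

models = []
while len(models) < 200:
    # Candidate model: a disc of radius 2 at a random center.
    cx, cy, r = random.uniform(0, 10), random.uniform(0, 10), 2.0
    covered = [i for i, (x, y) in enumerate(points)
               if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2]
    if promotes_uniformity(covered):
        models.append((cx, cy, r))
        for i in covered:
            counts[i] += 1
```

The filter steers new models toward the least-covered points, which is the mechanism by which the collection's coverage evens out.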
51 Distribution of model coverage for points in each class, with 100 enriched, projectable, uniform weak models (Training Set | Test Set)
52 Distribution of model coverage for points in each class, with 1000 enriched, projectable, uniform weak models (Training Set | Test Set)
53 Distribution of model coverage for points in each class, with 5000 enriched, projectable, uniform weak models (Training Set | Test Set)
54 Distribution of model coverage for points in each class, with 10000 enriched, projectable, uniform weak models (Training Set | Test Set)
55 Distribution of model coverage for points in each class, with 50000 enriched, projectable, uniform weak models (Training Set | Test Set)
56 The 3 necessary conditions:
- Enrichment: discriminating power
- Uniformity: complementary information
- Projectability: generalization power
57 Extensions and Comparisons
58 Alternative Discriminants [Berlind 1994]
- Different discriminants for N-class problems
- Additional condition on symmetry
- Approximate uniformity
- Hierarchy of indiscernibility
59 Estimates of Classification Accuracies [Chen 1997]
Statistical estimate of classification accuracy under weaker conditions:
- Approximate uniformity
- Approximate indiscernibility
60 Multi-class Problems
- For n classes, define n discriminants Y_i, one for each class i vs. the others
- Classify an unknown point to the class i for which the computed Y_i is the largest
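The one-vs-rest decision reduces to an argmax over the per-class discriminants. A minimal sketch (the dict representation of the Y_i values is an assumed format):

```python
def classify_multiclass(y_scores):
    """y_scores: dict mapping class label -> discriminant value Y_i,
    where Y_i is the fraction of class-i-enriched models covering the
    point. Returns the label with the largest Y_i."""
    return max(y_scores, key=y_scores.get)

# Example: class 2's discriminant dominates, so class 2 is chosen.
label = classify_multiclass({1: 0.4, 2: 0.7, 3: 0.1})
```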
61 [Ho & Kleinberg ICPR 1996]
65 Open Problems
- Algorithm for uniformity enforcement: deterministic methods?
- Desirable form of weak models: fewer, more sophisticated classifiers?
- Other ways to address the 3-way trade-off: enrichment / uniformity / projectability
66 Random Decision Forest [Ho 1995, 1998]
- A structured way to create models: fully split a tree, use leaves as models
- Perfect enrichment and uniformity for the training set
- Promote projectability by subspace projection
67 Compact Distribution Maps [Ho & Baird 1993, 1997]
- Another structured way to create models
- Start with projectable models by coarse quantization of the feature value range
- Seek enrichment and uniformity
[Figure: signatures of 2 types of events, and measurements from a new observation; axes: Signal Index, Signal Level]
68 SD & Other Ensemble Methods
- Ensemble learning via boosting: a sequential way to promote uniformity of ensemble element coverage
- XCS (a genetic algorithm): a way to create, filter, and use stochastic models that are regions in feature space
69 XCS Classifier System [Wilson, 95]
- Recent focus of the GA community; good performance
- Reinforcement Learning + Genetic Algorithms
- Model: a set of rules, e.g.
  if (shape=square and number>10) then class=red
  if (shape=circle and number<5) then class=yellow
[Diagram: the environment feeds inputs to the rule set, which outputs a class; reinforcement learning applies rewards and updates; genetic algorithms search for new rules]
70 Multiple Classifier Systems: Examples in Word Image Recognition
71 Complementary Strengths of Classifiers
The case for classifier combination … decision fusion … mixture of experts … committee decision making
[Table: rank of the true class, out of a lexicon of 1091 words, by 10 classifiers for 20 images]
72 Classifier Combination Methods
- Decision Optimization: find consensus among a given set of classifiers
- Coverage Optimization: create a set of classifiers that works best with a given decision combination function
73 Decision Optimization
- Develop classifiers with expert knowledge
- Try to make the best use of their decisions via majority/plurality vote, sum/product rule, probabilistic methods, Bayesian methods, rank/confidence score combination …
- The joint capability of the classifiers sets an intrinsic limit on the combined accuracy
- There is no way to handle the blind spots
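Two of the combination functions named above, plurality vote and the sum rule, can be sketched in a few lines (the list/dict score format is an assumed representation, not from the slides):

```python
from collections import Counter

def plurality_vote(decisions):
    """decisions: list of hard class labels, one per classifier.
    Returns the most frequent label."""
    return Counter(decisions).most_common(1)[0][0]

def sum_rule(score_lists):
    """score_lists: one dict per classifier, mapping class -> score.
    Returns the class with the largest summed score."""
    totals = Counter()
    for scores in score_lists:
        totals.update(scores)   # adds scores class-by-class
    return totals.most_common(1)[0][0]
```

Note how the two rules can disagree: two weakly confident classifiers can outvote one strongly confident one under plurality vote, while the sum rule lets confidence magnitudes decide.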
74 Difficulties in Decision Optimization
- Reliability versus overall accuracy
- Fixed or trainable combination function
- Simple models or combinatorial estimates
- How to model complementary behavior
75 Coverage Optimization
- Fix a decision combination function
- Generate classifiers automatically and systematically via training-set sub-sampling (stacking, bagging, boosting), subspace projection (RSM), superclass/subclass decomposition (ECOC), random perturbation of training processes, noise injection …
- Need enough classifiers to cover all blind spots (how many are enough?)
- What else is critical?
76 Difficulties in Coverage Optimization
- What kind of differences to introduce:
  – Subsamples? Subspaces? Super/subclasses?
  – Training parameters?
  – Model geometry?
- 3-way tradeoff: discrimination + diversity + generalization
- Effects of the form of component classifiers
77 Dilemmas and Paradoxes in Classifier Combination
- Weaken individuals for a stronger whole?
- Sacrifice known samples for unseen cases?
- Seek agreements or differences?
78 Stochastic Discrimination
A mathematical theory that relates several key concepts in pattern recognition:
- Discriminative power … enrichment
- Complementary information … uniformity
- Generalization power … projectability
It offers a way to describe the complementary behavior of classifiers, and guidelines for designing multiple classifier systems (classifier ensembles).