
1 Part II: Practical Implementations.

2 Modeling the Classes: Stochastic Discrimination

3 Algorithm for Training an SD Classifier (a Python sketch follows)
- Generate a projectable weak model
- Evaluate the model w.r.t. the training set; check enrichment
- Check uniformity w.r.t. the existing collection
- Add to the discriminant
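A minimal sketch of this loop in Python, on 2D data like the running example below. The box-shaped weak models, the enrichment margin, and the mean-count uniformity test are illustrative assumptions; the slides and papers leave room for many variants.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_box(lo=0.0, hi=20.0, min_side=2.0):
    """A hypothetical projectable weak model: a random axis-aligned box."""
    a = rng.uniform(lo, hi - min_side, size=2)
    b = a + rng.uniform(min_side, hi - a)
    return a, b

def covers(box, X):
    """Boolean mask: which rows of X fall inside the box."""
    a, b = box
    return np.all((X >= a) & (X <= b), axis=1)

def train_sd(X, y, n_models=1000, margin=0.1, max_tries=200_000):
    """X: (n, 2) points, y: labels in {1, 2}. Returns the accepted weak models."""
    models = []
    counts = np.zeros(len(X))                    # coverage count per training point
    for _ in range(max_tries):
        if len(models) == n_models:
            break
        box = random_box()
        c = covers(box, X)
        # enrichment: must cover class 1 noticeably more than class 2
        if c[y == 1].mean() - c[y == 2].mean() < margin:
            continue
        # uniformity: keep models that reach currently under-covered points
        if models and counts[c].mean() > counts.mean():
            continue
        models.append(box)
        counts += c
    return models

def discriminant(models, x):
    """Y(x): fraction of accepted models covering the point x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return float(np.mean([covers(m, x)[0] for m in models]))
```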

4 Dealing with Data Geometry: SD in Practice

5 2D Example, adapted from [Kleinberg, PAMI, May 2000]

6 An “r = 1/2” random subset of the feature space, covering half of all the points

7 Watch how many such subsets cover a particular point, say (2, 17).

8 Track the coverage of (2, 17) as each subset is drawn: it’s in 0 of 1 models, Y = 0/1 = 0.0; in 1 of 2, Y = 1/2 = 0.5; in 2 of 3, Y = 2/3 ≈ 0.67; in 3 of 4, Y = 3/4 = 0.75; in 4 of 5, Y = 4/5 = 0.8; in 5 of 6, Y = 5/6 ≈ 0.83.

9 Continuing: it’s in 5 of 7 models, Y = 5/7 ≈ 0.71; in 6 of 8, Y = 6/8 = 0.75; in 7 of 9, Y = 7/9 ≈ 0.78; in 8 of 10, Y = 8/10 = 0.8; in 8 of 11, Y = 8/11 ≈ 0.73; in 8 of 12, Y = 8/12 ≈ 0.67.

10 Fraction of “r = 1/2” random subsets covering the point (2, 17), as more such subsets are generated
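This convergence toward r = 0.5 is just the law of large numbers. A quick simulation sketch, with the half-coverage subsets drawn as random boolean masks over a 20×20 grid rather than the structured subsets in the figures:

```python
import numpy as np

rng = np.random.default_rng(1)
n_points = 20 * 20                 # a flattened 20x20 grid
idx = 2 * 20 + 17                  # row-major index of the point (2, 17)

hits = 0
for t in range(1, 10_001):
    subset = rng.random(n_points) < 0.5       # one "r = 1/2" random subset
    hits += int(subset[idx])
    if t in (1, 5, 10, 100, 1_000, 10_000):
        print(f"{t:>6} models: Y = {hits / t:.3f}")   # drifts toward 0.5
```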

11 Fractions of “r = 1/2” random subsets covering several selected points, as more such subsets are generated

12 Distribution of model coverage for all points in space, with 100 models

13 Distribution of model coverage for all points in space, with 200 models

14 Distribution of model coverage for all points in space, with 300 models

15 Distribution of model coverage for all points in space, with 400 models

16 Distribution of model coverage for all points in space, with 500 models

17 Distribution of model coverage for all points in space, with 1000 models

18 Distribution of model coverage for all points in space, with 2000 models

19 Distribution of model coverage for all points in space, with 5000 models
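The narrowing of these histograms follows from each point's coverage fraction being a binomial proportion, whose spread shrinks like 1/sqrt(N). A sketch that reproduces the effect, again with random masks standing in for the actual subsets:

```python
import numpy as np

rng = np.random.default_rng(2)
n_points = 400                                  # a 20x20 grid, flattened

for n_models in (100, 500, 1_000, 5_000):
    # coverage[i] = fraction of r = 1/2 subsets containing point i
    coverage = (rng.random((n_models, n_points)) < 0.5).mean(axis=0)
    print(n_models, round(coverage.mean(), 3), round(coverage.std(), 4))
# the standard deviation falls roughly as 0.5 / sqrt(n_models)
```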

20 Introducing enrichment: for any discrimination to happen, the models must have some difference in coverage for the different classes.

21 Enforcing enrichment (adding in a bias): require each subset to cover more points of one class than of the other. [Figures: the class distribution; a biased (enriched) weak model]
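The most direct way to enforce this is rejection sampling on candidate models. A sketch, where the margin of 0.1 is an arbitrary choice:

```python
import numpy as np

def covers(box, X):
    """Boolean mask: which rows of X fall inside the box."""
    a, b = box
    return np.all((X >= a) & (X <= b), axis=1)

def is_enriched(box, X, y, margin=0.1, target=1):
    """Keep a weak model only if its coverage of the target class exceeds
    its coverage of the other class by at least `margin`."""
    c = covers(box, X)
    return c[y == target].mean() - c[y != target].mean() >= margin
```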

22 Distribution of model coverage for points in each class, with 100 enriched weak models

23 Distribution of model coverage for points in each class, with 200 enriched weak models

24 Distribution of model coverage for points in each class, with 300 enriched weak models

25 Distribution of model coverage for points in each class, with 400 enriched weak models

26 Distribution of model coverage for points in each class, with 500 enriched weak models

27 Distribution of model coverage for points in each class, with 1000 enriched weak models

28 Distribution of model coverage for points in each class, with 2000 enriched weak models

29 Distribution of model coverage for points in each class, with 5000 enriched weak models

30 The error rate decreases as the number of models increases. Decision rule: if Y < 0.5 then class 2, else class 1.
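The decision rule is a single threshold on the discriminant. A sketch of the corresponding classifier and its error rate, reusing the train_sd / discriminant helpers sketched after the training algorithm above:

```python
import numpy as np

def classify(models, x):
    """Threshold the discriminant at 0.5 (class 1 was the enriched class)."""
    return 1 if discriminant(models, x) >= 0.5 else 2

def error_rate(models, X, y):
    preds = np.array([classify(models, x) for x in X])
    return float((preds != y).mean())

# e.g. plot error_rate(models[:k], X_test, y_test) for k = 100, 200, ..., 5000
```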

31 Sparse Training Data: incomplete knowledge about the class distributions. [Figures: training set; test set]

32 Distribution of model coverage for points in each class, with 100 enriched weak models (training set vs. test set)

33 Distribution of model coverage for points in each class, with 200 enriched weak models (training set vs. test set)

34 Distribution of model coverage for points in each class, with 300 enriched weak models (training set vs. test set)

35 Distribution of model coverage for points in each class, with 400 enriched weak models (training set vs. test set)

36 Distribution of model coverage for points in each class, with 500 enriched weak models (training set vs. test set)

37 Distribution of model coverage for points in each class, with 1000 enriched weak models (training set vs. test set)

38 Distribution of model coverage for points in each class, with 2000 enriched weak models (training set vs. test set)

39 Distribution of model coverage for points in each class, with 5000 enriched weak models (training set vs. test set). No discrimination!

40 Models of this type, when enriched for the training set, are not necessarily enriched for the test set. [Figures: training set; test set; a random model with 50% coverage of the space]

41 Introducing projectability: maintain local continuity of class interpretations. Neighboring points of the same class should share similar model coverage.

42 Allow some local continuity in model membership, so that the interpretation of a training point can generalize to its immediate neighborhood. [Figures: the class distribution; a projectable model]
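In code, the difference is whether a model is an arbitrary set of points or a region with spatial extent. A sketch contrasting the two; a box with a minimum side length is one simple way to get projectability, not the only one:

```python
import numpy as np

def point_set_model(X, k, rng):
    """Not projectable: an arbitrary set of k training points.
    Membership says nothing about unseen points nearby."""
    chosen = rng.choice(len(X), size=k, replace=False)
    return {tuple(p) for p in X[chosen]}

def box_model(rng, lo=0.0, hi=20.0, min_side=2.0):
    """Projectable: an axis-aligned box with a minimum side length, so the
    neighbors of a covered training point tend to be covered as well."""
    a = rng.uniform(lo, hi - min_side, size=2)
    b = a + rng.uniform(min_side, hi - a)
    return a, b
```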

43 Distribution of model coverage for points in each class, with 100 enriched, projectable weak models (training set vs. test set)

44 Distribution of model coverage for points in each class, with 300 enriched, projectable weak models (training set vs. test set)

45 Distribution of model coverage for points in each class, with 400 enriched, projectable weak models (training set vs. test set)

46 Distribution of model coverage for points in each class, with 500 enriched, projectable weak models (training set vs. test set)

47 Distribution of model coverage for points in each class, with 1000 enriched, projectable weak models (training set vs. test set)

48 Distribution of model coverage for points in each class, with 2000 enriched, projectable weak models (training set vs. test set)

49 Distribution of model coverage for points in each class, with 5000 enriched, projectable weak models (training set vs. test set)

50 Promoting uniformity: all points in the same class should be equally likely to be covered by a model of each particular rating. Retain models that cover the points least covered by the current collection.
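A simple greedy version of this filter; comparing mean coverage counts inside and outside the candidate model is an illustrative acceptance test, not the only possibility:

```python
import numpy as np

def promotes_uniformity(c, counts):
    """c: boolean mask of training points covered by the candidate model.
    counts: how many accepted models already cover each point.
    Keep the candidate only if the points it covers are, on average,
    covered no more often than the population as a whole."""
    return bool(c.any()) and counts[c].mean() <= counts.mean()

# after accepting a model with coverage mask c, update: counts += c
```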

51 Distribution of model coverage for points in each class, with 100 enriched, projectable, uniform weak models (training set vs. test set)

52 Distribution of model coverage for points in each class, with 1000 enriched, projectable, uniform weak models (training set vs. test set)

53 Distribution of model coverage for points in each class, with 5000 enriched, projectable, uniform weak models (training set vs. test set)

54 Distribution of model coverage for points in each class, with 10000 enriched, projectable, uniform weak models (training set vs. test set)

55 Distribution of model coverage for points in each class, with 50000 enriched, projectable, uniform weak models (training set vs. test set)

56 The 3 necessary conditions
- Enrichment: discriminating power
- Uniformity: complementary information
- Projectability: generalization power

57 Extensions and Comparisons

58 Alternative Discriminants [Berlind 1994]
- Different discriminants for N-class problems
- An additional condition on symmetry
- Approximate uniformity
- A hierarchy of indiscernibility

59 Estimates of Classification Accuracies [Chen 1997]
- Statistical estimates of classification accuracy under weaker conditions:
  - approximate uniformity
  - approximate indiscernibility

60 Multi-class Problems
- For n classes, define n discriminants Y_i, one for each class i vs. the others
- Classify an unknown point to the class i for which the computed Y_i is largest (see the sketch below)
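A sketch of that one-vs-rest rule, assuming one model collection per class (each enriched for its own class) and the covers() helper from the earlier sketches:

```python
import numpy as np

def classify_multiclass(models_per_class, x):
    """models_per_class[i]: weak models enriched for class i vs. the rest.
    Y_i(x) is the fraction of class-i models covering x; pick the argmax."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    Y = [np.mean([covers(m, x)[0] for m in models])
         for models in models_per_class]
    return int(np.argmax(Y))
```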

61 [Ho & Kleinberg ICPR 1996]

62-64 (figure-only slides)

65 Open Problems
- Algorithm for uniformity enforcement: deterministic methods?
- Desirable form of weak models: fewer, more sophisticated classifiers?
- Other ways to address the 3-way trade-off: enrichment / uniformity / projectability

66 Random Decision Forest [Ho 1995, 1998]
- A structured way to create models: fully split a tree, use the leaves as models
- Perfect enrichment and uniformity on the training set
- Promote projectability by subspace projection (a sketch follows)
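A sketch of that construction using scikit-learn's decision tree: each tree is grown until its leaves are pure (hence perfectly enriched and uniform on the training data) and trained on a random subspace of the features. This is an illustrative reconstruction, not Ho's original code; integer class labels are assumed.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_subspace_forest(X, y, n_trees=100, dim=None, seed=0):
    rng = np.random.default_rng(seed)
    dim = dim or max(1, X.shape[1] // 2)       # subspace dimensionality
    forest = []
    for _ in range(n_trees):
        feats = rng.choice(X.shape[1], size=dim, replace=False)
        tree = DecisionTreeClassifier()        # default: split until leaves are pure
        tree.fit(X[:, feats], y)
        forest.append((feats, tree))
    return forest

def forest_predict(forest, X):
    votes = np.stack([tree.predict(X[:, feats]) for feats, tree in forest])
    # plurality vote over the trees, one column per test point
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```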

67 Compact Distribution Maps [Ho & Baird 1993, 1997]
- Another structured way to create models
- Start with projectable models, via coarse quantization of the feature value range
- Then seek enrichment and uniformity
- [Figure: signatures of 2 types of events, and measurements from a new observation; axes: signal index vs. signal level]

68 SD & Other Ensemble Methods
- Ensemble learning via boosting: a sequential way to promote uniformity of ensemble element coverage
- XCS (a genetic algorithm): a way to create, filter, and use stochastic models that are regions in feature space

69 XCS Classifier System [Wilson, 95]
- A recent focus of the GA community; good performance
- Reinforcement learning + genetic algorithms
- Model: a set of rules, e.g. if (shape=square and number>10) then class=red; if (shape=circle and number<5) then class=yellow
- [Diagram: the environment feeds inputs to the rule set, which outputs a class; reinforcement learning updates the rules from rewards, while the genetic algorithm searches for new ones]

70 Multiple Classifier Systems: Examples in Word Image Recognition

71 Complementary Strengths of Classifiers
- The case for classifier combination: decision fusion, mixtures of experts, committee decision making
- [Table: rank of the true class out of a lexicon of 1091 words, by 10 classifiers on 20 images]

72 Classifier Combination Methods
- Decision optimization: find consensus among a given set of classifiers
- Coverage optimization: create a set of classifiers that works best with a given decision combination function

73 Decision Optimization
- Develop classifiers with expert knowledge
- Make the best use of their decisions via majority/plurality vote, sum/product rules, probabilistic methods, Bayesian methods, rank/confidence score combination … (see the sketch below)
- The joint capability of the classifiers sets an intrinsic limit on the combined accuracy
- There is no way to handle the blind spots they share
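Three of the simplest fixed combination functions from that list, sketched for classifiers that output either hard labels or per-class scores:

```python
import numpy as np

def plurality_vote(labels):
    """labels: (n_classifiers,) integer class predictions for one input."""
    return int(np.bincount(labels).argmax())

def sum_rule(scores):
    """scores: (n_classifiers, n_classes) per-class confidences.
    The sum rule picks the class with the largest total score."""
    return int(scores.sum(axis=0).argmax())

def product_rule(scores):
    """The product rule multiplies per-class scores, treating the
    classifiers' outputs as independent probabilities."""
    return int(scores.prod(axis=0).argmax())
```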

74 Difficulties in Decision Optimization
- Reliability versus overall accuracy
- Fixed or trainable combination functions
- Simple models or combinatorial estimates
- How to model complementary behavior

75 Coverage Optimization
- Fix a decision combination function
- Generate classifiers automatically and systematically, via training-set sub-sampling (stacking, bagging, boosting), subspace projection (RSM), superclass/subclass decomposition (ECOC), random perturbation of training processes, noise injection … (a bagging sketch follows)
- Need enough classifiers to cover all blind spots (how many are enough?)
- What else is critical?
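A sketch of the simplest generator on that list, bootstrap sub-sampling (bagging), with plurality vote fixed as the combiner; the tree base learner and ensemble size are arbitrary choices here, and integer class labels are assumed:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_ensemble(X, y, n_classifiers=50, seed=0):
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_classifiers):
        idx = rng.integers(0, len(X), size=len(X))     # bootstrap sample
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def combined_predict(ensemble, X):
    preds = np.stack([clf.predict(X) for clf in ensemble])
    # fixed combiner: plurality vote, one column per test point
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, preds)
```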

76 Difficulties in Coverage Optimization
- What kind of differences to introduce:
  - subsamples? subspaces? super/subclasses?
  - training parameters?
  - model geometry?
- The 3-way trade-off: discrimination + diversity + generalization
- Effects of the form of the component classifiers

77 Dilemmas and Paradoxes in Classifier Combination
- Weaken individuals for a stronger whole?
- Sacrifice known samples for unseen cases?
- Seek agreements or differences?

78 Stochastic Discrimination
- A mathematical theory that relates several key concepts in pattern recognition:
  - discriminative power … enrichment
  - complementary information … uniformity
  - generalization power … projectability
- It offers a way to describe the complementary behavior of classifiers
- It offers guidelines for designing multiple classifier systems (classifier ensembles)

