
1 Part II: Practical Implementations.

2 Modeling the Classes: Stochastic Discrimination

3 Algorithm for Training an SD Classifier (a Python sketch follows)
- Generate a projectable weak model
- Evaluate the model w.r.t. the training set; check enrichment
- Check uniformity w.r.t. the existing collection
- Add to the discriminant
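A minimal sketch of this loop in Python, on 2D data like the running example below. The box-shaped weak models, the enrichment margin, and the mean-count uniformity test are illustrative assumptions; the slides and papers leave room for many variants.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_box(lo=0.0, hi=20.0, min_side=2.0):
    """A hypothetical projectable weak model: a random axis-aligned box."""
    a = rng.uniform(lo, hi - min_side, size=2)
    b = a + rng.uniform(min_side, hi - a)
    return a, b

def covers(box, X):
    """Boolean mask: which rows of X fall inside the box."""
    a, b = box
    return np.all((X >= a) & (X <= b), axis=1)

def train_sd(X, y, n_models=1000, margin=0.1, max_tries=200_000):
    """X: (n, 2) points, y: labels in {1, 2}. Returns the accepted weak models."""
    models = []
    counts = np.zeros(len(X))                    # coverage count per training point
    for _ in range(max_tries):
        if len(models) == n_models:
            break
        box = random_box()
        c = covers(box, X)
        # enrichment: must cover class 1 noticeably more than class 2
        if c[y == 1].mean() - c[y == 2].mean() < margin:
            continue
        # uniformity: keep models that reach currently under-covered points
        if models and counts[c].mean() > counts.mean():
            continue
        models.append(box)
        counts += c
    return models

def discriminant(models, x):
    """Y(x): fraction of accepted models covering the point x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return float(np.mean([covers(m, x)[0] for m in models]))
```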

4 Dealing with Data Geometry: SD in Practice

5 2D Example, adapted from [Kleinberg, PAMI, May 2000]

6 An “r = 1/2” random subset of the feature space, covering half of all the points

7 Watch how many such subsets cover a particular point, say (2, 17).

8 Track the coverage of (2, 17) as each subset is drawn: it’s in 0 of 1 models, Y = 0/1 = 0.0; in 1 of 2, Y = 1/2 = 0.5; in 2 of 3, Y = 2/3 ≈ 0.67; in 3 of 4, Y = 3/4 = 0.75; in 4 of 5, Y = 4/5 = 0.8; in 5 of 6, Y = 5/6 ≈ 0.83.

9 Continuing: it’s in 5 of 7 models, Y = 5/7 ≈ 0.71; in 6 of 8, Y = 6/8 = 0.75; in 7 of 9, Y = 7/9 ≈ 0.78; in 8 of 10, Y = 8/10 = 0.8; in 8 of 11, Y = 8/11 ≈ 0.73; in 8 of 12, Y = 8/12 ≈ 0.67.

10 Fraction of “r = 1/2” random subsets covering the point (2, 17), as more such subsets are generated
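This convergence toward r = 0.5 is just the law of large numbers. A quick simulation sketch, with the half-coverage subsets drawn as random boolean masks over a 20×20 grid rather than the structured subsets in the figures:

```python
import numpy as np

rng = np.random.default_rng(1)
n_points = 20 * 20                 # a flattened 20x20 grid
idx = 2 * 20 + 17                  # row-major index of the point (2, 17)

hits = 0
for t in range(1, 10_001):
    subset = rng.random(n_points) < 0.5       # one "r = 1/2" random subset
    hits += int(subset[idx])
    if t in (1, 5, 10, 100, 1_000, 10_000):
        print(f"{t:>6} models: Y = {hits / t:.3f}")   # drifts toward 0.5
```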

11 Fractions of “r = 1/2” random subsets covering several selected points, as more such subsets are generated

12 Distribution of model coverage for all points in space, with 100 models

13 Distribution of model coverage for all points in space, with 200 models

14 Distribution of model coverage for all points in space, with 300 models

15 Distribution of model coverage for all points in space, with 400 models

16 Distribution of model coverage for all points in space, with 500 models

17 Distribution of model coverage for all points in space, with 1000 models

18 Distribution of model coverage for all points in space, with 2000 models

19 Distribution of model coverage for all points in space, with 5000 models
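The narrowing of these histograms follows from each point's coverage fraction being a binomial proportion, whose spread shrinks like 1/sqrt(N). A sketch that reproduces the effect, again with random masks standing in for the actual subsets:

```python
import numpy as np

rng = np.random.default_rng(2)
n_points = 400                                  # a 20x20 grid, flattened

for n_models in (100, 500, 1_000, 5_000):
    # coverage[i] = fraction of r = 1/2 subsets containing point i
    coverage = (rng.random((n_models, n_points)) < 0.5).mean(axis=0)
    print(n_models, round(coverage.mean(), 3), round(coverage.std(), 4))
# the standard deviation falls roughly as 0.5 / sqrt(n_models)
```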

20 Introducing enrichment: for any discrimination to happen, the models must have some difference in coverage for the different classes.

21 Enforcing enrichment (adding in a bias): require each subset to cover more points of one class than of the other. [Figures: the class distribution; a biased (enriched) weak model]
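The most direct way to enforce this is rejection sampling on candidate models. A sketch, where the margin of 0.1 is an arbitrary choice:

```python
import numpy as np

def covers(box, X):
    """Boolean mask: which rows of X fall inside the box."""
    a, b = box
    return np.all((X >= a) & (X <= b), axis=1)

def is_enriched(box, X, y, margin=0.1, target=1):
    """Keep a weak model only if its coverage of the target class exceeds
    its coverage of the other class by at least `margin`."""
    c = covers(box, X)
    return c[y == target].mean() - c[y != target].mean() >= margin
```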

22 Distribution of model coverage for points in each class, with 100 enriched weak models

23 Distribution of model coverage for points in each class, with 200 enriched weak models

24 Distribution of model coverage for points in each class, with 300 enriched weak models

25 Distribution of model coverage for points in each class, with 400 enriched weak models

26 Distribution of model coverage for points in each class, with 500 enriched weak models

27 Distribution of model coverage for points in each class, with 1000 enriched weak models

28 Distribution of model coverage for points in each class, with 2000 enriched weak models

29 Distribution of model coverage for points in each class, with 5000 enriched weak models

30 The error rate decreases as the number of models increases. Decision rule: if Y < 0.5 then class 2, else class 1.
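The decision rule is a single threshold on the discriminant. A sketch of the corresponding classifier and its error rate, reusing the train_sd / discriminant helpers sketched after the training algorithm above:

```python
import numpy as np

def classify(models, x):
    """Threshold the discriminant at 0.5 (class 1 was the enriched class)."""
    return 1 if discriminant(models, x) >= 0.5 else 2

def error_rate(models, X, y):
    preds = np.array([classify(models, x) for x in X])
    return float((preds != y).mean())

# e.g. plot error_rate(models[:k], X_test, y_test) for k = 100, 200, ..., 5000
```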

31 Sparse Training Data: incomplete knowledge about the class distributions. [Figures: training set; test set]

32 Distribution of model coverage for points in each class, with 100 enriched weak models (training set vs. test set)

33 Distribution of model coverage for points in each class, with 200 enriched weak models (training set vs. test set)

34 Distribution of model coverage for points in each class, with 300 enriched weak models (training set vs. test set)

35 Distribution of model coverage for points in each class, with 400 enriched weak models (training set vs. test set)

36 Distribution of model coverage for points in each class, with 500 enriched weak models (training set vs. test set)

37 Distribution of model coverage for points in each class, with 1000 enriched weak models (training set vs. test set)

38 Distribution of model coverage for points in each class, with 2000 enriched weak models (training set vs. test set)

39 Distribution of model coverage for points in each class, with 5000 enriched weak models (training set vs. test set). No discrimination!

40 Models of this type, when enriched for the training set, are not necessarily enriched for the test set. [Figures: training set; test set; a random model with 50% coverage of the space]

41 Introducing projectability: maintain local continuity of class interpretations. Neighboring points of the same class should share similar model coverage.

42 Allow some local continuity in model membership, so that the interpretation of a training point can generalize to its immediate neighborhood. [Figures: the class distribution; a projectable model]
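In code, the difference is whether a model is an arbitrary set of points or a region with spatial extent. A sketch contrasting the two; a box with a minimum side length is one simple way to get projectability, not the only one:

```python
import numpy as np

def point_set_model(X, k, rng):
    """Not projectable: an arbitrary set of k training points.
    Membership says nothing about unseen points nearby."""
    chosen = rng.choice(len(X), size=k, replace=False)
    return {tuple(p) for p in X[chosen]}

def box_model(rng, lo=0.0, hi=20.0, min_side=2.0):
    """Projectable: an axis-aligned box with a minimum side length, so the
    neighbors of a covered training point tend to be covered as well."""
    a = rng.uniform(lo, hi - min_side, size=2)
    b = a + rng.uniform(min_side, hi - a)
    return a, b
```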

43 Distribution of model coverage for points in each class, with 100 enriched, projectable weak models (training set vs. test set)

44 Distribution of model coverage for points in each class, with 300 enriched, projectable weak models (training set vs. test set)

45 Distribution of model coverage for points in each class, with 400 enriched, projectable weak models (training set vs. test set)

46 Distribution of model coverage for points in each class, with 500 enriched, projectable weak models (training set vs. test set)

47 Distribution of model coverage for points in each class, with 1000 enriched, projectable weak models (training set vs. test set)

48 Distribution of model coverage for points in each class, with 2000 enriched, projectable weak models (training set vs. test set)

49 Distribution of model coverage for points in each class, with 5000 enriched, projectable weak models (training set vs. test set)

50 Promoting uniformity: all points in the same class should be equally likely to be covered by a model of each particular rating. Retain models that cover the points least covered by the current collection.
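A simple greedy version of this filter; comparing mean coverage counts inside and outside the candidate model is an illustrative acceptance test, not the only possibility:

```python
import numpy as np

def promotes_uniformity(c, counts):
    """c: boolean mask of training points covered by the candidate model.
    counts: how many accepted models already cover each point.
    Keep the candidate only if the points it covers are, on average,
    covered no more often than the population as a whole."""
    return bool(c.any()) and counts[c].mean() <= counts.mean()

# after accepting a model with coverage mask c, update: counts += c
```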

51 Distribution of model coverage for points in each class, with 100 enriched, projectable, uniform weak models (training set vs. test set)

52 Distribution of model coverage for points in each class, with 1000 enriched, projectable, uniform weak models (training set vs. test set)

53 Distribution of model coverage for points in each class, with 5000 enriched, projectable, uniform weak models (training set vs. test set)

54 Distribution of model coverage for points in each class, with 10000 enriched, projectable, uniform weak models (training set vs. test set)

55 Distribution of model coverage for points in each class, with 50000 enriched, projectable, uniform weak models (training set vs. test set)

56 The 3 necessary conditions
- Enrichment: discriminating power
- Uniformity: complementary information
- Projectability: generalization power

57 Extensions and Comparisons

58 Alternative Discriminants [Berlind 1994]
- Different discriminants for N-class problems
- An additional condition on symmetry
- Approximate uniformity
- A hierarchy of indiscernibility

59 Estimates of Classification Accuracies [Chen 1997]
- Statistical estimates of classification accuracy under weaker conditions:
  - approximate uniformity
  - approximate indiscernibility

60 Multi-class Problems
- For n classes, define n discriminants Y_i, one for each class i vs. the others
- Classify an unknown point to the class i for which the computed Y_i is largest (see the sketch below)
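A sketch of that one-vs-rest rule, assuming one model collection per class (each enriched for its own class) and the covers() helper from the earlier sketches:

```python
import numpy as np

def classify_multiclass(models_per_class, x):
    """models_per_class[i]: weak models enriched for class i vs. the rest.
    Y_i(x) is the fraction of class-i models covering x; pick the argmax."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    Y = [np.mean([covers(m, x)[0] for m in models])
         for models in models_per_class]
    return int(np.argmax(Y))
```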

61 [Ho & Kleinberg ICPR 1996]

62-64 (figure-only slides)

65 Open Problems
- Algorithm for uniformity enforcement: deterministic methods?
- Desirable form of weak models: fewer, more sophisticated classifiers?
- Other ways to address the 3-way trade-off: enrichment / uniformity / projectability

66 Random Decision Forest [Ho 1995, 1998]
- A structured way to create models: fully split a tree, use the leaves as models
- Perfect enrichment and uniformity on the training set
- Promote projectability by subspace projection (a sketch follows)
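A sketch of that construction using scikit-learn's decision tree: each tree is grown until its leaves are pure (hence perfectly enriched and uniform on the training data) and trained on a random subspace of the features. This is an illustrative reconstruction, not Ho's original code; integer class labels are assumed.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_subspace_forest(X, y, n_trees=100, dim=None, seed=0):
    rng = np.random.default_rng(seed)
    dim = dim or max(1, X.shape[1] // 2)       # subspace dimensionality
    forest = []
    for _ in range(n_trees):
        feats = rng.choice(X.shape[1], size=dim, replace=False)
        tree = DecisionTreeClassifier()        # default: split until leaves are pure
        tree.fit(X[:, feats], y)
        forest.append((feats, tree))
    return forest

def forest_predict(forest, X):
    votes = np.stack([tree.predict(X[:, feats]) for feats, tree in forest])
    # plurality vote over the trees, one column per test point
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```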

67 Compact Distribution Maps [Ho & Baird 1993, 1997]
- Another structured way to create models
- Start with projectable models, via coarse quantization of the feature value range
- Then seek enrichment and uniformity
- [Figure: signatures of 2 types of events, and measurements from a new observation; axes: signal index vs. signal level]

68 SD & Other Ensemble Methods
- Ensemble learning via boosting: a sequential way to promote uniformity of ensemble element coverage
- XCS (a genetic algorithm): a way to create, filter, and use stochastic models that are regions in feature space

69 XCS Classifier System [Wilson, 95]
- A recent focus of the GA community; good performance
- Reinforcement learning + genetic algorithms
- Model: a set of rules, e.g. if (shape=square and number>10) then class=red; if (shape=circle and number<5) then class=yellow
- [Diagram: the environment feeds inputs to the rule set, which outputs a class; reinforcement learning updates the rules from rewards, while the genetic algorithm searches for new ones]

70 Multiple Classifier Systems: Examples in Word Image Recognition

71 Complementary Strengths of Classifiers
- The case for classifier combination: decision fusion, mixtures of experts, committee decision making
- [Table: rank of the true class out of a lexicon of 1091 words, by 10 classifiers on 20 images]

72 Classifier Combination Methods
- Decision optimization: find consensus among a given set of classifiers
- Coverage optimization: create a set of classifiers that works best with a given decision combination function

73 Decision Optimization
- Develop classifiers with expert knowledge
- Make the best use of their decisions via majority/plurality vote, sum/product rules, probabilistic methods, Bayesian methods, rank/confidence score combination … (see the sketch below)
- The joint capability of the classifiers sets an intrinsic limit on the combined accuracy
- There is no way to handle the blind spots they share
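Three of the simplest fixed combination functions from that list, sketched for classifiers that output either hard labels or per-class scores:

```python
import numpy as np

def plurality_vote(labels):
    """labels: (n_classifiers,) integer class predictions for one input."""
    return int(np.bincount(labels).argmax())

def sum_rule(scores):
    """scores: (n_classifiers, n_classes) per-class confidences.
    The sum rule picks the class with the largest total score."""
    return int(scores.sum(axis=0).argmax())

def product_rule(scores):
    """The product rule multiplies per-class scores, treating the
    classifiers' outputs as independent probabilities."""
    return int(scores.prod(axis=0).argmax())
```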

74 Difficulties in Decision Optimization
- Reliability versus overall accuracy
- Fixed or trainable combination functions
- Simple models or combinatorial estimates
- How to model complementary behavior

75 Coverage Optimization
- Fix a decision combination function
- Generate classifiers automatically and systematically, via training-set sub-sampling (stacking, bagging, boosting), subspace projection (RSM), superclass/subclass decomposition (ECOC), random perturbation of training processes, noise injection … (a bagging sketch follows)
- Need enough classifiers to cover all blind spots (how many are enough?)
- What else is critical?
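A sketch of the simplest generator on that list, bootstrap sub-sampling (bagging), with plurality vote fixed as the combiner; the tree base learner and ensemble size are arbitrary choices here, and integer class labels are assumed:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_ensemble(X, y, n_classifiers=50, seed=0):
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_classifiers):
        idx = rng.integers(0, len(X), size=len(X))     # bootstrap sample
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def combined_predict(ensemble, X):
    preds = np.stack([clf.predict(X) for clf in ensemble])
    # fixed combiner: plurality vote, one column per test point
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, preds)
```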

76 Difficulties in Coverage Optimization
- What kind of differences to introduce:
  - subsamples? subspaces? super/subclasses?
  - training parameters?
  - model geometry?
- The 3-way trade-off: discrimination + diversity + generalization
- Effects of the form of the component classifiers

77 Dilemmas and Paradoxes in Classifier Combination
- Weaken individuals for a stronger whole?
- Sacrifice known samples for unseen cases?
- Seek agreements or differences?

78 Stochastic Discrimination
- A mathematical theory that relates several key concepts in pattern recognition:
  - discriminative power … enrichment
  - complementary information … uniformity
  - generalization power … projectability
- It offers a way to describe the complementary behavior of classifiers
- It offers guidelines for designing multiple classifier systems (classifier ensembles)

