1 Part II: Practical Implementations.

2 Modeling the Classes: Stochastic Discrimination

3 Algorithm for Training an SD Classifier
– Generate a projectable weak model
– Evaluate the model w.r.t. the training set; check enrichment
– Check uniformity w.r.t. the existing collection
– Add to the discriminant
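To make the loop concrete, here is a minimal sketch in Python, assuming 2-D data, labels in {1, 2}, and axis-aligned rectangles as the projectable weak models; the thresholds and the crude uniformity test are illustrative choices, not the procedure from the slides.

```python
import numpy as np

def random_rectangle(rng, lo=0.0, hi=20.0):
    """A random axis-aligned box; projectable because it covers whole neighborhoods."""
    a, b = rng.uniform(lo, hi, size=2), rng.uniform(lo, hi, size=2)
    return np.minimum(a, b), np.maximum(a, b)

def covers(model, X):
    """Boolean mask: which rows of X fall inside the box."""
    lo, hi = model
    return np.all((X >= lo) & (X <= hi), axis=1)

def train_sd(X, y, n_models=1000, seed=0):
    """Collect weak models that pass the enrichment and uniformity checks."""
    rng = np.random.default_rng(seed)
    collection, counts = [], np.zeros(len(X))
    while len(collection) < n_models:
        m = random_rectangle(rng)                 # 1. generate a projectable weak model
        c = covers(m, X)
        if c[y == 1].mean() <= c[y == 2].mean():  # 2. enrichment: must favor class 1
            continue
        under = counts < counts.mean()            # 3. uniformity w.r.t. the collection
        if under.any() and c[under].mean() < c.mean():
            continue
        collection.append(m)                      # 4. add to the discriminant
        counts += c
    return collection
```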

4 Dealing with Data Geometry: SD in Practice

5 2D Example Adapted from [Kleinberg, PAMI, May 2000]

6 An “r=1/2” random subset in the feature space that covers ½ of all the points

7 Watch how many such subsets cover a particular point, say, (2,17)

8 As subsets accumulate, each new one either contains the point (“in”) or misses it (“out”). The point is in 0/1 models: Y = 0/1 = 0.0; then in 1/2 models: Y = 1/2 = 0.5; in 2/3 models: Y = 2/3 ≈ 0.67; in 3/4 models: Y = 3/4 = 0.75; in 4/5 models: Y = 4/5 = 0.8; in 5/6 models: Y = 5/6 ≈ 0.83.

9 Continuing: it’s in 5/7 models: Y = 5/7 ≈ 0.71; in 6/8 models: Y = 6/8 = 0.75; in 7/9 models: Y = 7/9 ≈ 0.78; in 8/10 models: Y = 8/10 = 0.8; in 8/11 models: Y = 8/11 ≈ 0.73; in 8/12 models: Y = 8/12 ≈ 0.67.
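The running count in slides 8 and 9 is easy to reproduce; the sketch below updates Y one model at a time, with the in/out sequence transcribed from the slides.

```python
# 0 = the new subset misses the point ("out"), 1 = it contains it ("in").
in_or_out = [0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0]

covered = 0
for t, hit in enumerate(in_or_out, start=1):
    covered += hit
    print(f"in {covered}/{t} models, Y = {covered / t:.2f}")
```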

10 Fraction of “r=1/2” random subsets covering point (2,17) as more such subsets are generated

11 Fractions of “r=1/2” random subsets covering several selected points as more such subsets are generated

12 Distribution of model coverage for all points in space, with 100 models

13 Distribution of model coverage for all points in space, with 200 models

14 Distribution of model coverage for all points in space, with 300 models

15 Distribution of model coverage for all points in space, with 400 models

16 Distribution of model coverage for all points in space, with 500 models

17 Distribution of model coverage for all points in space, with 1000 models

18 Distribution of model coverage for all points in space, with 2000 models

19 Distribution of model coverage for all points in space, with 5000 models

20 Introducing enrichment: For any discrimination to happen, the models must have some difference in coverage for different classes.

21 Enforcing enrichment (adding in a bias): require each subset to cover more points of one class than of another. [Figure panels: the class distribution; a biased (enriched) weak model]
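One simple way to implement this bias (an illustrative sketch, not necessarily the authors’ procedure) is rejection sampling: keep drawing random subsets until the coverage gap between the classes exceeds a chosen margin. Note that raw point subsets like these are enriched but not projectable; that problem shows up in the slides below.

```python
import numpy as np

def enriched_subset(X, y, rng, r=0.5, min_gap=0.1):
    """Random point subset whose coverage is biased toward class 1."""
    while True:
        mask = rng.random(len(X)) < r        # an "r = 1/2" random subset
        cov1 = mask[y == 1].mean()           # fraction of class 1 covered
        cov2 = mask[y == 2].mean()           # fraction of class 2 covered
        if cov1 - cov2 >= min_gap:           # enrichment = difference in coverage
            return mask
```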

22 Distribution of model coverage for points in each class, with 100 enriched weak models

23 Distribution of model coverage for points in each class, with 200 enriched weak models

24 Distribution of model coverage for points in each class, with 300 enriched weak models

25 Distribution of model coverage for points in each class, with 400 enriched weak models

26 Distribution of model coverage for points in each class, with 500 enriched weak models

27 Distribution of model coverage for points in each class, with 1000 enriched weak models

28 Distribution of model coverage for points in each class, with 2000 enriched weak models

29 Distribution of model coverage for points in each class, with 5000 enriched weak models

30 The error rate decreases as the number of models increases. Decision rule: if Y < 0.5, assign class 2; otherwise assign class 1.
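In code, the decision rule is one line once Y is computed. The sketch below assumes the rectangle-shaped models from the earlier training sketch, with every model enriched toward class 1.

```python
import numpy as np

def classify(x, models):
    """Two-class SD decision: threshold the discriminant Y at 0.5."""
    def covers(model, x):
        lo, hi = model
        return bool(np.all((x >= lo) & (x <= hi)))
    Y = np.mean([covers(m, x) for m in models])  # fraction of models covering x
    return 2 if Y < 0.5 else 1
```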

31 Sparse Training Data: incomplete knowledge about the class distributions. [Figure panels: training set; test set]

32 Distribution of model coverage for points in each class, with 100 enriched weak models (training set vs. test set)

33 Distribution of model coverage for points in each class, with 200 enriched weak models (training set vs. test set)

34 Distribution of model coverage for points in each class, with 300 enriched weak models (training set vs. test set)

35 Distribution of model coverage for points in each class, with 400 enriched weak models (training set vs. test set)

36 Distribution of model coverage for points in each class, with 500 enriched weak models (training set vs. test set)

37 Distribution of model coverage for points in each class, with 1000 enriched weak models (training set vs. test set)

38 Distribution of model coverage for points in each class, with 2000 enriched weak models (training set vs. test set)

39 Distribution of model coverage for points in each class, with 5000 enriched weak models (training set vs. test set). No discrimination on the test set!

40 Models of this type, when enriched for the training set, are not necessarily enriched for the test set. [Figure panels: training set; test set, for a random model with 50% coverage of the space]

41 Introducing projectability: Maintain local continuity of class interpretations. Neighboring points of the same class should share similar model coverage.

42 Allow some local continuity in model membership, so that the interpretation of a training point can generalize to its immediate neighborhood. [Figure panels: the class distribution; a projectable model]
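A hedged sketch of one way to get such continuity: define the model as a union of boxes around randomly chosen training points, so that a point and its immediate neighbors are covered (or missed) together. The box radius and count are illustrative parameters.

```python
import numpy as np

def projectable_model(X_train, rng, n_boxes=3, radius=1.5):
    """Union of boxes centered on random training points."""
    idx = rng.choice(len(X_train), size=n_boxes, replace=False)
    return X_train[idx], radius

def covers(model, X):
    """A point is covered if it lies within `radius` (max-norm) of any center,
    so nearby points share the same model membership."""
    centers, radius = model
    d = np.abs(X[:, None, :] - centers[None, :, :]).max(axis=2)
    return (d <= radius).any(axis=1)
```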

43 Distribution of model coverage for points in each class, with 100 enriched, projectable weak models (training set vs. test set)

44 Distribution of model coverage for points in each class, with 300 enriched, projectable weak models (training set vs. test set)

45 Distribution of model coverage for points in each class, with 400 enriched, projectable weak models (training set vs. test set)

46 Distribution of model coverage for points in each class, with 500 enriched, projectable weak models (training set vs. test set)

47 Distribution of model coverage for points in each class, with 1000 enriched, projectable weak models (training set vs. test set)

48 Distribution of model coverage for points in each class, with 2000 enriched, projectable weak models (training set vs. test set)

49 Distribution of model coverage for points in each class, with 5000 enriched, projectable weak models (training set vs. test set)

50 Promoting uniformity: all points in the same class should be equally likely to be covered by a model of any particular rating. Retain models that cover the points that are under-covered by the current collection.
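A crude sketch of that retention rule (one of many possible heuristics, not a quote of the original algorithm): track how often each training point has been covered so far, and accept a candidate only if it does at least as well on the under-covered points as on average.

```python
import numpy as np

def promotes_uniformity(candidate_mask, coverage_counts):
    """candidate_mask: bool array of points the candidate model covers.
    coverage_counts: how many accepted models already cover each point."""
    under = coverage_counts < coverage_counts.mean()
    if not under.any():                      # coverage is already flat
        return True
    return candidate_mask[under].mean() >= candidate_mask.mean()
```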

51 Distribution of model coverage for points in each class, with 100 enriched, projectable, uniform weak models (training set vs. test set)

52 Distribution of model coverage for points in each class, with 1000 enriched, projectable, uniform weak models (training set vs. test set)

53 Distribution of model coverage for points in each class, with 5000 enriched, projectable, uniform weak models (training set vs. test set)

54 Distribution of model coverage for points in each class, with enriched, projectable, uniform weak models (training set vs. test set)

55 Distribution of model coverage for points in each class, with enriched, projectable, uniform weak models (training set vs. test set)

56 The 3 necessary conditions
– Enrichment: discriminating power
– Uniformity: complementary information
– Projectability: generalization power

57 Extensions and Comparisons

58 Alternative Discriminants [Berlind 1994]
– Different discriminants for N-class problems
– An additional condition on symmetry
– Approximate uniformity
– Hierarchy of indiscernibility

59 Estimates of Classification Accuracies [Chen 1997]
Statistical estimates of classification accuracy under weaker conditions:
– Approximate uniformity
– Approximate indiscernibility

60 Multi-class Problems: for n classes, define n discriminants Y_i, one for each class i vs. the others. Classify an unknown point to the class i for which the computed Y_i is the largest.
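As code, the rule is an argmax over the per-class discriminants. The sketch assumes `per_class_models[i]` holds the weak models enriched for class i + 1, and reuses the rectangle coverage test from the earlier sketches.

```python
import numpy as np

def classify_multiclass(x, per_class_models):
    def covers(model, x):
        lo, hi = model
        return bool(np.all((x >= lo) & (x <= hi)))
    # Y_i = fraction of the class-i models that cover x
    Ys = [np.mean([covers(m, x) for m in models]) for models in per_class_models]
    return int(np.argmax(Ys)) + 1            # classes numbered 1..n
```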

61 [Ho & Kleinberg ICPR 1996]

65 Open Problems
– Algorithm for uniformity enforcement: deterministic methods?
– Desirable form of weak models: fewer, more sophisticated classifiers?
– Other ways to address the 3-way trade-off: enrichment / uniformity / projectability

66 Random Decision Forest [Ho 1995, 1998]
– A structured way to create models: fully split a tree, use the leaves as models
– Perfect enrichment and uniformity on the training set
– Promote projectability by subspace projection
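A rough sketch of the idea using scikit-learn’s decision trees: grow each tree fully (so it is perfect, hence enriched and uniform, on the training set) on a random half of the features to promote projectability. The subspace size and vote-based combination are common choices, not necessarily those of the original papers.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_subspace_forest(X, y, n_trees=100, seed=0):
    """Fully split trees, each trained on a random subspace of the features."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    forest = []
    for _ in range(n_trees):
        feats = rng.choice(d, size=max(1, d // 2), replace=False)
        tree = DecisionTreeClassifier().fit(X[:, feats], y)  # grown to purity
        forest.append((feats, tree))
    return forest

def predict(forest, X):
    """Majority vote over the trees (labels assumed to be small non-negative ints)."""
    votes = np.stack([tree.predict(X[:, feats]) for feats, tree in forest])
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```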

67 Compact Distribution Maps [Ho & Baird 1993, 1997]
– Another structured way to create models
– Start with projectable models obtained by coarse quantization of the feature value range
– Seek enrichment and uniformity
[Figure: signatures of 2 types of events, and measurements from a new observation; axes: signal index vs. signal level]

68 SD & Other Ensemble Methods
– Ensemble learning via boosting: a sequential way to promote uniformity of ensemble element coverage
– XCS (a genetic algorithm): a way to create, filter, and use stochastic models that are regions in feature space

69 XCS Classifier System [Wilson, 95]
– A recent focus of the GA community; good performance
– Reinforcement learning + genetic algorithms
– Model: a set of rules, e.g.
  if (shape = square and number > 10) then class = red
  if (shape = circle and number < 5) then class = yellow
[Diagram: the environment feeds an input to the rule set, which predicts a class; reinforcement learning updates the rules from rewards, and a genetic algorithm searches for new rules]

70 Multiple Classifier Systems: Examples in Word Image Recognition

71 Complementary Strengths of Classifiers
The case for classifier combination: decision fusion, mixtures of experts, committee decision making.
[Figure: rank of the true class out of a lexicon of 1091 words, by 10 classifiers for 20 images]

72 Classifier Combination Methods
– Decision optimization: find a consensus among a given set of classifiers
– Coverage optimization: create a set of classifiers that work best with a given decision combination function

73 Decision Optimization
– Develop classifiers with expert knowledge
– Try to make the best use of their decisions via majority/plurality vote, sum/product rules, probabilistic methods, Bayesian methods, rank/confidence score combination, …
– The joint capability of the classifiers sets an intrinsic limit on the combined accuracy
– There is no way to handle the blind spots
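Two of the fixed rules named above, sketched for classifiers that output per-class score vectors (all names and the example numbers are illustrative):

```python
import numpy as np

def majority_vote(score_list):
    """Each classifier votes for its top class; the most common vote wins."""
    votes = [int(np.argmax(s)) for s in score_list]
    return int(np.bincount(votes).argmax())

def sum_rule(score_list):
    """Add the score vectors and take the best combined class."""
    return int(np.argmax(np.sum(score_list, axis=0)))

# Example: three classifiers scoring three classes.
scores = [np.array([0.6, 0.3, 0.1]),
          np.array([0.2, 0.5, 0.3]),
          np.array([0.4, 0.45, 0.15])]
print(majority_vote(scores), sum_rule(scores))  # both pick class index 1
```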

74 Difficulties in Decision Optimization
– Reliability versus overall accuracy
– Fixed or trainable combination functions
– Simple models or combinatorial estimates
– How to model complementary behavior

75 Coverage Optimization
– Fix a decision combination function
– Generate classifiers automatically and systematically via training-set sub-sampling (stacking, bagging, boosting), subspace projection (RSM), superclass/subclass decomposition (ECOC), random perturbation of training processes, noise injection, …
– Need enough classifiers to cover all blind spots (how many are enough?)
– What else is critical?
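For instance, the bagging entry in the list above amounts to a few lines (a sketch with scikit-learn trees as the base classifiers; any base learner would do): train each component on a bootstrap sample, then combine with a fixed rule such as the majority vote shown earlier.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_ensemble(X, y, n_estimators=50, seed=0):
    """One coverage-optimization recipe: bootstrap sub-sampling (bagging)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))   # sample rows with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models
```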

76 Difficulties in Coverage Optimization
– What kind of differences to introduce:
  – Subsamples? Subspaces? Super/subclasses?
  – Training parameters?
  – Model geometry?
– The 3-way trade-off: discrimination + diversity + generalization
– Effects of the form of the component classifiers

77 Dilemmas and Paradoxes in Classifier Combination
– Weaken individuals for a stronger whole?
– Sacrifice known samples for unseen cases?
– Seek agreements or differences?

78 Stochastic Discrimination
A mathematical theory that relates several key concepts in pattern recognition:
– Discriminative power … enrichment
– Complementary information … uniformity
– Generalization power … projectability
It offers a way to describe the complementary behavior of classifiers, and guidelines for designing multiple classifier systems (classifier ensembles).