
1 Graph-Based Semi-Supervised Learning with a Generative Model
Speaker: Jingrui He
Advisor: Jaime Carbonell
Machine Learning Department
04-10-2008

2 Semi-Supervised Learning
(figure: very few labeled examples, + and -, and abundant unlabeled examples)

3 Outline
► Background
► Existing Methods
► Proposed Method
 Ideal Case
 General Case
► Experimental Results
► Conclusion

4 Overview
Semi-Supervised Learning
► Feature based
 Gradually generate class labels: Self-Training [Yarowsky, ACL95]; Co-Training [Blum, COLT98]
 Collectively generate class labels: TSVMs [Joachims, ICML99]; EM-based [Nigam, ML00]
► Graph based: Mincut [Blum, ICML01]; Gaussian Random Fields [Zhu, ICML03]; Local and Global Consistency [Zhou, NIPS04]; Generative Model [He, IJCAI07]

5 Self-Training [Yarowsky, ACL95]
(figure: a classifier trained on the labeled + and - examples gradually labels the remaining points)

6 Co-Training [Blum, COLT98]
► Two feature views, each sufficient to train a good classifier
► The two views are conditionally independent given the class (a sketch follows below)
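For concreteness, a minimal co-training sketch under the two-view assumption above. The function and parameter names are mine, and scikit-learn's GaussianNB is only a stand-in base classifier; the talk does not prescribe one.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, rounds=10, k=5):
    # X1, X2: the two feature views; y: +1/-1 labels, 0 = unlabeled
    y = y.copy()
    for _ in range(rounds):
        for Xv in (X1, X2):  # each view trains its own classifier in turn
            labeled = np.flatnonzero(y != 0)
            unlabeled = np.flatnonzero(y == 0)
            if unlabeled.size == 0:
                return y
            clf = GaussianNB().fit(Xv[labeled], y[labeled])
            proba = clf.predict_proba(Xv[unlabeled])
            # each view labels the k unlabeled points it is most confident about
            top = np.argsort(-proba.max(axis=1))[:k]
            y[unlabeled[top]] = clf.classes_[proba[top].argmax(axis=1)]
    return y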

7 Transductive SVMs [Joachims, ICML99]
(figure: inductive SVM vs. transductive SVM boundaries on the labeled + / - and unlabeled examples)
► Classification boundary: away from the dense regions!

8 EM-based Method [Nigam, ML00]
(figure: a text corpus with classes such as Computer Science, Medicine, and Politics)

9 Graph-Based Semi-Supervised Learning
(figure: + and - labels spreading over a graph of labeled and unlabeled examples)

10 Graph-Based Methods
► G = {V, E}
► Estimate a function f on the graph
 f should be close to the given labels on the labeled nodes
 f should be smooth on the whole graph
► Regularization: trade off the two goals (a generic objective is given below)
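For reference, a generic form of this regularization objective. This is the standard template such methods instantiate; the slide's exact formula is not preserved in the transcript:

\min_{f} \; \sum_{i \in L} (f_i - y_i)^2 \;+\; \lambda \sum_{i,j} W_{ij} \, (f_i - f_j)^2

The first term keeps f close to the given labels on the labeled set L; the second penalizes f for varying across strongly connected (similar) nodes; lambda trades off the two.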

11 Graph-Based Methods cont.
► Mincut [Blum, ICML01]: minimize \sum_{i,j} W_{ij} |f_i - f_j| with f_i \in \{0, 1\} clamped on the labeled nodes
► Gaussian Random Fields [Zhu, ICML03]: minimize \frac{1}{2} \sum_{i,j} W_{ij} (f_i - f_j)^2 with real-valued f clamped on the labeled nodes
► Local and Global Consistency [Zhou, NIPS04]: minimize \frac{1}{2} \sum_{i,j} W_{ij} \big( f_i / \sqrt{D_{ii}} - f_j / \sqrt{D_{jj}} \big)^2 + \mu \sum_i (f_i - y_i)^2
► Discriminative in nature!
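As a concrete example of one listed method, a minimal sketch of the Gaussian Random Fields harmonic solution [Zhu, ICML03]. Variable names are mine, and it assumes the labeled points come first in W:

import numpy as np

def harmonic_solution(W, y_l):
    # W: (n, n) affinity matrix, labeled points first; y_l: (n_l,) labels in {0, 1}
    n_l = len(y_l)
    D = np.diag(W.sum(axis=1))
    L = D - W                    # unnormalized graph Laplacian
    L_uu = L[n_l:, n_l:]         # unlabeled-unlabeled block
    L_ul = L[n_l:, :n_l]         # unlabeled-labeled block
    # harmonic function on the unlabeled nodes: f_u = -L_uu^{-1} L_ul y_l
    f_u = np.linalg.solve(L_uu, -L_ul @ y_l)
    return f_u                   # threshold at 0.5 to classify the unlabeled points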

12 Outline
► Background
► Existing Methods
► Proposed Method
 Ideal Case
 General Case
► Experimental Results
► Conclusion

13 Motivation
► Existing graph-based methods:
 The regularization objective has no probabilistic justification
 Discriminative: an inaccurate class proportion in the labeled set greatly affects the performance
► Proposed method:
 The estimates are well justified (Theorems 1 and 2)
 Generative: the estimated class priors compensate for the inaccurate proportion in the labeled set

14 Notation
► n training examples: x_1, ..., x_n
► n_l labeled examples, n_u unlabeled examples
► Affinity matrix W: W_ij is the similarity between x_i and x_j
► Diagonal matrix D: D_ii = \sum_j W_ij
► Normalized affinity matrix: S = D^{-1/2} W D^{-1/2}
► Initial score vector (one per class): set to 1 for labeled examples of that class (a code sketch follows below)
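A sketch of this notation in code. The Gaussian kernel is an assumption on my part (a common choice for graph-based methods; the transcript does not preserve the talk's kernel):

import numpy as np

def build_graph(X, sigma=1.0):
    # pairwise squared Euclidean distances
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))   # W_ij: similarity between x_i and x_j
    np.fill_diagonal(W, 0.0)             # no self-affinity (common convention)
    d = W.sum(axis=1)                    # D_ii = sum_j W_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt      # S = D^{-1/2} W D^{-1/2}
    return W, d, S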

15 Ideal Case
► Two classes far apart (so S is block diagonal)

16 Derivation Sketch
► Relate the class conditional probabilities to a kernel density estimate
► Relate the principal eigenvectors of S to these densities
► Relate the iterative procedure to the eigenvectors

17 Class Conditional Probability
► Theorem 1
 As n → ∞, the estimate converges to the true class conditional probability
 Similar to kernel density estimation (see the sketch below)
► Unlabeled data: how can they contribute to the estimate?
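For intuition about the kernel-density-estimation connection, a textbook Gaussian KDE over the labeled examples of one class. This is the generic estimator, not necessarily the paper's exact formula:

import numpy as np

def kde(x, X_class, sigma=1.0):
    # Gaussian kernel density estimate of p(x | y) from the labeled examples of class y
    sq = ((X_class - x) ** 2).sum(axis=1)
    d = X_class.shape[1]
    norm = (2 * np.pi * sigma ** 2) ** (d / 2)
    return np.exp(-sq / (2 * sigma ** 2)).sum() / (len(X_class) * norm)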

18 Class Conditional Probability cont.
► Eigenvectors of S: within each block, consider the principal eigenvector
► Element-wise: its components relate to the class conditional probabilities at the corresponding examples

19 Class Conditional Probability cont.
► To get the two estimates, iterate: repeatedly multiply the current score vectors by S
► Upon convergence, the vectors align with the principal eigenvectors of the corresponding blocks
► After normalization, they yield the class conditional probability estimates (a sketch follows below)
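A sketch of such an iteration under my reading of the slides: start from the indicator vectors of each class's labeled examples, repeatedly multiply by S, and normalize. The paper's exact update rule is not preserved in the transcript, so treat this as an approximation:

import numpy as np

def propagate(S, labeled_pos, labeled_neg, n_iter=1000, tol=1e-9):
    n = S.shape[0]
    f_pos = np.zeros(n); f_pos[labeled_pos] = 1.0  # 1 for labeled examples of the class
    f_neg = np.zeros(n); f_neg[labeled_neg] = 1.0
    for _ in range(n_iter):
        new_pos, new_neg = S @ f_pos, S @ f_neg
        new_pos /= new_pos.sum()                   # normalize so entries sum to 1
        new_neg /= new_neg.sum()
        done = max(np.abs(new_pos - f_pos).max(), np.abs(new_neg - f_neg).max()) < tol
        f_pos, f_neg = new_pos, new_neg
        if done:
            break
    # normalized vectors: estimates of P(x_i | y = +1) and P(x_i | y = -1)
    return f_pos, f_neg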

20 Example of the Ideal Case

21 General Case
► Two classes not far apart
► S is not block diagonal
► Upon convergence, the two estimates would no longer separate the classes, so the iteration must be stopped early

22 Class Conditional Probability
► Iteration process
 The labeled examples gradually spread their information to nearby points
► Solution
 Stop the iteration when a certain criterion is satisfied

23 Stopping Criterion
► Average probability of the negative labeled examples in the positive class

24 Stopping Criterion cont.
(figure: stopping too early gives prematurity; stopping too late gives excessive propagation)

25 Stopping Criterion cont.
► Average probability of the positive labeled examples in the negative class (a code sketch of both quantities follows below)
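In code, the two monitored quantities might look like this. This is my formulation of "average probability of the negative labeled examples in the positive class" and its mirror; the talk's exact threshold rule is not preserved:

import numpy as np

def stopping_scores(f_pos, f_neg, labeled_pos, labeled_neg):
    # f_pos, f_neg: current normalized class conditional estimates over all points
    a_neg_in_pos = f_pos[labeled_neg].mean()  # negative labeled examples under the positive class
    a_pos_in_neg = f_neg[labeled_pos].mean()  # positive labeled examples under the negative class
    return a_neg_in_pos, a_pos_in_neg

Usage sketch: evaluate these inside the propagation loop and stop once either score starts to rise, i.e. before the positive and negative estimates bleed into each other (excessive propagation), but not so early that the labels have not spread at all (prematurity).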

26 Example of the General Case

27 Estimating Class Priors
► Theorem 2: in the general case, as n → ∞, the estimated quantities relate to the true class priors
► Solving this relation gives estimates of the class priors
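The transcript does not preserve Theorem 2's formula, so as a stand-in here is one standard way to estimate mixture weights when the component densities are held fixed: an EM update over the unlabeled points. This is explicitly my substitution, not the talk's estimator:

import numpy as np

def estimate_prior(p_pos, p_neg, n_iter=100):
    # p_pos, p_neg: estimated class conditional probabilities at the unlabeled points
    pi = 0.5                                   # initial guess for P(y = +1)
    for _ in range(n_iter):
        # E-step: responsibility of the positive class for each point
        r = pi * p_pos / (pi * p_pos + (1 - pi) * p_neg + 1e-12)
        # M-step: update the mixture weight
        pi = r.mean()
    return pi, 1.0 - pi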

28 Prediction
► To classify a new example x
 Calculate the class conditional probabilities P(x | y)
 Classify according to Bayes rule: pick the class maximizing P(x | y) P(y) (see the sketch below)
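Putting the pieces together, a minimal prediction sketch. The Gaussian-kernel helper is an assumption carried over from the earlier sketches, not the paper's exact density:

import numpy as np

def gauss_kde(x, X_class, sigma=1.0):
    sq = ((X_class - x) ** 2).sum(axis=1)
    # the kernel normalization constant is identical for both classes,
    # so the unnormalized mean suffices when only comparing the two
    return np.exp(-sq / (2 * sigma ** 2)).mean()

def classify(x, X_pos, X_neg, prior_pos, prior_neg, sigma=1.0):
    p_pos = gauss_kde(x, X_pos, sigma)   # estimate of P(x | y = +1)
    p_neg = gauss_kde(x, X_neg, sigma)   # estimate of P(x | y = -1)
    # Bayes rule: choose the class maximizing P(x | y) P(y)
    return +1 if p_pos * prior_pos >= p_neg * prior_neg else -1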

29 Outline
► Background
► Existing Methods
► Proposed Method
 Ideal Case
 General Case
► Experimental Results
► Conclusion

30 Cedar Buffalo Binary Digits Data Set [Hull, PAMI94]
► Balanced classification
(figures: 1 vs 2 and odd vs even; curves compare our method, Gaussian Random Fields, and Local and Global Consistency)

31 Cedar Buffalo Binary Digits Data Set [Hull, PAMI94]
► Unbalanced classification
(figures: 1 vs 2 and odd vs even; curves compare our method, Gaussian Random Fields, and Local and Global Consistency)

32 Genre Data Set [Liu, ECML03]
► Classification between random partitions
(figures: balanced and unbalanced; curves compare our method, Gaussian Random Fields, and Local and Global Consistency)

33 Genre Data Set [Liu, ECML03]
► Unbalanced classification
(figures: newspapers vs other and biographies vs other; curves compare our method, Gaussian Random Fields, and Local and Global Consistency)

34 Conclusion
► A new graph-based semi-supervised learning method
 Generative in nature
 Ideal case: theoretical guarantee
 General case: reasonable estimates
 Prediction: easy and intuitive

35 Questions?

