Graph-Based Semi-Supervised Learning with a Generative Model
Speaker: Jingrui He
Advisor: Jaime Carbonell
Machine Learning Department
04/03/2008
Semi-Supervised Learning
► Very few labeled examples (+ / −)
► Abundant unlabeled examples
Outline
► Background
► Existing Methods
► Proposed Method: Ideal Case, General Case
► Experimental Results
► Conclusion
Overview
Semi-Supervised Learning
► Feature based
  Gradually generate class labels: Self-Training [Yarowsky, ACL95], Co-Training [Blum, COLT98]
  Collectively generate class labels: TSVMs [Joachims, ICML99], EM-based [Nigam, ML00]
► Graph based: Mincut [Blum, ICML01], Gaussian Random Fields [Zhu, ICML03], Local and Global Consistency [Zhou, NIPS04], Generative Model [He, IJCAI07]
Self-Training [Yarowsky, ACL95]
Co-Training [Blum, COLT98]
► Each view is sufficient to train a good classifier
► The views are conditionally independent given the class
Transductive SVMs [Joachims, ICML99]
► Inductive SVMs vs. Transductive SVMs
► Classification boundary: away from the dense regions!
EM-based Method [Nigam, ML00]
► Text corpus: Computer Science, Medicine, Politics
Graph-Based Semi-Supervised Learning
Graph-Based Methods
► G = {V, E}
► Estimating a function f on the graph
  f should be close to the given labels on the labeled nodes
  f should be smooth on the whole graph
► Regularization
Graph-Based Methods cont.
► Mincut [Blum, ICML01]
► Gaussian Random Fields [Zhu, ICML03]
► Local and Global Consistency [Zhou, NIPS04]
► Discriminative in nature!
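For concreteness, the regularization view above has a closed form in the Gaussian Random Fields method [Zhu, ICML03]: the unlabeled scores are the harmonic extension of the labels, f_u = (D_uu − W_uu)^{-1} W_ul f_l. A minimal NumPy sketch (the toy graph and variable names are illustrative, not from the talk):

```python
import numpy as np

def harmonic_solution(W, labels):
    """Harmonic-function solution of Gaussian Random Fields [Zhu, ICML03].

    W      : (n, n) symmetric nonnegative affinity matrix.
    labels : length-n array, +1 / -1 on labeled nodes, 0 on unlabeled nodes.
    Returns a length-n score vector f with labeled entries kept fixed.
    """
    lab = labels != 0
    unl = ~lab
    D = np.diag(W.sum(axis=1))
    L = D - W                         # unnormalized graph Laplacian
    f = labels.astype(float).copy()
    # L_uu f_u = W_ul f_l  <=>  f_u = (D_uu - W_uu)^{-1} W_ul f_l
    f[unl] = np.linalg.solve(L[np.ix_(unl, unl)], W[np.ix_(unl, lab)] @ f[lab])
    return f

# Toy graph: two 2-node chains joined by a weak edge, one label per chain.
W = np.array([[0, 1, 0,   0],
              [1, 0, 0.1, 0],
              [0, 0.1, 0, 1],
              [0, 0,   1, 0]], float)
labels = np.array([1, 0, 0, -1])
f = harmonic_solution(W, labels)
```

The weak 0.1 edge keeps the two chains almost separate, so each unlabeled node inherits the sign of the label on its own chain.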
Outline
► Background
► Existing Methods
► Proposed Method: Ideal Case, General Case
► Experimental Results
► Conclusion
Motivation
► Existing graph-based methods: no theoretical justification; discriminative in nature, so an inaccurate class proportion in the labeled set greatly AFFECTS performance
► Proposed method: well justified; generative in nature, so the estimated class priors COMPENSATE for an inaccurate class proportion in the labeled set
Notation
► n training examples: x_1, …, x_n
► n_l labeled examples, n_u = n − n_l unlabeled examples
► Affinity matrix W: W_ij = similarity between x_i and x_j
► Diagonal matrix D: D_ii = Σ_j W_ij
► Label indicator vector: set to 1 for labeled examples
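The notation above can be instantiated directly. The talk does not fix a particular similarity, so the Gaussian (RBF) affinity and the sigma parameter below are illustrative assumptions; the degree matrix D and the normalized matrix S = D^{-1/2} W D^{-1/2} follow the definitions on the slide:

```python
import numpy as np

def build_graph(X, sigma=1.0):
    """Build W, D, and S = D^{-1/2} W D^{-1/2} from data X of shape (n, d).

    Gaussian affinities are one common choice; sigma is a bandwidth
    hyperparameter, not a value prescribed by the talk.
    """
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)          # no self-loops
    d = W.sum(axis=1)                 # D_ii = sum_j W_ij
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return W, np.diag(d), S

X = np.random.RandomState(0).randn(10, 2)
W, D, S = build_graph(X)
```

S is symmetric with eigenvalues in [−1, 1], which is what makes the iterative schemes on the following slides well behaved.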
Ideal Case
► Two classes far apart
Derivation Sketch
► Relate the class conditional probabilities to the affinity structure of the data
► Relate the leading eigenvector of S to the class conditional probabilities
► Relate an iterative procedure to the eigenvector computation
Class Conditional Probability
► Theorem 1: in the ideal case, as the number of examples goes to infinity, the estimate converges to the true class conditional probability (similar to kernel density estimation)
► How can the unlabeled data contribute?
Class Conditional Probability cont.
► The class conditional probabilities are related to the eigenvectors of S
► Element-wise: each component of the eigenvector corresponds to one example's class conditional probability
Class Conditional Probability cont.
► To get the estimates for both classes, iterate the update until convergence
► Upon convergence, the fixed point gives the unnormalized class conditional probabilities
► After normalization, the final estimates are obtained
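The exact update rule on this slide was lost in extraction. As a sketch of the general iterate-and-normalize pattern the slides describe, the following power-iteration-style propagation starts from an indicator of one class's labeled points and repeatedly multiplies by S; in the ideal (block-diagonal) case the mass stays inside the seeded class's block. This is a generic illustration, not the talk's verbatim algorithm:

```python
import numpy as np

def propagate(S, seed_idx, n_iter=50):
    """Power-iteration-style propagation of one class's labeled examples.

    S        : (n, n) normalized affinity matrix.
    seed_idx : indices of this class's labeled examples.
    Returns a length-n nonnegative vector summing to 1, interpretable as
    an estimate over nodes for this class (illustrative, not the talk's
    exact estimator).
    """
    f = np.zeros(S.shape[0])
    f[seed_idx] = 1.0
    for _ in range(n_iter):
        f = S @ f
        f /= np.abs(f).sum() + 1e-12   # keep the scale bounded each step
    return f / (f.sum() + 1e-12)       # final normalization

# Ideal case: S is block diagonal (two disconnected 2-node components).
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], float)
S = W / W.sum(axis=1, keepdims=True)   # degrees are all 1 here, so S = W
f = propagate(S, [0])
```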
Example of the Ideal Case
General Case
► Two classes not far apart
► S not block diagonal
► Upon convergence, the two classes' estimates collapse to the same dominant eigenvector, so the iteration must not be run to convergence
Class Conditional Probability
► Iteration process: the labeled examples gradually spread their information to nearby points
► Solution: stop the iteration when a certain criterion is satisfied
Stopping Criterion
► Average probability of the negative labeled examples in the positive class
Stopping Criterion cont.
► Trade-off: stopping too early (prematurity) vs. running too long (excessive propagation)
Stopping Criterion cont.
► Average probability of the positive labeled examples in the negative class
Example of the General Case
Estimating Class Priors
► Theorem 2: in the general case, as the number of examples goes to infinity, the class priors can be consistently estimated
► The estimated priors give the class proportions needed for prediction
Prediction
► To classify a new example:
  Calculate the class conditional probabilities
  Combine them with the class priors according to Bayes rule
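The prediction step is plain Bayes rule: P(+|x) ∝ P(x|+) P(+). A minimal sketch, taking the class conditional probabilities and the estimated positive prior as inputs (the function name is illustrative):

```python
def predict(p_x_given_pos, p_x_given_neg, prior_pos):
    """Posterior probability of the positive class via Bayes rule:
    P(+|x) = P(x|+) P(+) / (P(x|+) P(+) + P(x|-) P(-)).
    """
    joint_pos = p_x_given_pos * prior_pos
    joint_neg = p_x_given_neg * (1.0 - prior_pos)
    return joint_pos / (joint_pos + joint_neg)

p = predict(0.2, 0.1, 0.5)   # twice the likelihood under +, equal priors
```

With equal priors, a point twice as likely under the positive class gets posterior 2/3; skewing the prior shifts the decision accordingly, which is exactly how the estimated priors compensate for an unrepresentative labeled set.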
Outline
► Background
► Existing Methods
► Proposed Method: Ideal Case, General Case
► Experimental Results
► Conclusion
Cedar Buffalo Binary Digits Data Set [Hull, PAMI94]
► Balanced classification
[Plots: 1 vs 2 and odd vs even; curves for Our method, Gaussian Random Fields, and Local and Global Consistency]
Cedar Buffalo Binary Digits Data Set [Hull, PAMI94]
► Unbalanced classification
[Plots: 1 vs 2 and odd vs even; curves for Our method, Gaussian Random Fields, and Local and Global Consistency]
Genre Data Set [Liu, ECML03]
► Classification between random partitions
[Plots: balanced and unbalanced; curves for Our method, Gaussian Random Fields, and Local and Global Consistency]
Genre Data Set [Liu, ECML03]
► Unbalanced classification
[Plots: newspapers vs other and biographies vs other; curves for Our method, Gaussian Random Fields, and Local and Global Consistency]
Conclusion
► A new graph-based semi-supervised learning method
  Generative in nature
  Ideal case: theoretical guarantee
  General case: reasonable estimates
  Prediction: easy and intuitive
Questions?