Heterogeneous Consensus Learning via Decision Propagation and Negotiation. Jing Gao†, Wei Fan‡, Yizhou Sun†, Jiawei Han†. †University of Illinois at Urbana-Champaign.


1 Heterogeneous Consensus Learning via Decision Propagation and Negotiation
Jing Gao†, Wei Fan‡, Yizhou Sun†, Jiawei Han†
†University of Illinois at Urbana-Champaign
‡IBM T. J. Watson Research Center
KDD’09, Paris, France

2 Information Explosion
Not only at scale, but also at available sources!
Sources shown: fan sites, blogs, descriptions, reviews, pictures, videos

3 Multiple Source Classification
Image Categorization: images, descriptions, notes, comments, albums, tags, …
Like? Dislike?: movie genres, cast, director, plots, …; users' viewing history, movie ratings, …
Research Area: publication and co-authorship network, published papers, …

4 Model Combination helps!
Some areas share similar keywords
People may publish in relevant but different areas
There may be cross-discipline co-operations
Figure labels: supervised; unsupervised; supervised or unsupervised

5 Motivation
Multiple sources provide complementary information
–We may want to use all of them to derive a better classification solution
Concatenation of information sources is impossible
–Information sources have different formats
–We may only have access to classification or clustering results due to privacy issues
Ensemble of supervised and unsupervised models
–Combine their outputs on the same set of objects
–Derive a consolidated solution
–Reduce errors made by individual models
–More robust and stable

6 Consensus Learning

7 Related Work
Ensemble of Classification Models
–Bagging, boosting, …
–Focus on how to construct and combine weak classifiers
Ensemble of Clustering Models
–Derive a consolidated clustering solution
Semi-supervised (transductive) learning
Link-based classification
–Use link or manifold structure to help classification
–One unlabeled source
Multi-view learning
–Construct a classifier from multiple sources

8 Problem Formulation
Principles
–Consensus: maximize agreement among supervised and unsupervised models
–Constraints: label predictions should be close to the outputs of the supervised models
Objective function (terms labeled on the slide: Consensus, Constraints)
NP-hard!

9 Methodology
Step 1: Group-level predictions
–How to propagate and negotiate?
Step 2: Combine multiple models using local weights
–How to compute local model weights?

10 Group-level Predictions (1)
Groups:
–similarity: percentage of common members
–initial labeling: category information from supervised models
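The similarity between two groups can be sketched in a few lines. The slide only says "percentage of common members", so the Jaccard ratio used here (shared members over all members of either group) is an assumption, and the function name and example groups are hypothetical.

```python
def group_similarity(a, b):
    """Share of common members between two groups.

    Assumption: "percentage of common members" is read as the Jaccard
    ratio |a & b| / |a | b|; the slides do not give the exact formula.
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty groups share nothing
    return len(a & b) / len(union)


# Hypothetical groups: one class predicted by a classifier, one cluster.
classifier_group = {"x1", "x2", "x3", "x4"}
cluster_group = {"x3", "x4", "x5"}
print(group_similarity(classifier_group, cluster_group))  # 2 shared of 5 -> 0.4
```

Because the measure only looks at membership, it applies equally to groups produced by classifiers and groups produced by clustering algorithms.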

11 Group-level Predictions (2)
Principles
–Conditional probability estimates should be smooth over the graph
–Estimates should not deviate too much from the initial labeling
Figure: labeled nodes and unlabeled nodes, with example estimates [0.16 0.16 0.98] and [0.93 0.07 0]
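The two principles above can be illustrated with a generic label-propagation loop over the group graph. This is a sketch under stated assumptions, not the update rule from the paper: each group takes the similarity-weighted average of its neighbors' estimates, and groups from supervised models are pulled back toward their initial labeling by a hypothetical mixing parameter alpha.

```python
def propagate(sim, init, labeled, alpha=0.5, steps=20):
    """Smooth class-probability estimates over a graph of groups.

    sim[g][h] : similarity between groups g and h (zero diagonal assumed)
    init[g]   : initial class distribution of group g (used if labeled[g])
    labeled[g]: True for groups that come from supervised models
    alpha     : hypothetical weight that keeps labeled groups near init
    """
    s, c = len(init), len(init[0])
    # Unlabeled groups start from a uniform distribution.
    est = [row[:] if labeled[g] else [1.0 / c] * c for g, row in enumerate(init)]
    for _ in range(steps):
        new = []
        for g in range(s):
            total = sum(sim[g])
            if total == 0:  # isolated group: keep the current estimate
                new.append(est[g][:])
                continue
            avg = [sum(sim[g][h] * est[h][k] for h in range(s)) / total
                   for k in range(c)]
            if labeled[g]:  # do not drift too far from the initial labeling
                avg = [alpha * init[g][k] + (1 - alpha) * avg[k] for k in range(c)]
            new.append(avg)
        est = new
    return est


# Hypothetical graph: two labeled groups and one unlabeled cluster that
# overlaps mostly with the first labeled group.
sim = [[0.0, 0.0, 0.8],
       [0.0, 0.0, 0.2],
       [0.8, 0.2, 0.0]]
init = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
labeled = [True, True, False]
estimates = propagate(sim, init, labeled)
print(estimates[2])  # the unlabeled cluster leans toward class 0
```

Every update is a convex combination of probability vectors, so each group's estimate remains a valid distribution throughout.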

12 Local Weighting Scheme (1)
Principles
–If M makes a more accurate prediction on x, M’s weight on x should be higher
Difficulties
–“Unsupervised” model combination: cannot use cross-validation

13 Local Weighting Scheme (2)
Method
–Consensus
To compute M_i’s weight on x, use M_1, …, M_{i-1}, M_{i+1}, …, M_r as the true model, and compute the average accuracy
Use consistency in x’s neighbors’ label predictions between two models to approximate accuracy
–Random
Assign equal weights to all the models
Figure panels: consensus, random
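A rough sketch of the consensus weighting idea follows. The slides do not define "neighbors" or the consistency measure precisely, so two assumptions are made here: the neighbors of x under a model are the examples that model places in the same group as x, and the consistency between two models on x is the Jaccard overlap of those neighbor sets. All names are hypothetical.

```python
def neighbors(assignment, x):
    """Examples placed in the same group as x by one model (x included)."""
    return {y for y, g in enumerate(assignment) if g == assignment[x]}


def local_weights(assignments, x):
    """Consensus-style local weights of r >= 2 models on example x.

    assignments[i][y] is the group id model i gives to example y. Group ids
    need not match across models, since only co-membership is compared.
    """
    r = len(assignments)
    hoods = [neighbors(a, x) for a in assignments]
    raw = []
    for i in range(r):
        # Treat every other model as the "true" model in turn and average
        # the (assumed) neighbor-set overlap as a proxy for accuracy.
        agree = [len(hoods[i] & hoods[j]) / len(hoods[i] | hoods[j])
                 for j in range(r) if j != i]
        raw.append(sum(agree) / len(agree))
    total = sum(raw)  # positive: every neighbor set contains x itself
    return [w / total for w in raw]


# Hypothetical outputs of three models on six examples.
m1 = [0, 0, 0, 1, 1, 1]
m2 = [0, 0, 0, 1, 1, 1]
m3 = [0, 1, 0, 1, 0, 1]
print(local_weights([m1, m2, m3], x=0))  # -> [0.375, 0.375, 0.25]
```

The model that disagrees with the other two about which examples belong with x receives the lowest weight on x, without any labeled validation data.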

14 Algorithm and Time Complexity
Compute similarity and local consistency for each pair of groups: O(s^2)
For each group, iterate f steps: compute probability estimates based on the weighted average of neighbors: O(fcs^2)
For each example, for each model: compute local weights, then combine models’ predictions using local weights: O(rn)
Linear in the number of examples!
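The final combination step can be sketched as a weighted average of the models' class-probability estimates for a single example. The function name and the numbers below are hypothetical; the slides state only that predictions are combined using local weights.

```python
def combine(probs, weights):
    """Weighted average of per-model class distributions for one example.

    probs[i][k] : model i's probability of class k for this example
    weights[i]  : local weight of model i on this example
    """
    c = len(probs[0])
    total = sum(weights)
    return [sum(w * p[k] for w, p in zip(weights, probs)) / total
            for k in range(c)]


# Hypothetical estimates from three models and their local weights.
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
weights = [0.375, 0.375, 0.25]
combined = combine(probs, weights)
predicted = max(range(len(combined)), key=combined.__getitem__)
print(combined, predicted)  # class 0 receives the larger combined probability
```

This step costs one pass over the r models per example, which matches the O(rn) term on the slide for n examples.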

15 Experiments-Data Sets
20 Newsgroup
–newsgroup messages categorization
–only text information available
Cora
–research paper area categorization
–paper abstracts and citation information available
DBLP
–researchers area prediction
–publication and co-authorship network, and publication content
–conferences’ areas are known
Yahoo! Movie
–user viewing interest analysis (favored movie types)
–movie ratings and synopses
–movie genres are known

16 Experiments-Baseline Methods
Single models
–20 Newsgroup: logistic regression, SVM, K-means, min-cut
–Cora: abstracts, citations (with or without a labeled set)
–DBLP: publication titles, links (with or without labels from conferences)
–Yahoo! Movies: movie ratings and synopses (with or without labels from movies)
Ensemble approaches
–majority-voting classification ensemble
–majority-voting clustering ensemble
–clustering ensemble on all of the four models
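For reference, the majority-voting classification baseline can be sketched as follows. This is a generic majority vote, not the authors' experimental code, and the example predictions are made up.

```python
from collections import Counter


def majority_vote(predictions):
    """Per-example majority label across models.

    predictions[i][x] is the label model i assigns to example x. Ties go to
    the label that appears first among the models (Counter keeps insertion
    order for equal counts).
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


# Hypothetical labels from three classifiers on three examples.
print(majority_vote([[0, 1, 1],
                     [0, 1, 0],
                     [1, 1, 0]]))  # -> [0, 1, 0]
```

Unlike the local weighting scheme, every model counts equally on every example, and the vote requires that all models use the same label space.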

17 Experiments-Evaluation Measures
Classification Accuracy
–Clustering algorithms: map each cluster to the best possible class label (should get the best accuracy the algorithm can achieve)
Clustering quality
–Normalized mutual information
–Get a “true” model from the ground-truth labels
–Compute the shared information between the “true” model and each algorithm
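Both measures can be sketched with the standard library. Two assumptions are made: "best possible class label" is read as mapping each cluster to its most frequent true class, and NMI is normalized by the geometric mean of the two entropies (one common convention; the slides do not say which normalization was used).

```python
from collections import Counter
from math import log, sqrt


def mapped_accuracy(clusters, truth):
    """Accuracy after mapping each cluster to its most frequent true class."""
    by_cluster = {}
    for c, t in zip(clusters, truth):
        by_cluster.setdefault(c, Counter())[t] += 1
    correct = sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values())
    return correct / len(truth)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((m / n) * log(m / n) for m in Counter(labels).values())


def nmi(a, b):
    """Normalized mutual information between two labelings of the same items."""
    n = len(a)
    joint, ca, cb = Counter(zip(a, b)), Counter(a), Counter(b)
    mi = sum((m / n) * log(m * n / (ca[x] * cb[y]))
             for (x, y), m in joint.items())
    denom = sqrt(entropy(a) * entropy(b))  # assumed normalization
    return mi / denom if denom > 0 else 0.0


truth = [1, 1, 0, 1]
clusters = [0, 0, 1, 1]
print(mapped_accuracy(clusters, truth))  # 3 of 4 examples match -> 0.75
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))   # independent labelings -> 0.0
```

NMI does not depend on how cluster ids are named, so a clustering that matches the ground truth up to relabeling scores 1.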

18 Empirical Results-Accuracy

19 Empirical Results-NMI

20 Empirical Results-DBLP data

21 Empirical Results-Yahoo! Movies

22 Empirical Results-Scalability

23 Conclusions
Summary
–We propose to integrate multiple information sources for better classification
–We study the problem of consolidating outputs from multiple supervised and unsupervised models
–The proposed two-step algorithm solves the problem by propagating and negotiating among multiple models
–The algorithm runs in linear time
–Results on various data sets show the improvements
Follow-up Work
–Algorithm and theory
–Applications

24 Thanks! Any questions?
http://www.ews.uiuc.edu/~jinggao3/kdd09clsu.htm
jinggao3@illinois.edu
Office: 2119B

