Semi-Supervised Learning Using Randomized Mincuts
Avrim Blum, John Lafferty, Raja Reddy, Mugizi Rwebangira

Outline Often have little labeled data but lots of unlabeled data. We want to use the relationships between the unlabeled examples to guide our predictions. Idea: "Similar examples should generally be labeled similarly."

Learning using Graph Mincuts: Blum and Chawla (ICML 2001)

Construct a Graph

Add source (+) and sink (−)

Obtain the s-t mincut

Classification via the mincut
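
The construction on these slides can be sketched as follows: build a similarity graph over the labeled and unlabeled examples, tie the positively labeled examples to a source and the negatively labeled ones to a sink with very large capacities, and read the classification off the minimum s-t cut. This is only a minimal illustration in the spirit of Blum and Chawla (2001); the graph, edge weights, and the networkx-based solver are assumptions, not the authors' implementation:

```python
import networkx as nx

def mincut_classify(edges, pos_labeled, neg_labeled):
    """edges: iterable of (u, v, weight) similarity edges over all examples.
    pos_labeled / neg_labeled: ids of the examples with known +/- labels.
    Returns a dict mapping every example to +1 or -1."""
    G = nx.DiGraph()
    for u, v, w in edges:
        # Similarity edges are undirected, so give capacity in both directions.
        G.add_edge(u, v, capacity=w)
        G.add_edge(v, u, capacity=w)
    # Tie labeled examples to an artificial source (+) and sink (-) with
    # infinite capacity, so no finite cut can separate them from their label.
    for u in pos_labeled:
        G.add_edge("source", u, capacity=float("inf"))
    for v in neg_labeled:
        G.add_edge(v, "sink", capacity=float("inf"))
    _, (source_side, sink_side) = nx.minimum_cut(G, "source", "sink")
    return {node: (+1 if node in source_side else -1)
            for node in G if node not in ("source", "sink")}

# Tiny example: a chain 0 - 1 - 2 - 3 with node 0 labeled + and node 3 labeled -.
# The weakest edge (1, 2) is cut, so nodes 0, 1 become + and nodes 2, 3 become -.
print(mincut_classify([(0, 1, 1.0), (1, 2, 0.5), (2, 3, 1.0)], [0], [3]))
```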

Confidence on the predictions? Plain mincut gives no indication of which examples it is most confident about. Solution: add random noise to the edges, run mincut several times, and for each unlabeled example take a majority vote.
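
A minimal sketch of this voting scheme, reusing the hypothetical mincut_classify() above; the noise model (a small multiplicative perturbation) and the number of runs are illustrative assumptions, not the paper's exact choices:

```python
import random
from collections import defaultdict

def randomized_mincut(edges, pos_labeled, neg_labeled, runs=20, noise=0.5):
    """Run mincut on several noisy copies of the graph and take a majority vote.
    Returns {example: (predicted label, margin in [0, 1])}."""
    votes = defaultdict(int)
    for _ in range(runs):
        # Perturb every edge weight, then solve a plain mincut on the noisy graph.
        noisy = [(u, v, w * (1.0 + noise * random.random())) for u, v, w in edges]
        for node, label in mincut_classify(noisy, pos_labeled, neg_labeled).items():
            votes[node] += label
    # The sign of the vote is the prediction; |vote| / runs is a confidence margin.
    return {node: (1 if v >= 0 else -1, abs(v) / runs) for node, v in votes.items()}
```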

Motivation Margin of the vote gives a measure of the confidence. Ideally we would like to assign a weight to each cut in the graph (a higher weight to small cuts) and then take a vote over all the cuts in the graph according to their weights. We don’t know how to do this, but we can view randomized mincuts as an approximation to this.

Related Work – Gaussian Processes: Zhu, Ghahramani and Lafferty (ICML 2003). Each unlabeled example receives a label that is the average of its neighbors. Equivalent to minimizing the squared difference of the labels.
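
As an illustration of this related averaging idea (a sketch only; the iterative update, the similarity matrix W, and the {-1, +1} label encoding are assumptions rather than the cited paper's code):

```python
import numpy as np

def harmonic_labels(W, labeled, y, iters=1000):
    """W: symmetric nonnegative (n x n) similarity matrix with zero diagonal.
    labeled: indices of the labeled examples; y: their labels in {-1, +1}.
    Returns a real-valued score for every example; its sign is the prediction."""
    n = W.shape[0]
    f = np.zeros(n)
    f[labeled] = y
    unlabeled = [i for i in range(n) if i not in set(labeled)]
    for _ in range(iters):
        for i in unlabeled:
            if W[i].sum() > 0:
                # Each unlabeled example becomes the weighted average of its
                # neighbors; this fixed point minimizes sum_ij W_ij (f_i - f_j)^2
                # subject to the labeled values staying fixed.
                f[i] = W[i] @ f / W[i].sum()
    return f
```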

How to construct the graph?
k-NN:
– Graph may not have small separators.
– How to learn k?
Connect all points within distance δ:
– Can have disconnected components.
– δ is hard to learn.
Minimum spanning tree:
– No parameters to learn.
– Gives a connected, sparse graph.
– Seems to work well on most datasets (see the sketch below).
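
A minimal construction sketch for the MST option; the Euclidean metric, the 1/(1 + d) conversion from distance to similarity, and the SciPy routine are illustrative assumptions:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_edges(X):
    """X: (n x d) array of examples, labeled and unlabeled together.
    Returns (u, v, weight) edges of a minimum spanning tree over the complete
    graph of pairwise distances, with distances turned into similarities so the
    weights can be used directly as mincut capacities."""
    D = squareform(pdist(X))               # dense pairwise Euclidean distances
    T = minimum_spanning_tree(D).tocoo()   # sparse MST (n - 1 edges)
    return [(int(u), int(v), 1.0 / (1.0 + w))
            for u, v, w in zip(T.row, T.col, T.data)]
```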

Experiments
ONE VS TWO: 1128 examples (8 × 8 array of integers, Euclidean distance).
ODD VS EVEN: 4000 examples (16 × 16 array of integers, Euclidean distance).
PC VS MAC: 1943 examples (20 Newsgroups dataset, TF-IDF distance).

ONE VS TWO

ODD VS EVEN

PC VS MAC

Accuracy vs. coverage: PC VS MAC (12 labeled)
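
A curve like this one plots accuracy against the fraction of unlabeled examples kept when predictions are thresholded by their vote margin, most confident first. A minimal sketch of that computation (the array names and sorting convention are illustrative):

```python
import numpy as np

def accuracy_coverage(pred, margin, truth):
    """pred, margin, truth: arrays over the unlabeled examples.
    Returns (coverage, accuracy) arrays: for each k, keep only the k most
    confident predictions and record the accuracy on that subset."""
    order = np.argsort(-np.asarray(margin, dtype=float))   # most confident first
    correct = (np.asarray(pred)[order] == np.asarray(truth)[order]).astype(float)
    k = np.arange(1, len(order) + 1)
    return k / len(order), np.cumsum(correct) / k
```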

Conclusions
– We can get useful estimates of the confidence of our predictions.
– Randomized mincut often gets better accuracy than plain mincut.
– The minimum spanning tree graph gives good results across different datasets.

Future Work
– Sample complexity lower bounds (i.e., how much unlabeled data do we need to see?).
– Better ways of sampling mincuts?

Reference
A. Blum, J. Lafferty, M. R. Rwebangira, R. Reddy, "Semi-supervised Learning Using Randomized Mincuts", ICML 2004.

Questions?