Sofus A. Macskassy Fetch Technologies


Using Graph-based Metrics with Empirical Risk Minimization to Speed Up Active Learning on Networked Data
Sofus A. Macskassy, Fetch Technologies, sofmac@fetch.com

Context
- Types of learning with labeled data:
  - Supervised learning is given a fully labeled training set.
  - Semi-supervised (or transductive) learning is given a partially labeled data set.
- Both methodologies seek to induce a model that predicts labels for unlabeled instances.
- Active learning seeks to help the learner induce the best model with the fewest labeled instances: it picks the next training instance that will (probably) yield the largest boost in performance.
KDD 2009 - Sofus Macskassy

Motivation
- One well-known and popular active learning strategy iterates through all unlabeled instances and estimates the likely boost in performance from labeling each particular instance. This is known as empirical risk minimization (ERM).
- Problem: ERM is very costly; it must induce and evaluate a new classifier for each class of each unlabeled instance.
- Proposed solution: identify a small set of candidate instances on which to compute ERM.
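To make the cost concrete, here is a minimal sketch of ERM-style selection on a toy 1-D problem. The distance-weighted soft classifier and all function names are illustrative assumptions, not the talk's actual graph-based learner:

```python
# Minimal sketch of ERM-style active learning on a toy 1-D problem. The soft
# classifier below is an illustrative assumption, not the talk's actual learner.
import math
from collections import defaultdict

def posterior(labeled, x):
    """Soft class distribution for point x, distance-weighted over labeled points."""
    weights = defaultdict(float)
    for xi, yi in labeled:
        weights[yi] += math.exp(-abs(x - xi))
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}

def risk(labeled, pool):
    """Estimated risk: summed uncertainty of the model over the unlabeled pool."""
    return sum(1.0 - max(posterior(labeled, x).values()) for x in pool)

def erm_pick(labeled, pool):
    """Pick the unlabeled point whose labeling minimizes expected risk.
    Note the cost: one retrain-and-evaluate per (candidate, class) pair --
    exactly the expense the talk's candidate-selection heuristics avoid."""
    def expected_risk(u):
        rest = [x for x in pool if x != u]
        return sum(p * risk(labeled + [(u, y)], rest)
                   for y, p in posterior(labeled, u).items())
    return min(pool, key=expected_risk)
```

With one labeled point per class at 0.0 ('a') and 10.0 ('b'), ERM picks the point near the decision boundary, since resolving it reduces risk the most.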

Key Observation from Prior Work
- Prior work pairing a graph-based semi-supervised learning method with active learning observed that ERM tended to pick instances at the center of clusters.
- Can we leverage this observation?

Can we improve the running time of ERM?
- Idea: keep ERM, but limit the ERM computation to the "best" candidates rather than all instances.
- How?
  - Use clustering and pick among the most central instances in each cluster (closest in spirit to the prior key observation).
  - Pick from the top-K most uncertain instances (prior work on uncertainty sampling).
  - Use graph-based metrics to identify central instances and pick among the most central (a global, and possibly more consistent, metric).

Selecting Best Candidates (1): Uncertainty labeling
- Use the current model to identify the unlabeled instances it is most uncertain of.
- Compute the uncertainty of each vertex v under the current model.
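The slide's exact uncertainty formula did not survive transcription; entropy over the model's predicted class distribution is one common choice, sketched here with hypothetical names:

```python
# Hedged sketch: entropy-based uncertainty is one common instantiation; the
# talk's exact formula for U(v) is not reproduced in this transcript.
import math

def uncertainty(class_probs):
    """Entropy of the model's class distribution for a vertex."""
    return -sum(p * math.log(p) for p in class_probs if p > 0)

def most_uncertain(posteriors, k=2):
    """Return the k vertices the current model is least sure about.
    posteriors: dict mapping vertex -> list of class probabilities."""
    return sorted(posteriors, key=lambda v: uncertainty(posteriors[v]),
                  reverse=True)[:k]
```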

Selecting Best Candidates (2): Highest betweenness
- Betweenness centrality: which instance has the most information flow?
- C_B(v) = sum over pairs s != v != t of sigma_st(v) / sigma_st, where sigma_st is the number of shortest paths between s and t, and sigma_st(v) is the number of those paths that go through v.
- Note: this requires computing all shortest paths, which can be done efficiently in O(nE), roughly O(n^2) for sparse graphs.
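The definition above can be computed directly for a small unweighted graph. This sketch counts shortest paths with BFS and applies the sigma_st(v) / sigma_st formula pairwise (a simple illustration, not the efficient Brandes-style algorithm a real implementation would use):

```python
# Direct pairwise computation of betweenness on a small unweighted graph,
# for illustration only (production code would use Brandes' O(nE) algorithm).
from collections import deque

def bfs_paths(adj, s):
    """Distances and shortest-path counts (sigma) from s via BFS."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """Unnormalized betweenness. For undirected graphs each {s, t} pair is
    counted in both directions, which doubles every score but keeps the ranking."""
    info = {s: bfs_paths(adj, s) for s in adj}
    score = {v: 0.0 for v in adj}
    for s in adj:
        d_s, sig_s = info[s]
        for t in adj:
            if t == s or t not in d_s:
                continue
            d_t, sig_t = info[t]
            for v in adj:
                if v in (s, t) or v not in d_s or v not in d_t:
                    continue
                if d_s[v] + d_t[v] == d_s[t]:  # v lies on an s-t shortest path
                    score[v] += sig_s[v] * sig_t[v] / sig_s[t]
    return score
```

On a path graph a-b-c, only b lies between the endpoints, so it gets the highest score.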

Selecting Best Candidates (3): Highest closeness
- Closeness centrality: which instance is "closest" to all other instances?
- C_C(v) = (n - 1) / sum over u != v of d(v, u), the inverse of v's average distance to all other vertices.
- Note: this requires computing all pairwise distances, which can be done efficiently in O(nE), roughly O(n^2) for sparse graphs.
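For an unweighted graph, one BFS per vertex yields all the distances needed. A minimal sketch, assuming a connected graph and the standard (n - 1) / sum-of-distances normalization:

```python
# Closeness centrality via one BFS per vertex (assumes a connected,
# unweighted graph; standard (n-1)/sum-of-distances normalization).
from collections import deque

def bfs_distances(adj, s):
    """Shortest-path distances from s in an unweighted graph."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(adj):
    """Closeness score for every vertex; higher means more central."""
    n = len(adj)
    scores = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        scores[v] = (n - 1) / sum(d for u, d in dist.items() if u != v)
    return scores
```

On the path a-b-c, the middle vertex b has average distance 1 and hence the top closeness score.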

Selecting Best Candidates (4): Highest cluster closeness
- Find the central nodes within each cluster:
  - Cluster the graph.
  - Choose the most central instances in each cluster, i.e., compute closeness restricted to c(v), the set of vertices in the cluster that v belongs to.
- Clustering details are in the paper.
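One way to instantiate this (an assumption: the paper's clustering algorithm and exact restriction are not reproduced here) is to take a given vertex-to-cluster assignment and compute closeness using only distances to same-cluster members:

```python
# Hedged sketch of cluster closeness: closeness restricted to the members of
# v's cluster. The cluster assignment is assumed given; the talk's clustering
# procedure is described in the paper, not here. Distances are measured on the
# full graph (one possible design choice; a subgraph restriction is another).
from collections import deque

def bfs_distances(adj, s):
    """Shortest-path distances from s in an unweighted graph."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def cluster_closeness(adj, cluster_of):
    """cluster_of: dict vertex -> cluster id. Higher score = more central in cluster."""
    scores = {}
    for v in adj:
        members = [u for u in adj if cluster_of[u] == cluster_of[v] and u != v]
        dist = bfs_distances(adj, v)
        denom = sum(dist[u] for u in members if u in dist)
        scores[v] = len(members) / denom if denom else 0.0
    return scores
```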

Real world data is not "clean" (from CoRA data)
[Figure: the CoRA citation graph with all edges shown; vertex colors denote classes: probabilistic methods (yellow), theory (green), genetic algorithms (red), rule learning (blue), neural networks (pink), reinforcement learning (white), case-based (orange).]

Empirical Study: Which method is best?
- Compare the new strategies (uncertainty, betweenness) to:
  - Full ERM (the current optimum)
  - Random sampling (baseline)
- Metrics: accuracy and time to run.
- Methodology:
  - Initialize: randomly pick 1 instance per class.
  - Iteratively pick the next instance with each strategy, recording accuracy on the remaining instances, until 100 instances have been picked.
  - Repeat 10 times and record the average accuracy.

11 Benchmark Data Sets
- WebKB [Craven 1998] (8 data sets): 4 computer science websites (sizes: 338-434); each graph posed a 6-class problem and a 2-class problem.
- Industry classification [Bernstein et al. 2003] (2 data sets): 2 sources (prnews, Yahoo!), sizes 1798/218; network = companies that co-occur in financial news stories; 12-class problem.
- CoRA [McCallum et al. 2000] (1 data set): 4240 academic papers; network = citations; 6-class problem.

ERM vs. betweenness vs. uncertainty
[Results chart]

Combine the new strategies?
- None of the new strategies worked very well by itself. However, the top ERM pick was often at the top of at least one strategy's ranking.
- New hybrid approach:
  - Pick the top-K instances from each strategy (uncertainty, cluster closeness, betweenness).
  - Among those candidates, pick the instance with the highest ERM score.
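The two steps above amount to unioning each strategy's short list and running expensive ERM only on that union. A minimal sketch with a hypothetical interface (ranked candidate lists and an ERM scoring function are assumed given):

```python
# Hybrid candidate selection: cheap strategies propose, expensive ERM disposes.
# The interface below is a hypothetical simplification of the talk's method.
def hybrid_pick(candidates_by_strategy, erm_score, k=3):
    """candidates_by_strategy: dict mapping a strategy name to its vertices
    ranked best-first. erm_score: function scoring a single vertex (higher is
    better). Only the union of top-k picks is scored with ERM, so ERM runs on
    at most k * num_strategies instances instead of the whole pool."""
    shortlist = set()
    for ranked in candidates_by_strategy.values():
        shortlist.update(ranked[:k])
    return max(shortlist, key=erm_score)
```

With three strategies and k around 3-5, the number of ERM evaluations per round is a small constant rather than the pool size, which is where the order-of-magnitude speedup comes from.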

ERM vs. hybrid
[Results chart]

Which strategies were used in the hybrid?

Dataset              Cluster     Uncertainty  Betweenness  Number
                     Closeness   Sampling     Centrality   of ties
cora                 11.00%      12.30%       76.90%       1
industry-pr          9.00%       36.80%       54.40%       2
industry-yh          1.90%       30.40%       68.90%       12
cornell-binary       12.60%      62.30%       36.90%       118
cornell-multi        3.50%       75.50%       29.90%       89
texas-binary         9.80%       40.10%       59.20%       91
texas-multi          16.30%      68.50%       28.50%       113
washington-binary    20.60%      72.50%       28.00%       211
washington-multi     25.70%      72.00%       25.10%       228
wisconsin-binary     8.80%       62.00%       27.40%       282
wisconsin-multi      24.00%      69.80%       25.40%       192

["larger" graphs]

Conclusions
- Empirical risk minimization is a strong active learning strategy, but it is too slow.
- We have shown that we can efficiently identify a small set of candidate nodes that contains the (close to) best instance as defined by ERM.
- If the data is relational, graph metrics such as clustering, closeness, and betweenness can identify a good candidate set.
- The hybrid performs comparably to full-blown ERM but runs an order of magnitude faster, with potential for greater speedups on larger graphs.

Future Work
- Using graph metrics to identify a candidate set seems to have a lot of potential, but it needs more work.
- We need a better understanding of metric behavior and how it relates to ERM; for example, why did cluster closeness not do so well in practice?
- How do we incorporate labeled information into the metrics?
- More work on network metrics is needed.

Thank you!
Sofus Macskassy