
Active Learning for Networked Data Based on Non-progressive Diffusion Model
Zhilin Yang, Jie Tang, Bin Xu, Chunxiao Xing
Dept. of Computer Science and Technology, Tsinghua University, China

An Example
[Figure: a set of instances connected by correlation edges; each node carries a label +1/-1 or is unknown (?)]
Classify each instance into {+1, -1}.
Some instances are already labeled (+1); for an unlabeled instance we can query for its label.

Problem: Active Learning for Networked Data
[Figure: partially labeled instances with correlation edges]
Challenge: it is expensive to query for labels!
Questions:
- Which instances should we select to query?
- How many instances do we need to query to obtain an accurate classifier?

Challenges: Active Learning for Networked Data
- How to leverage the network correlation among instances?
- How to query in batch mode?

Batch Mode Active Learning for Networked Data
Given a graph $G = (V^L, V^U, E)$ with labeled instances $V^L$ and their labels $Y^L$, unlabeled instances $V^U$, a feature matrix $\mathbf{X}$, and edges $E$.
Our objective is
$$\max_{S \subseteq V^U} Q(S) \quad \text{subject to} \quad |S| \le B,$$
where $S$ is a subset of unlabeled instances, $Q(\cdot)$ is the utility function, and $B$ is the labeling budget.
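To make the formulation concrete, the following minimal Python sketch enumerates candidate query sets on a toy graph; the function name, the toy graph, and the coverage-style utility are all illustrative assumptions, and the actual method does not solve the problem by brute force.

    from itertools import combinations

    def best_query_set(unlabeled, utility, budget):
        # Brute-force the batch-mode objective: argmax over subsets S of V^U
        # with |S| <= B of Q(S). Only feasible for toy instances; it shows
        # what the formulation asks for, not how the problem is solved.
        best_set, best_score = set(), float("-inf")
        for k in range(1, budget + 1):
            for subset in combinations(unlabeled, k):
                score = utility(set(subset))
                if score > best_score:
                    best_set, best_score = set(subset), score
        return best_set, best_score

    # Toy usage: utility = number of distinct instances touched by the queried set.
    graph = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2, 5}, 5: {4}}
    unlabeled = [1, 2, 3, 4, 5]
    coverage = lambda S: len(S | set().union(*(graph[v] for v in S)))
    print(best_query_set(unlabeled, coverage, budget=2))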

Factor Graph Model
[Figure: a factor graph over the instances, with variable nodes for the labels and factor nodes connecting correlated instances]

Factor Graph Model
The joint probability
$$P(Y \mid \mathbf{X}) \propto \prod_i f(y_i, \mathbf{x}_i) \prod_{(i,j) \in E} g(y_i, y_j)$$
combines a local factor function $f(y_i, \mathbf{x}_i)$ for each instance with an edge factor function $g(y_i, y_j)$ for each correlation edge; the model is trained by maximizing the log likelihood of the labeled instances.
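As a concrete illustration, here is a minimal Python sketch of such a factorization, assuming a linear local factor over the features and an edge factor that rewards agreeing labels; the exact factor functions used in the paper may differ.

    def log_joint(y, X, edges, w, lam):
        # Unnormalized log joint of a pairwise factor graph:
        #   sum_i   y_i * (w . x_i)           local factor (label vs. features)
        # + sum_ij  lam * 1[y_i == y_j]       edge factor  (correlation between neighbors)
        # y: dict node -> label in {+1, -1}; X: dict node -> feature vector.
        local = sum(y[i] * sum(wk * xk for wk, xk in zip(w, X[i])) for i in y)
        edge = sum(lam if y[i] == y[j] else 0.0 for i, j in edges)
        return local + edge

    # Toy example: two instances connected by one correlation edge.
    X = {0: [1.0, 0.5], 1: [0.2, 1.0]}
    edges = [(0, 1)]
    w, lam = [0.8, -0.3], 1.2
    for labels in ((+1, +1), (+1, -1)):
        y = {0: labels[0], 1: labels[1]}
        print(labels, round(log_joint(y, X, edges, w, lam), 3))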

Factor Graph Model: Learning
Learn the parameters by gradient descent on the log likelihood.
Calculate the required expectations with Loopy Belief Propagation (LBP):
- message from variable to factor: $m_{v \to f}(y_v) = \prod_{f' \in N(v) \setminus \{f\}} m_{f' \to v}(y_v)$
- message from factor to variable: $m_{f \to v}(y_v) = \sum_{y_{N(f) \setminus \{v\}}} f(y_{N(f)}) \prod_{v' \in N(f) \setminus \{v\}} m_{v' \to f}(y_{v'})$
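The following Python sketch spells out those message updates with sum-product loopy BP on a pairwise factor graph over {+1, -1} labels; the data structures and toy potentials are assumptions made for the example, not the paper's implementation.

    def lbp_marginals(nodes, edges, node_pot, edge_pot, iters=20):
        # Sum-product loopy belief propagation on a pairwise factor graph.
        # m[(f, v)] is the message from edge factor f = (i, j) to variable v.
        labels = (+1, -1)
        m = {(e, v): {y: 1.0 for y in labels} for e in edges for v in e}

        def prod(xs):
            out = 1.0
            for x in xs:
                out *= x
            return out

        def var_to_factor(v, f):
            # Variable-to-factor message: node potential times all incoming
            # factor messages except the one coming from f.
            return {y: node_pot[v][y] * prod(m[(e, v)][y] for e in edges if v in e and e != f)
                    for y in labels}

        for _ in range(iters):
            new_m = {}
            for (i, j) in edges:
                for src, dst in ((i, j), (j, i)):
                    msg_in = var_to_factor(src, (i, j))
                    out = {yd: sum(edge_pot[ys][yd] * msg_in[ys] for ys in labels)
                           for yd in labels}
                    z = sum(out.values())
                    new_m[((i, j), dst)] = {y: out[y] / z for y in labels}  # normalize
            m = new_m

        # Node beliefs: node potential times all incoming factor messages, normalized.
        beliefs = {}
        for v in nodes:
            b = {y: node_pot[v][y] * prod(m[(e, v)][y] for e in edges if v in e) for y in labels}
            z = sum(b.values())
            beliefs[v] = {y: b[y] / z for y in labels}
        return beliefs

    # Toy usage: node 0 prefers +1, node 1 is neutral, and the edge factor rewards agreement.
    nodes, edges = [0, 1], [(0, 1)]
    node_pot = {0: {+1: 2.0, -1: 1.0}, 1: {+1: 1.0, -1: 1.0}}
    edge_pot = {+1: {+1: 2.0, -1: 0.5}, -1: {+1: 0.5, -1: 2.0}}
    print(lbp_marginals(nodes, edges, node_pot, edge_pot))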

Question: how to select instances from the factor graph for active learning?

Basic Principle: Maximize the Ripple Effects
[Figure: once an instance is queried and labeled, its effect ripples over the correlation edges]
Labeling information is propagated.
Statistical bias is propagated as well.
How to model the propagation process in an unlabeled network?

Diffusion Model
- Linear Threshold Model
- Progressive Diffusion Model: once a node becomes active, it stays active forever.
- Non-Progressive Diffusion Model (linear threshold): a node may switch back to inactive when the support from its active neighbors drops.
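A minimal Python sketch of the non-progressive linear threshold dynamics is given below; clamping the seed (queried) nodes as permanently active and the specific weights and thresholds are assumptions of this sketch.

    def non_progressive_lt(neighbors, weights, threshold, seeds, rounds=50):
        # Non-progressive linear threshold dynamics (sketch).
        # Each round, node v is active iff the total weight of its currently active
        # neighbors reaches threshold[v]; unlike the progressive model, a node may
        # switch back to inactive when that support drops. Seed nodes stay active.
        active = set(seeds)
        for _ in range(rounds):
            nxt = set(seeds)
            for v in neighbors:
                support = sum(weights[(u, v)] for u in neighbors[v] if u in active)
                if support >= threshold[v]:
                    nxt.add(v)
            if nxt == active:  # fixed point reached
                break
            active = nxt
        return active

    # Toy usage: a 4-node path; activating node 0 ripples down the path.
    neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    weights = {(u, v): 0.6 for v in neighbors for u in neighbors[v]}
    threshold = {v: 0.5 for v in neighbors}
    print(sorted(non_progressive_lt(neighbors, weights, threshold, seeds={0})))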

Maximize the Ripple Effects
[Figure: propagation over the partially labeled network]
Labeling information is propagated; statistical bias is propagated as well.
Will an instance be dominated by labeling information (active) or by statistical bias (inactive)?
Based on the non-progressive diffusion model, maximize the number of activated instances in the end.
We aim to activate the most uncertain instances!

Instantiate the Problem
Active learning based on the non-progressive diffusion model: maximize the number of activated instances at the end of the process, with the constraints that
- all queried instances are initially activated,
- the most uncertain instances are the ones we aim to activate,
- activation spreads according to the non-progressive diffusion.
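Under these constraints the objective can be read as a set function over the queried set. The sketch below reuses non_progressive_lt from the earlier sketch and assumes, purely for illustration, that an instance's activation threshold shrinks with its predictive uncertainty; the paper's exact instantiation may differ.

    def activation_utility(S, neighbors, weights, uncertainty, rounds=50):
        # Q(S): number of instances active at the end of the non-progressive
        # process when the queried set S is initially activated.
        # Assumption for this sketch: threshold[v] = 1 - uncertainty[v], so the
        # most uncertain instances are the easiest (and most important) to activate.
        threshold = {v: 1.0 - uncertainty[v] for v in neighbors}
        final_active = non_progressive_lt(neighbors, weights, threshold, seeds=S, rounds=rounds)
        return len(final_active)

    # Toy usage, reusing the path graph from the previous sketch:
    uncertainty = {0: 0.9, 1: 0.6, 2: 0.6, 3: 0.1}
    print(activation_utility({0}, neighbors, weights, uncertainty))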

Reduce the Problem
The original problem is reduced; the constraints are inherited by the reduced problem.
[Figure: the reduction procedure from the original problem to the reduced problem]

Algorithm
The reduced problem and the key idea of the algorithm.
[Figure: pseudocode of the proposed algorithm (MaxCo)]
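For intuition, here is a generic greedy budgeted-selection sketch in Python in the spirit of max-coverage approaches; it is a stand-in only, with made-up names, and not a faithful reproduction of the MaxCo algorithm or its guarantees.

    def greedy_query_selection(unlabeled, utility, budget):
        # Greedily add the unlabeled instance with the largest marginal gain in
        # the utility Q until the labeling budget is exhausted.
        selected = set()
        while len(selected) < budget:
            base = utility(selected) if selected else 0.0
            best_v, best_gain = None, 0.0
            for v in unlabeled:
                if v in selected:
                    continue
                gain = utility(selected | {v}) - base
                if gain > best_gain:
                    best_v, best_gain = v, gain
            if best_v is None:  # no remaining instance improves the utility
                break
            selected.add(best_v)
        return selected

    # Example: plug in activation_utility from the previous sketch, e.g.
    # greedy_query_selection(list(neighbors),
    #     lambda S: activation_utility(S, neighbors, weights, uncertainty), budget=2)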

Theoretical Analysis
Convergence (Lemma 1): the algorithm is guaranteed to converge.
Correctness
Approximation Ratio

Experiments
Datasets (#variable nodes / #factor nodes):
- Coauthor: 6,096 / 24,468
- Slashdot: 370 / 1,686
- Mobile
- Enron
Comparison methods:
- Batch Mode Active Learning (BMAL), proposed by Shi et al.
- Influence Maximization Selection (IMS), proposed by Zhuang et al.
- Maximum Uncertainty (MU)
- Random (RAN)
- Max Coverage (MaxCo), our method

Experiments: Performance
[Figure: classification performance of MaxCo compared with the baseline methods on the four datasets]

Related Work
Active learning for networked data:
- Actively learning to infer social ties. H. Zhuang, J. Tang, W. Tang, T. Lou, A. Chin and X. Wang
- Batch mode active learning for networked data. L. Shi, Y. Zhao and J. Tang
- Towards active learning on graphs: an error bound minimization approach. Q. Gu and J. Han
- Integration of active learning in a collaborative CRF. O. Martinez and G. Tsechpenakis
Diffusion model:
- On the non-progressive spread of influence through social networks. M. Fazli, M. Ghodsi, J. Habibi, P. J. Khalilabadi, V. Mirrokni and S. S. Sadeghabad
- Maximizing the spread of influence through a social network. D. Kempe, J. Kleinberg and E. Tardos

Conclusion
- Connect active learning for networked data to the non-progressive diffusion model, and precisely formulate the problem
- Propose an algorithm to solve the problem
- Theoretically guarantee the convergence, correctness, and approximation ratio of the algorithm
- Empirically evaluate the performance of the algorithm on four datasets of different genres

Future Work
Consider active learning for networked data in a streaming setting, where the data distribution and the network structure change over time.

About Me
Zhilin Yang, 3rd-year undergraduate at Tsinghua University
Applying for PhD programs this year
Data Mining & Machine Learning

Thanks!