In Search of Influential Event Organizers in Online Social Networks

Presentation transcript:

In Search of Influential Event Organizers in Online Social Networks
Kaiyu Feng¹, Gao Cong¹, Sourav S. Bhowmick¹, and Shuai Ma²
¹Nanyang Technological University; ²Beihang University

Outline
- Motivation & Problem Definition
- Greedy solutions
- Approximation solutions
- Experiments

Motivation
- A data-driven approach to selecting influential event organizers in online social networks.
- Online social networks (e.g., event-based social networks) are growing in popularity and size.
- To organize an event (e.g., a picnic), we need to find organizers who together have the relevant expertise (e.g., driving, cooking) and can influence as many people as possible to attend and contribute.

Motivating example
Candidates: Tom ("Machine Learning", "Data Mining"); Bob ("Psychology", "Sociology"); Bill ("NLP", "Machine Learning"); Sam ("Database", "Data Mining"); ……
Query: search for 2 chairs who (1) together have knowledge in "Psychology", "Sociology" and "Data Mining"; (2) influence as many people as possible to contribute and attend.

An inside look at the example
- An online social network $\mathcal{G}(V, E, \mathcal{A})$, where $\mathcal{A}(v)$ is the expertise of node $v \in V$.
- A set $Q$ of required expertise.
- A small set $S$ of organizers who together have knowledge in $Q$, i.e., $Q \subseteq \bigcup_{v \in S} \mathcal{A}(v)$, and who influence as many people as possible.
- Influence model: the Independent Cascade model. Nodes are either active or inactive, and each active node has one chance to activate each of its inactive neighbors with a given probability.
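The Independent Cascade model described on this slide can be sketched as a short simulation. This is a minimal illustration, not the deck's implementation: the adjacency format, the function names `simulate_ic` and `estimate_spread`, and the toy graph are all assumptions made for the example.

```python
import random

def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade and return the set of active nodes.

    graph maps node -> list of (neighbor, activation probability).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # u gets exactly one chance to activate each inactive neighbor
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the influence spread (average cascade size)."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs

# Hypothetical toy graph: probabilities 1.0 and 0.0 make the outcome deterministic.
toy = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)], "c": [], "d": []}
spread = estimate_spread(toy, {"a"}, runs=50)  # a, b and d become active
```

With real edge probabilities the estimate varies with the number of runs and the random seed.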

Problem definition
Given: a set of attributes $Q$, a parameter $k$, and an online social network $\mathcal{G}(V, E, w, \mathcal{A})$.
The influential cover set (ICS) problem aims at selecting $k$ seed nodes $S$:
$$S = \arg\max \sigma_{\mathcal{G}}(S) \quad \text{such that} \quad Q \subseteq \bigcup_{s \in S} \mathcal{A}(s).$$
Here $\sigma_{\mathcal{G}}(S)$ is the influence spread of $S$ on $\mathcal{G}$.
Assumption: $|Q|$ is bounded by a constant.
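To make the definition concrete, the following sketch solves a tiny ICS instance by brute force. It is illustrative only: the follower nodes `x1`..`x5`, the edges, and the use of plain reachability as a deterministic stand-in for $\sigma_{\mathcal{G}}$ are assumptions, and exhaustive search is not one of the algorithms proposed in the deck.

```python
from itertools import combinations

def covers(seed_set, expertise, required):
    """Check the coverage constraint: Q is a subset of the union of A(s)."""
    covered = set().union(*(expertise[s] for s in seed_set))
    return required <= covered

def reachable(graph, seeds):
    """Stand-in for influence spread: nodes reachable when every edge fires."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def brute_force_ics(graph, expertise, required, k):
    """Try every k-subset, keep feasible ones, return the one with largest spread."""
    best, best_value = None, -1
    for cand in combinations(sorted(expertise), k):
        if covers(cand, expertise, required):
            value = len(reachable(graph, cand))
            if value > best_value:
                best, best_value = set(cand), value
    return best, best_value

# Expertise from the motivating example; the edges are hypothetical.
expertise = {
    "Tom": {"Machine Learning", "Data Mining"},
    "Bob": {"Psychology", "Sociology"},
    "Bill": {"NLP", "Machine Learning"},
    "Sam": {"Database", "Data Mining"},
}
graph = {"Tom": ["x1", "x2", "x3"], "Bob": ["x4"],
         "Bill": ["x1", "x2", "x3", "x4", "x5"], "Sam": ["x1"]}
Q = {"Psychology", "Sociology", "Data Mining"}
best, value = brute_force_ics(graph, expertise, Q, k=2)
```

In this toy instance Bill reaches the most nodes, but no pair containing Bill covers $Q$, so the constrained optimum is Bob and Tom.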

Example
Query: $k = 2$, $Q = \{\text{psychology}, \text{sociology}, \text{data mining}\}$
Candidates: Tom ("Machine Learning", "Data Mining"); Bob ("Psychology", "Sociology"); Bill ("NLP", "Machine Learning"); Sam ("Database", "Data Mining"); ……

Novelty and challenge
- Influence maximization: no attribute coverage constraint.
- Team formation: different optimization objective.
- Challenge of the ICS problem: cover the attributes in $Q$ while optimizing the influence spread.
- Complexity: the ICS problem is NP-hard.

Greedy solutions
- ScoreGreedy: based on a score function.
- PigeonGreedy: based on the pigeonhole principle.

ScoreGreedy
- The goodness of a node is measured by a score function that combines two aspects: the marginal influence increase and the number of newly covered attributes.
- Main idea: greedily select $k$ seed nodes based on the score function.
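A possible shape for ScoreGreedy is sketched below. The slide does not give the score function, so the weighted sum used here (normalized marginal gain plus `alpha` times the normalized number of newly covered attributes) is purely an assumption, as are the toy edges and the reachability count standing in for influence spread.

```python
def reachable_count(graph, seeds):
    """Deterministic stand-in for influence spread: nodes reachable from the seeds."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen)

def score_greedy(graph, expertise, required, k, alpha=1.0):
    """Pick k seeds one by one, each time taking the node with the highest score."""
    seeds, covered = [], set()
    for _ in range(k):
        base = reachable_count(graph, seeds)
        best, best_score = None, float("-inf")
        for v in sorted(expertise):
            if v in seeds:
                continue
            gain = reachable_count(graph, seeds + [v]) - base
            new_attrs = len((expertise[v] & required) - covered)
            # Hypothetical score: the deck does not specify how the two aspects combine.
            score = gain / len(graph) + alpha * new_attrs / len(required)
            if score > best_score:
                best, best_score = v, score
        if best is None:  # fewer candidates than k
            break
        seeds.append(best)
        covered |= expertise[best] & required
    return seeds, required <= covered

expertise = {
    "Tom": {"Machine Learning", "Data Mining"},
    "Bob": {"Psychology", "Sociology"},
    "Bill": {"NLP", "Machine Learning"},
    "Sam": {"Database", "Data Mining"},
}
# Hypothetical edges; every node appears as a key so len(graph) is the node count.
graph = {"Tom": ["x1", "x2", "x3"], "Bob": ["x4"],
         "Bill": ["x1", "x2", "x3", "x4", "x5"], "Sam": ["x1"],
         "x1": [], "x2": [], "x3": [], "x4": [], "x5": []}
Q = {"Psychology", "Sociology", "Data Mining"}
balanced = score_greedy(graph, expertise, Q, k=2, alpha=1.0)
influence_heavy = score_greedy(graph, expertise, Q, k=2, alpha=0.1)
```

In this toy instance the balanced weighting returns a covering seed set, while the influence-heavy weighting picks Bill first and ends with an uncovered attribute, which shows how a score-based greedy choice can miss the coverage constraint.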

PigeonGreedy
- Lemma (pigeonhole principle): if a seed set $S$ with $k$ nodes covers the attribute set $Q$, then at least one node in $S$ covers no fewer than $|Q|/k$ attributes.
- Main idea: iteratively apply the lemma and select seeds in a greedy manner.
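One way to read "iteratively apply the lemma" is sketched below: at each step, only nodes that cover at least $\lceil |Q'|/k' \rceil$ of the still-uncovered attributes $Q'$ are eligible, where $k'$ is the remaining budget, and the eligible node with the largest marginal gain is chosen. This reading, the tie-breaking, the toy edges, and the reachability stand-in for influence spread are assumptions rather than details taken from the deck.

```python
import math

def reachable_count(graph, seeds):
    """Deterministic stand-in for influence spread: nodes reachable from the seeds."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen)

def pigeon_greedy(graph, expertise, required, k):
    seeds, uncovered = [], set(required)
    for remaining in range(k, 0, -1):
        # Pigeonhole threshold for the attributes that are still uncovered.
        threshold = math.ceil(len(uncovered) / remaining)
        candidates = [v for v in sorted(expertise)
                      if v not in seeds
                      and len(expertise[v] & uncovered) >= threshold]
        if not candidates:
            return seeds, False  # greedy choice got stuck
        base = reachable_count(graph, seeds)
        best = max(candidates,
                   key=lambda v: reachable_count(graph, seeds + [v]) - base)
        seeds.append(best)
        uncovered -= expertise[best]
    return seeds, not uncovered

expertise = {
    "Tom": {"Machine Learning", "Data Mining"},
    "Bob": {"Psychology", "Sociology"},
    "Bill": {"NLP", "Machine Learning"},
    "Sam": {"Database", "Data Mining"},
}
# Hypothetical edges for illustration.
graph = {"Tom": ["x1", "x2", "x3"], "Bob": ["x4"],
         "Bill": ["x1", "x2", "x3", "x4", "x5"], "Sam": ["x1"]}
Q = {"Psychology", "Sociology", "Data Mining"}
result = pigeon_greedy(graph, expertise, Q, k=2)
```

Here the first threshold is $\lceil 3/2 \rceil = 2$, so only Bob is eligible; the second step chooses between Sam and Tom by marginal gain.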

Approximation solutions
The greedy solutions are not guaranteed to find a seed set that covers $Q$, even if such a seed set exists. Motivated by this, we propose two approximation solutions that are guaranteed to find such a seed set whenever one exists:
- Partition-based Influential Cover Set algorithm (PICS): based on a notion of partitions.
- Optimized PICS algorithm (PICS+): based on a notion of cover-groups.

PICS: partition
$P = \{A_1, \ldots, A_m\}$ ($m \le k$) is a partition of $Q$ iff the attribute sets $A_1, \ldots, A_m$ in $P$ are
- nonempty,
- disjoint, and
- together cover $Q$.
Example: $Q = \{a, b, c, d, e\}$ and $k = 3$. $\{\{a,b,c\},\{d,e\}\}$ is a partition; $\{\{a,b,c\},\{c,d,e\}\}$ is not a partition, because $c$ appears in both sets.
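The partition definition can be checked and enumerated with a few lines of code. The sketch below is an illustration under assumed names (`partitions`, `is_partition`); it is not taken from the deck.

```python
def partitions(items, max_blocks):
    """Yield every partition of items into at most max_blocks nonempty, disjoint blocks."""
    items = list(items)

    def extend(i, blocks):
        if i == len(items):
            yield [set(b) for b in blocks]
            return
        for b in blocks:  # put items[i] into an existing block
            b.append(items[i])
            yield from extend(i + 1, blocks)
            b.pop()
        if len(blocks) < max_blocks:  # or open a new block
            blocks.append([items[i]])
            yield from extend(i + 1, blocks)
            blocks.pop()

    yield from extend(0, [])

def is_partition(blocks, universe, max_blocks):
    """Check the three conditions (nonempty, disjoint, covering) and the size bound m <= k."""
    blocks = [set(b) for b in blocks]
    union = set().union(*blocks)
    nonempty = all(blocks)
    disjoint = sum(len(b) for b in blocks) == len(union)
    covering = union == set(universe)
    return nonempty and disjoint and covering and len(blocks) <= max_blocks

Q = "abcde"
all_parts = list(partitions(Q, max_blocks=3))
```

For the slide's example ($|Q| = 5$, $k = 3$) this yields 41 partitions: 1 with one block, 15 with two blocks, and 25 with three blocks.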

PICS algorithm
- For each partition, compute a seed set.
- Return the seed set with the maximum influence spread.
Theorem. Approximation ratio: $\frac{1}{2} - \phi$.

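PICS: compute seed set for a partition
Setting: $Q = \{a,b,c,d,e\}$, $k = 3$, $P = \{\{a,b,c\},\{d,e\}\}$, $S = \{\}$.
Phase 1 (free set $\{u_1, u_2\}$):
- Select $u_1$ from $V_Q$. $u_1$ covers $\{c,d,e\}$, leaving the partial partition $P = \{\{a,b\}\}$ and $S = \{u_1\}$.
- Select $u_2$ from $V_Q$. $u_2$ covers $\{c,e\}$, so the partial partition stays $P = \{\{a,b\}\}$ and $S = \{u_1, u_2\}$.
Phase 2 (constraint set $\{u_3\}$):
- Select $u_3$ from $V(\{a,b\})$. $P = \{\{a,b\}\}$ and $S = \{u_1, u_2, u_3\}$.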
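The bookkeeping in this trace can be reproduced with a small sketch. The update rule (remove covered attributes from every block and drop emptied blocks) and the reading of $V(A)$ as the nodes whose expertise contains $A$ are inferred from the trace, not stated on the slide; the influence-based choice among candidates is omitted, and the function names are hypothetical.

```python
def reduce_partition(partition, covered):
    """Remove the covered attributes from each block and drop emptied blocks."""
    reduced = [block - covered for block in partition]
    return [block for block in reduced if block]

def nodes_covering(block, expertise):
    """Assumed meaning of V(A): nodes whose expertise contains every attribute in A."""
    return sorted(v for v, attrs in expertise.items() if block <= attrs)

# Expertise of the three nodes as described in the trace.
expertise = {"u1": {"c", "d", "e"}, "u2": {"c", "e"}, "u3": {"a", "b"}}

P = [{"a", "b", "c"}, {"d", "e"}]
after_u1 = reduce_partition(P, expertise["u1"])         # Phase 1, first free node
after_u2 = reduce_partition(after_u1, expertise["u2"])  # Phase 1, second free node
phase2_candidates = nodes_covering(after_u2[0], expertise)  # Phase 2 for block {a, b}
```

After both free nodes are chosen, a single block $\{a,b\}$ remains and $u_3$ is the only node that can cover it.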

PICS+ algorithm
- PICS needs to enumerate all partitions; the number of partitions is 115,975 when $|Q| = 10$.
- We leverage a notion of cover-groups to reorganize the partitions.
- Based on this organization, we propose the PICS+ algorithm, which is instance optimal in pruning unnecessary partial partitions.
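The figure 115,975 is the tenth Bell number, i.e., the number of ways to partition a 10-element set when the number of blocks is not capped (so $k \ge |Q|$). A short check using the Bell triangle, with a hypothetical function name:

```python
def bell_number(n):
    """Number of partitions of an n-element set, computed with the Bell triangle."""
    row = [1]
    for _ in range(n):
        # Each new row starts with the last entry of the previous row.
        nxt = [row[-1]]
        for x in row:
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[0]

counts = [bell_number(n) for n in range(11)]
```

The count grows from 52 at $|Q| = 5$ to 115,975 at $|Q| = 10$, which is why enumerating every partition becomes expensive.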

PICS+: cover-group
A cover-group of a partial partition is a multiset of integers $\{r_1, \ldots, r_m\}$, each of which is the size of an attribute set in the partition. For example, the cover-group of the partial partition $\{\{a,b,c\},\{d,e\}\}$ is $\{3, 2\}$.

PICS+: organize partitions
Reorganization: partitions are first organized according to their free set. The partial partitions generated in this first step are then grouped by their cover-groups.
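Grouping by cover-group can be sketched as follows. Representing the multiset as a descending tuple is an implementation choice made for this example, and the three partial partitions are hypothetical.

```python
from collections import defaultdict

def cover_group(partition):
    """Multiset of block sizes, stored as a descending tuple so it can be a dict key."""
    return tuple(sorted((len(block) for block in partition), reverse=True))

def group_by_cover_group(partial_partitions):
    groups = defaultdict(list)
    for p in partial_partitions:
        groups[cover_group(p)].append(p)
    return dict(groups)

# Hypothetical partial partitions for illustration.
parts = [
    [{"a", "b", "c"}, {"d", "e"}],
    [{"a", "b", "d"}, {"c", "e"}],
    [{"a"}, {"b", "c", "d", "e"}],
]
groups = group_by_cover_group(parts)
```

The first two partial partitions share the cover-group $\{3, 2\}$, so they land in the same group, while the third falls under $\{4, 1\}$.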

PICS+ algorithm
- Select a free set of size $i \in [0, k]$.
- For each cover-group, compute a constraint set and obtain the seed set.
- Return the seed set with the best influence spread.
Theorem. Approximation ratio: $\frac{1}{2} - \phi$. Instance optimal in pruning unnecessary partial partitions.

PICS+: compute constraint set for a cover-group
Construct lists for each cover-group. Each list corresponds to an integer in the cover-group. The computation is instance optimal.
(Figure: lists for cover-group $\{3, 2\}$; annotation $[\{a,b,c\}\{d,e\}]$: $v_1$, $v_4$: 95.)

Experimental results
Evaluated algorithms:
- ScoreGreedy, denoted as SG
- PigeonGreedy, denoted as PG
- PICS
- PICS+
We adopt IRIE (K. Jung et al., ICDM 2012) to compute the influence spread.
Datasets:

| Property | Flixster (FX) | PlanCast (PC) | DBLP | MeetUp |
|---|---|---|---|---|
| # of nodes | 38,834 | 76,665 | 874,305 | 1,013,453 |
| # of edges | 164,093 | 1,702,058 | 9,415,206 | 34,410,754 |
| # of distinct attr. | 37,036 | 103,289 | 89,975 | 64,721 |
| Avg. # of attr. per node | 47 | 9 | 27 | 11 |

Query and measures
- Queries: for each setting $(k, |Q|)$, 30 randomly generated queries, each guaranteed to admit a seed set of $k$ nodes that covers all attributes.
- Success rate: the percentage of queries for which an algorithm finds a seed set that covers all the attributes.
- Influence spread: the average number of influenced nodes over 20,000 simulations.

Success Rate
(1) PICS and PICS+ are guaranteed to find a seed set that covers all the attributes in $Q$.
(2) The success rates of the two greedy algorithms are not promising.

Comparison of greedy solutions and approximation solutions (figure panels: influence spread; runtime)
(1) The greedy algorithms are efficient; the runtime of PICS+ is also acceptable.
(2) PICS+ outperforms the two greedy algorithms in terms of influence spread.

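Effects of propagation probability
- Degree: $p_{u,v} = \frac{1}{N_{in}(v)}$
- Random: $p_{u,v}$ is randomly selected from $\{0.1, 0.01, 0.001\}$
- TopicPP: $p_{u,v} = \max\left(\frac{|\mathcal{A}_u| \cdot |\mathcal{A}_v| \cdot |\mathcal{A}_u \cap \mathcal{A}_v|}{|Q|^3}, 1\right)$
- TIC: adopt the TIC model, with probabilities learned from the historical action log of FX.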
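The Degree and Random settings are simple enough to sketch. The code assumes that $N_{in}(v)$ denotes the in-degree of $v$ and that the Random setting draws uniformly from the three values; both are assumptions, as are the function names and the toy edge list. TopicPP and TIC are not covered here.

```python
import random

def degree_probabilities(edges):
    """Degree setting: p(u, v) = 1 / in-degree(v), assuming N_in(v) is the in-degree."""
    in_degree = {}
    for _, v in edges:
        in_degree[v] = in_degree.get(v, 0) + 1
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}

def random_probabilities(edges, seed=0):
    """Random setting: each edge draws from {0.1, 0.01, 0.001} (uniform draw assumed)."""
    rng = random.Random(seed)
    return {edge: rng.choice([0.1, 0.01, 0.001]) for edge in edges}

# Hypothetical directed edges (u, v): u can influence v.
edges = [("a", "c"), ("b", "c"), ("a", "b")]
p_degree = degree_probabilities(edges)
p_random = random_probabilities(edges)
```

Under the Degree setting the incoming probabilities of every node sum to 1, so nodes with many in-neighbors are harder for any single neighbor to activate.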

Effect of propagation probability Our solutions to the ICS problem are insensitive to the influence probability of each edge in the graph.

Comparison of IM and ICS
- Jaccard similarity (JC): $\frac{|S_{ICS} \cap S_{IM}|}{|S_{ICS} \cup S_{IM}|}$
- Attribute coverage ratio (ACR): $\frac{\left|\left(\bigcup_{s \in S} \mathcal{A}_s\right) \cap Q\right|}{|Q|}$
Traditional IM techniques cannot be directly used to solve the ICS problem.
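Both measures translate directly into set operations. In the sketch below the two seed sets are hypothetical, the expertise comes from the motivating example, and returning 1.0 for two empty seed sets is a convention chosen for the example.

```python
def jaccard(s_ics, s_im):
    """Jaccard similarity between the ICS seed set and the IM seed set."""
    union = s_ics | s_im
    return len(s_ics & s_im) / len(union) if union else 1.0

def attribute_coverage_ratio(seeds, expertise, required):
    """Fraction of the required attributes covered by the seed set."""
    covered = set().union(*(expertise[s] for s in seeds))
    return len(covered & required) / len(required)

expertise = {
    "Tom": {"Machine Learning", "Data Mining"},
    "Bob": {"Psychology", "Sociology"},
    "Bill": {"NLP", "Machine Learning"},
    "Sam": {"Database", "Data Mining"},
}
Q = {"Psychology", "Sociology", "Data Mining"}
s_ics = {"Bob", "Tom"}   # hypothetical ICS answer
s_im = {"Bill", "Tom"}   # hypothetical IM answer that ignores attributes
jc = jaccard(s_ics, s_im)
acr_im = attribute_coverage_ratio(s_im, expertise, Q)
acr_ics = attribute_coverage_ratio(s_ics, expertise, Q)
```

In this toy case the IM-style seed set covers only one of the three required attributes, while the ICS seed set covers all of them.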

Comparison of PICS and PICS+ PICS+ is much more efficient than PICS for larger |𝑄|

Conclusion
- We formulate the ICS problem to select influential event organizers from online social networks; the problem is NP-hard.
- Greedy solutions: one based on a score function, one based on the pigeonhole principle.
- Approximation solutions: PICS, based on a notion of partitions; PICS+, based on a notion of cover-groups.
- Experiments show that our solutions are effective and efficient.

Thanks Q&A