Yingze Wang and Shi-Kuo Chang University of Pittsburgh

1 Yingze Wang and Shi-Kuo Chang University of Pittsburgh
User Profile Visualization to Facilitate MSLIM-Model-Based Social Influence Analysis Based on the Slow Intelligence Approach

2 Information Diffusion
Anything that propagates over a network falls under the umbrella of "information diffusion".

3 Two Major Goals in Social Influence Analysis
Prediction or forecasting: predict the volume of contagions.
  Capture the trend of information
  Predict the outbreak (e.g. disease)
Influential node detection: identify the influential nodes.
  Viral marketing
  Preventing the spread of infectious disease
A unified methodology

4 Multitask Sparse Linear Influence Model (MSLIM)
Challenges:
Implicit network structure: we only observe that a node got "infected", without knowing who infected whom (e.g., social media (blogs), disease spread). Q: How can we forecast and detect influential nodes without knowing the network structure?
Multiple correlated contagions spreading simultaneously. Q: How can the relatedness of contagions be incorporated into the model? E.g., similar contagions => similar future volumes.
Temporal effect of the diffusion process.
Multitask Sparse Linear Influence Model (MSLIM)

5 Smooth Prediction Loss
Our approach: Multitask Sparse Linear Influence Model (MSLIM) (AAAI-13)
[Equation: the MSLIM objective, annotated with three parts: constraint, smooth prediction loss, non-smooth penalty]
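The three labels on this slide (constraint, smooth prediction loss, non-smooth penalty) suggest an objective of roughly the following shape. This is a sketch under assumptions, not the formula from the slide: the symbols $A_{u,k}$ (influence of node $u$ on contagion $k$ over $L$ time lags), $M_{u,k}(t)$ (indicator that node $u$ was infected by contagion $k$ at time $t$), $V_k(t)$ (volume of contagion $k$ at time $t$) and the weights $\lambda, \gamma$ are chosen here for illustration. The AAAI-13 paper is the place to check the exact form.

```latex
% Illustrative only: notation and penalty structure are assumptions.
\min_{A}\;
\underbrace{\frac{1}{2}\sum_{k=1}^{K}\sum_{t}
  \Bigl(V_k(t+1)-\sum_{u=1}^{N}\sum_{l=0}^{L-1} M_{u,k}(t-l)\,A_{u,k}(l+1)\Bigr)^{2}}_{\text{smooth prediction loss}}
\;+\;
\underbrace{\lambda\sum_{u=1}^{N}\lVert A_{u}\rVert_{F}
  +\gamma\sum_{u=1}^{N}\sum_{k=1}^{K}\lVert A_{u,k}\rVert_{2}}_{\text{non-smooth penalty}}
\qquad
\text{subject to}\quad \underbrace{A \ge 0}_{\text{constraint}}
```

Under this reading, the group-style penalty terms are what push whole nodes (or node/contagion pairs) to zero, which is how a sparse model can double as an influential node detector.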

6 Optimal solution of MSLIM
[Equations: influential nodes for contagion k; predicted total volume for contagion k]
Evaluation metric: predicted mean squared error (MSE)
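A minimal sketch of how a learned linear influence model could be used for one contagion: predict the next volume, rank nodes, and score predictions with MSE. The function names, the layout of `M` (per-node infection indicators over time) and `A` (per-node lag weights), and the use of the Euclidean norm as an influence score are assumptions made for illustration, not details taken from the slides.

```python
import math
from typing import List, Sequence


def predict_volume(M: Sequence[Sequence[float]],
                   A: Sequence[Sequence[float]],
                   t: int) -> float:
    """Predict the volume at time t+1 as a lag-weighted sum of past infections.

    M[u][s] is 1 if node u was infected at time s, else 0 (assumed layout).
    A[u][lag] is the learned influence of node u at the given lag.
    """
    total = 0.0
    for u, weights in enumerate(A):
        for lag, w in enumerate(weights):
            if t - lag >= 0:  # ignore lags that reach before the first time step
                total += M[u][t - lag] * w
    return total


def rank_influential_nodes(A: Sequence[Sequence[float]]) -> List[int]:
    """Order node indices by the Euclidean norm of their lag weights, largest first."""
    scores = [math.sqrt(sum(w * w for w in weights)) for weights in A]
    return sorted(range(len(A)), key=lambda u: scores[u], reverse=True)


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average squared difference between actual and predicted volumes."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy values, made up for the example.
M = [[1, 0, 1], [0, 1, 1]]
A = [[0.5, 0.25], [1.0, 0.0]]
next_volume = predict_volume(M, A, t=2)
ranking = rank_influential_nodes(A)
```

A lower MSE on held-out time steps is what the later slides use to compare candidate active node sets.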

7 Summary: MSLIM
Advantages:
(1) Simultaneously conducts diffusion prediction and influential node detection in a unified framework.
(2) Does not require prior knowledge of the network structure.
(3) Contagion-sensitive node detection: detects different sets of influential nodes for different contagions.
Limitations:
Total volume V is a function of a small active node set N. Different active node sets produce different results.
In our previous work, we simply selected as N the active Twitter users with at least 1,000 tweets during a given time period. However, this active user set may not be optimal.
How do we determine the optimal active node set in MSLIM?

8 Motivation: Slow Intelligence Approach
In a social network (e.g. Twitter), each user has a profile: full name, followers count, location, friends count, account creation time, etc.
Selecting users by their profile properties is a feasible way to determine the active user set.
Ideas:
Slow Intelligence
Visualization helps people analyze the distribution of users, which facilitates selecting a proper active user set.
Search for the best active user set according to different properties.

9 Slow Intelligence System [Shi-Kuo Chang, 2010]
Slow Intelligence Systems are general-purpose systems characterized by being able to improve performance over time through a process involving:
Enumeration
Propagation
Adaptation
Elimination
Concentration
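One decision cycle of such a system can be sketched as a small search loop. The sketch below covers only three of the five phases (Enumeration, Elimination, Concentration) and leaves out Propagation and Adaptation, which concern exchanging information with the environment; the function name, the `keep` parameter, and the "lower score is better" convention are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def slow_intelligence_cycle(candidates: Sequence[T],
                            evaluate: Callable[[T], float],
                            keep: int = 2) -> Tuple[T, float, List[T]]:
    """Run one simplified decision cycle and return (best, best_score, survivors)."""
    if not candidates or keep < 1:
        raise ValueError("need at least one candidate and keep >= 1")
    # Enumeration: score every candidate solution.
    scored = [(evaluate(c), c) for c in candidates]
    # Elimination: keep only the `keep` candidates with the lowest score.
    scored.sort(key=lambda pair: pair[0])
    survivors = scored[:keep]
    # Concentration: commit to the single best survivor.
    best_score, best = survivors[0]
    return best, best_score, [c for _, c in survivors]


# Toy usage: candidates are numbers, and the "error" is the distance from 8.
best, score, survivors = slow_intelligence_cycle([3, 10, 7, 8], lambda x: abs(x - 8))
```

Running the cycle again on the survivors, or on variations of them, is one way to read "improve performance over time".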

10 SIS-BASED ACTIVE NODE SETS DEFINITION SYSTEM
GUIs: use visualization techniques to help define the active user set
  Input: the entire user list with the user profiles
  Output: active user list
Learning package: implements the MSLIM model
  Input: active user list and data corpus
  Output: influential users and predicted mean squared error

11 GUI interfaces
Each GUI utilizes one user profile property. It enables the designer to define a range for the corresponding property and filters the data set accordingly, e.g. for Twitter: followers count, friends count, location, created time.
Visualization: each GUI uses a suitable visualization technique to show the distribution of the active node set according to its property.
The GUI can show the final predicted MSE to help the designer evaluate the strategy and choose a more appropriate active user set.
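The filtering step behind a single-property GUI can be sketched as a range filter over user profiles. The `UserProfile` class, its field names, and `filter_by_range` are hypothetical names chosen for this example; the slides do not show the system's actual data model.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class UserProfile:
    """Hypothetical, trimmed-down profile with two numeric properties."""
    name: str
    followers_count: int
    friends_count: int


def filter_by_range(users: Sequence[UserProfile], prop: str,
                    low: int, high: int) -> List[UserProfile]:
    """Keep users whose numeric property `prop` lies in the closed range [low, high]."""
    return [u for u in users if low <= getattr(u, prop) <= high]


# Made-up users for the example.
users = [
    UserProfile("alice", 120, 80),
    UserProfile("bob", 5400, 300),
    UserProfile("carol", 98000, 15),
]
active = filter_by_range(users, "followers_count", 1000, 10000)
print([u.name for u in active])  # ['bob']
```

The resulting list plays the role of the "active user list" that is handed to the learning package.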

12 GUI interfaces (Cont’d)
Three GUIs use histograms: followers count, friends count, created time
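The data behind such a histogram is just a count of users per bin. A minimal sketch with fixed-width bins follows; the function name and the choice of fixed-width bins are assumptions, and the real GUIs may bin differently (for example on a log scale for follower counts).

```python
from collections import Counter
from typing import Dict, Iterable


def histogram(values: Iterable[int], bin_width: int) -> Dict[int, int]:
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Made-up follower counts for the example.
followers = [120, 950, 1800, 1999, 5200]
bins = histogram(followers, bin_width=1000)
print(bins)  # {0: 2, 1000: 2, 5000: 1}
```

Seeing where the counts concentrate is what lets a designer pick a sensible range for the property.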

13 GUI interfaces (Cont’d)
One GUI uses a choropleth map: location

14 GUI interfaces (Cont’d)
Hybrid GUI: incorporates the four properties together and allows the designer to define the active node set with arbitrary logical conditions
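Arbitrary logical conditions over several properties can be modeled as small composable predicates. This is a sketch of the idea only; the helper names (`in_range`, `equals`, `any_of`, `all_of`, `negate`) and the dictionary-based user records are assumptions, not the hybrid GUI's actual implementation.

```python
from typing import Any, Callable, Dict

User = Dict[str, Any]
Condition = Callable[[User], bool]


def in_range(prop: str, low: float, high: float) -> Condition:
    """Condition: the numeric property lies in the closed range [low, high]."""
    return lambda user: low <= user[prop] <= high


def equals(prop: str, value: Any) -> Condition:
    """Condition: the property equals the given value."""
    return lambda user: user[prop] == value


def all_of(*conditions: Condition) -> Condition:
    """Logical AND of conditions."""
    return lambda user: all(c(user) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    """Logical OR of conditions."""
    return lambda user: any(c(user) for c in conditions)


def negate(condition: Condition) -> Condition:
    """Logical NOT of a condition."""
    return lambda user: not condition(user)


# Made-up users for the example.
users = [
    {"name": "alice", "followers_count": 120, "location": "PA"},
    {"name": "bob", "followers_count": 5400, "location": "CA"},
    {"name": "carol", "followers_count": 98000, "location": "PA"},
]
# (1000 <= followers <= 10000) OR (location == PA AND NOT followers <= 1000)
rule = any_of(
    in_range("followers_count", 1000, 10000),
    all_of(equals("location", "PA"), negate(in_range("followers_count", 0, 1000))),
)
selected = [u["name"] for u in users if rule(u)]
```

Because each condition is an ordinary function, the same building blocks can express any nesting of AND, OR and NOT that a designer enters.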

15 Operational Process of System

16 Experiment: Data Collection
Crawling: all tweets of a set of 1000 users (TechCrunch) from January 2009 to November 2011.
Each tweet: the full text, the author, and the time-stamp.
Each user: user name, followers/friends count, location, description, etc.
In our dataset, each user has [count missing] tweets on average.
Preprocessing the raw tweets: ignore URLs and shortened URLs, remove the formatting, and remove all stopwords and special symbols.
LDA (GibbsLDA): extract 50 interesting topics from the Twitter data set.
References cited on this slide: [Xiang et al., 12] [Yang et al., 10; Xiang et al., 12; Williams et al., 12] [Phan et al., 2007] [Griffiths and Steyvers, 2007]
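The preprocessing steps listed on this slide can be sketched with a few regular expressions. The patterns, the tiny stopword list, and the function name are assumptions for illustration; the slides do not say which stopword list or tokenizer was used, and the topic extraction itself (GibbsLDA) is not reproduced here.

```python
import re
from typing import List

# Matches http(s) links and bare www links, including shortened URLs.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
# Matches anything that is not a lowercase letter, digit, or whitespace.
SPECIAL_RE = re.compile(r"[^a-z0-9\s]")
# Deliberately tiny stopword list, for the example only.
STOPWORDS = frozenset({"a", "an", "and", "at", "for", "in", "is", "of", "on", "the", "to"})


def preprocess_tweet(text: str) -> List[str]:
    """Lowercase a tweet, drop URLs and special symbols, and remove stopwords."""
    text = URL_RE.sub(" ", text.lower())
    text = SPECIAL_RE.sub(" ", text)
    return [token for token in text.split() if token not in STOPWORDS]


tokens = preprocess_tweet("Check out the NEW iPhone at http://t.co/abc123 !!! #apple")
print(tokens)  # ['check', 'out', 'new', 'iphone', 'apple']
```

Token lists like this one are the usual input format for a topic model such as LDA.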

17 Experiments: Cycle One
For each property, we define particular ranges as conditions in the corresponding GUI to filter the full user list.
We select all users in each range as the active node set and run MSLIM to get the predicted MSE.
Given a predicted MSE threshold, we select the proper conditions (indicated in bold) as C1, C2, C3, C4.
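The selection step of cycle one might look like the sketch below, which assumes that one condition is kept per property: the range with the lowest predicted MSE, provided it stays under the threshold. That reading, the function name, and every number in the example are assumptions; the MSE values are placeholders, not results from the experiment.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]


def pick_condition_per_property(results: Dict[str, Dict[Range, float]],
                                threshold: float) -> Dict[str, Range]:
    """For each property, keep its lowest-MSE range if that MSE is within the threshold."""
    chosen: Dict[str, Range] = {}
    for prop, mse_by_range in results.items():
        best_range = min(mse_by_range, key=mse_by_range.get)
        if mse_by_range[best_range] <= threshold:
            chosen[prop] = best_range
    return chosen


# Placeholder MSE values, invented for the example.
results = {
    "followers_count": {(0, 1000): 0.9, (1000, 10000): 0.4},
    "friends_count": {(0, 500): 0.7, (500, 5000): 0.8},
}
conditions = pick_condition_per_property(results, threshold=0.5)
```

In this toy run only the followers count range passes, so it alone would move on to the next cycle.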

18 Experiments: Cycle Two
In the hybrid GUI, we take C1, C2, C3, C4 from the previous cycle and construct logical expressions from these four conditions to filter the user list.
The union set achieves the best performance (indicated in red).
The result is also superior to the result in our previous work [AAAI2013], where we simply selected the active Twitter users with at least 1,000 tweets during a given time period.
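Searching over logical combinations of the four conditions can be sketched by treating each condition as the set of users it selects and enumerating unions and intersections. This covers only pure OR and pure AND expressions, which is an assumption about the search space; the `evaluate` callback stands in for running MSLIM and returning the predicted MSE, and the toy evaluator in the example is invented.

```python
from itertools import combinations
from typing import Callable, Dict, Iterator, Set, Tuple


def candidate_sets(condition_sets: Dict[str, Set[str]]) -> Iterator[Tuple[str, Set[str]]]:
    """Yield (expression, users) for every union and intersection of the conditions."""
    names = sorted(condition_sets)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            sets = [condition_sets[name] for name in combo]
            yield " OR ".join(combo), set().union(*sets)
            if size > 1:  # a single condition has no distinct intersection
                yield " AND ".join(combo), set.intersection(*sets)


def best_candidate(condition_sets: Dict[str, Set[str]],
                   evaluate: Callable[[Set[str]], float]) -> Tuple[str, Set[str]]:
    """Return the non-empty candidate with the lowest score from `evaluate`."""
    non_empty = [(expr, users) for expr, users in candidate_sets(condition_sets) if users]
    if not non_empty:
        raise ValueError("every candidate user set is empty")
    return min(non_empty, key=lambda pair: evaluate(pair[1]))


# Toy data: two conditions, and a stand-in evaluator that favors larger sets.
sets_by_condition = {"C1": {"a", "b"}, "C2": {"b", "c"}}
expression, users = best_candidate(sets_by_condition, lambda s: 1.0 / len(s))
```

With four conditions this enumeration produces 26 candidates, small enough to evaluate exhaustively.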

19 Conclusion: SIS-BASED ACTIVE NODE SETS DEFINITION SYSTEM
Advantages:
By utilizing a slow intelligence system, the system evolutionarily searches for a proper active node set to improve MSLIM model performance.
It incorporates two visualization techniques (histogram and choropleth map) to help the designer define active node sets.

20 Future Work
User profiles can be based upon more properties: followers count, friends count, created time, location, and ?
Spatial/temporal partitioning rules (patterns)
Multiple decision cycles in the SIS-based system
A more sophisticated SIS framework for optimization

21 Thank You! Q & A

