Yingze Wang and Shi-Kuo Chang University of Pittsburgh

Presentation transcript:

User Profile Visualization to Facilitate MSLIM-Model-Based Social Influence Analysis Based on the Slow Intelligence Approach. Yingze Wang and Shi-Kuo Chang, University of Pittsburgh

Information Diffusion: Anything that propagates over a network falls under the umbrella of “information diffusion”.

Two Major Goals in Social Influence Analysis. (1) Prediction or forecasting: predict the volume of contagions, for example to capture the trend of information or to predict an outbreak (e.g. of a disease). (2) Influential node detection: identify the influential nodes, for example for viral marketing or for preventing the spread of infectious disease. A unified methodology addresses both goals.

Multitask Sparse Linear Influence Model (MSLIM): Challenges. (1) Implicit network structure: we only observe that a node got “infected”, without knowing who infected whom (e.g., social media such as blogs, or disease spread). Q: How can we do forecasting and node detection without knowing the network structure? (2) Multiple correlated contagions spreading simultaneously. Q: How can the relatedness of contagions be incorporated in the model? E.g., similar contagions => similar future volumes. (3) The temporal effect of the diffusion process.

Our Approach: Multitask Sparse Linear Influence Model (MSLIM) (AAAI-13). [Equation not preserved in the transcript; its annotated parts are a smooth prediction loss, a non-smooth penalty, and a constraint.]

Optimal Solution of MSLIM. The optimal solution yields the influential nodes for contagion k and a prediction of the total volume for contagion k. Evaluation metric: predicted mean squared error (MSE).
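The evaluation metric can be illustrated with a minimal sketch. It assumes only that the predicted MSE is the average squared difference between predicted and observed volumes; the function name `predicted_mse` and the sample volumes are hypothetical and do not come from the slides.

```python
from typing import Sequence


def predicted_mse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean squared error between predicted and observed contagion volumes."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical volumes of one contagion over four time steps.
example_mse = predicted_mse([10.0, 12.0, 9.0, 15.0], [11.0, 12.0, 7.0, 15.0])
print(example_mse)  # 1.25
```

A lower value means the predicted volumes track the observed ones more closely, which is how the later experiments compare candidate active node sets.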

Summary: MSLIM. Advantages: (1) It simultaneously conducts diffusion prediction and influential node detection in a unified framework. (2) It does not require prior knowledge of the network structure. (3) Contagion-sensitive node detection: it detects different sets of influential nodes for different contagions. Limitations: The total volume V is a function of a small active node set N, and different active node sets produce different results. In our previous work, we simply selected as the active set the Twitter users with at least 1,000 tweets during a certain time period. However, this active user set may not be optimal. How can the optimal active node sets in MSLIM be determined?

Motivation: Slow Intelligence Approach. In a social network (e.g. Twitter), each user has a profile: full name, followers count, location, friends count, account creation time, etc. Determining the active user set from different user profile properties is therefore feasible. Ideas: (1) Slow Intelligence: search for the best active user set according to different properties. (2) Visualization: help people analyze the distribution of users to facilitate selecting a proper active user set.

Slow Intelligence System [Shi-Kuo Chang, 2010]. Slow Intelligence Systems are general-purpose systems characterized by being able to improve performance over time through a process involving enumeration, propagation, adaptation, elimination, and concentration.

SIS-BASED ACTIVE NODE SETS DEFINITION SYSTEM. GUIs: use visualization techniques to help define the active user set. Input: the entire user list with the user profiles. Output: the active user list. Learning package: implements the MSLIM model. Input: the active user list and the data corpus. Output: the influential users and the predicted mean squared error.

GUI interfaces. Each GUI uses one user profile property. It lets the designer define a range for that property and filters the data set accordingly, e.g. for Twitter: followers count, friends count, location, created time. Visualization: each GUI uses a suitable visualization technique to show the distribution of the active node set according to its property. The GUI can also show the final predicted MSE, which helps the designer evaluate the chosen strategy and refine the choice of active user set.

GUI interfaces (Cont’d). Three GUIs use a histogram: followers count, friends count, and created time.
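As a rough illustration of what a histogram view has to compute before drawing, the sketch below bins a numeric profile property into fixed-width ranges. The function name `histogram`, the bin width, and the follower counts are hypothetical; the slides do not describe how the GUIs choose their bins.

```python
from collections import Counter
from typing import Dict, Iterable


def histogram(values: Iterable[int], bin_width: int) -> Dict[int, int]:
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical followers counts for five users, binned in steps of 1,000.
followers = [120, 950, 1000, 1500, 4200]
print(histogram(followers, 1000))  # {0: 2, 1000: 2, 4000: 1}
```

Each key is a candidate range boundary, so a designer could read off which ranges contain enough users to be worth testing as an active node set.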

GUI interfaces (Cont’d). One GUI uses a choropleth map: location.

GUI interfaces (Cont’d). Hybrid GUI: incorporates all four properties and allows the designer to define the active node set with arbitrary logical conditions.
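The filtering that the per-property GUIs and the hybrid GUI perform can be sketched as composable predicates over user profiles. Everything named here (`UserProfile`, `in_range`, `located_in`, `any_of`, `all_of`, `select_active`, and the sample users) is a hypothetical illustration under the assumption that a condition is simply a range or membership test on one property; it is not the system's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class UserProfile:
    name: str
    followers_count: int
    friends_count: int
    location: str
    created_year: int


Condition = Callable[[UserProfile], bool]


def in_range(prop: str, low: int, high: int) -> Condition:
    """Condition that holds when low <= user.<prop> <= high."""
    return lambda user: low <= getattr(user, prop) <= high


def located_in(*places: str) -> Condition:
    """Condition that holds when the user's location is one of `places`."""
    return lambda user: user.location in places


def any_of(*conditions: Condition) -> Condition:
    """Logical OR of conditions (union of the matching user sets)."""
    return lambda user: any(cond(user) for cond in conditions)


def all_of(*conditions: Condition) -> Condition:
    """Logical AND of conditions (intersection of the matching user sets)."""
    return lambda user: all(cond(user) for cond in conditions)


def select_active(users: Iterable[UserProfile], condition: Condition) -> List[str]:
    """Return the names of users that satisfy the condition."""
    return [user.name for user in users if condition(user)]


# Hypothetical users and conditions.
USERS = [
    UserProfile("alice", 5200, 300, "CA", 2008),
    UserProfile("bob", 150, 900, "NY", 2010),
    UserProfile("carol", 2400, 120, "CA", 2009),
]
popular = in_range("followers_count", 2000, 10_000)
early = in_range("created_year", 2006, 2008)
west = located_in("CA")

print(select_active(USERS, any_of(popular, early)))  # ['alice', 'carol']
print(select_active(USERS, all_of(west, early)))     # ['alice']
```

Representing conditions as functions keeps single-property filters and hybrid logical expressions interchangeable, so the same selection step serves both kinds of GUI.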

Operational Process of System

Experiment: Data Collection. Crawling: all tweets of a set of 1000 users (TechCrunch) from January 2009 to November 2011. Each tweet: the full text, the author, and the time-stamp. Each user: user name, followers/friends count, location, description, etc. In our dataset, each user has 2611.825 tweets on average. Preprocessing the raw tweets: ignore URLs and shortened URLs, remove @username mentions, and remove all stopwords and special symbols. LDA (GibbsLDA): extract 50 interesting topics from the Twitter data set. [Xiang et al., 12] [Yang et al., 10; Xiang et al., 12; Williams et al., 12] [Phan et al., 2007] [Griffiths and Steyvers, 2007]
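The preprocessing steps listed above can be sketched with the standard library alone. The regular expressions, the tiny stopword list, and the function name `preprocess` are assumptions made for illustration; the slides do not specify the actual patterns or stopword list, and this sketch does not detect shortened URLs that lack a scheme or a `www.` prefix.

```python
import re
from typing import List

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # assumed URL pattern
MENTION_RE = re.compile(r"@\w+")                # @username mentions
SYMBOL_RE = re.compile(r"[^a-z0-9\s]")          # anything but letters, digits, spaces
# Deliberately tiny, hypothetical stopword list.
STOPWORDS = frozenset({"a", "an", "the", "is", "of", "to", "and", "in", "on", "for", "at"})


def preprocess(tweet: str) -> List[str]:
    """Lowercase a tweet, strip URLs, mentions and symbols, drop stopwords."""
    text = URL_RE.sub(" ", tweet.lower())
    text = MENTION_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)
    return [token for token in text.split() if token not in STOPWORDS]


tokens = preprocess("Check out the new #startup at http://t.co/abc via @TechCrunch!")
print(tokens)  # ['check', 'out', 'new', 'startup', 'via']
```

The resulting token lists are the kind of input a topic model such as LDA would consume; URLs are removed first so that the symbol filter does not leave URL fragments behind as tokens.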

Experiments: Cycle One. For each property, we define particular ranges as conditions in the individual GUI to filter the total user list. We select all users in each range as the active node set and run MSLIM to get the predicted MSE. With the predicted MSE threshold set to 15, we select the proper conditions (indicated in bold): C1, C2, C3, C4.

Experiments: Cycle Two. In the hybrid GUI, we take C1, C2, C3, and C4 from the previous cycle and construct logical expressions from these four conditions to filter the user list. The union set achieves the best performance (indicated in red). The result is also superior to the result in our previous work [AAAI2013], where we simply selected the active Twitter users with at least 1,000 tweets during a certain time period.
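The two decision cycles can be sketched as a small search procedure: cycle one keeps the single-property conditions whose MSE meets a threshold, and cycle two scores unions and intersections of the survivors. This is a hedged illustration only. The functions `cycle_one` and `cycle_two`, the user sets, and above all `toy_mse` are hypothetical; `toy_mse` is a stand-in scoring function and is NOT MSLIM, which the sketch does not implement.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Tuple

UserSet = FrozenSet[str]
Scorer = Callable[[UserSet], float]


def cycle_one(candidates: Dict[str, UserSet], score: Scorer,
              threshold: float) -> Dict[str, UserSet]:
    """Keep each single-property condition whose MSE is at or below threshold."""
    return {name: users for name, users in candidates.items()
            if score(users) <= threshold}


def cycle_two(selected: Dict[str, UserSet], score: Scorer) -> Tuple[str, float]:
    """Score unions (OR) and intersections (AND) of the selected conditions."""
    expressions: Dict[str, UserSet] = {}
    names = sorted(selected)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            sets = [selected[name] for name in combo]
            expressions[" OR ".join(combo)] = frozenset().union(*sets)
            if size > 1:
                expressions[" AND ".join(combo)] = frozenset.intersection(*sets)
    # Empty user sets cannot serve as an active node set, so skip them.
    scored = {expr: score(users) for expr, users in expressions.items() if users}
    if not scored:
        raise ValueError("no non-empty candidate active node set")
    best = min(scored, key=scored.get)
    return best, scored[best]


def toy_mse(users: UserSet) -> float:
    """Stand-in for running MSLIM: NOT the real model, just a toy score."""
    return 20.0 - 2.0 * len(users)


# Hypothetical candidate conditions and the users each one selects.
CANDIDATES = {
    "C1": frozenset({"a", "b", "c"}),
    "C2": frozenset({"c", "d", "e"}),
    "C3": frozenset({"a"}),
}
survivors = cycle_one(CANDIDATES, toy_mse, threshold=15.0)
print(sorted(survivors))              # ['C1', 'C2']
print(cycle_two(survivors, toy_mse))  # ('C1 OR C2', 10.0)
```

With this toy score the union wins simply because it contains more users; in the real system the ranking would come from the MSE that MSLIM reports for each candidate set.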

Conclusion: SIS-BASED ACTIVE NODE SETS DEFINITION SYSTEM. Advantages: By using a slow intelligence system, the system evolutionarily searches for a proper active node set to improve MSLIM model performance. It incorporates two visualization techniques (histogram and choropleth map) to help the designer define active node sets.

Future Work. User profiles can be based on more properties: followers count, friends count, created time, location, and possibly others. Spatial/temporal partitioning rules (patterns). Multiple decision cycles in the SIS-based system. A more sophisticated SIS framework for optimization.

Thank You! Q & A