A Study of Social Influence in Diffusion of Innovation over Facebook
Shaomei Wu, firstname.lastname@example.org
Information Science, Cornell University
Information Science Breakfast, Dec 5, 2008
Diffusion of Innovation
"Diffusion is the process in which an innovation is communicated through certain channels over time among the members of a social system." (Everett M. Rogers*)
Innovation: Friendship Quiz, a Facebook application
Communicated through: invitations among Facebook friends
Time: September 25, 2008 to now
Social system: Facebook
* Rogers, Everett M. (2003). Diffusion of Innovations, 5th ed. New York, NY: Free Press, pp. 5-6.
Basic Diffusion Models
Threshold model and cascade model; the two are statistically equivalent*.
* David Kempe, Jon Kleinberg, Eva Tardos. Maximizing the Spread of Influence through a Social Network. KDD 2003.
Cascade Model
Each recommendation succeeds with a certain probability.
[Figure: example network of nodes a through l with recommendation probabilities p_ab, p_ag, p_ac, p_ad, p_ae, p_af, p_di, p_dj, p_gk, p_gl; legend: non-adopter, adopter, social link, recommendation]
Question: how do we estimate p_uv?
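The cascade model above can be simulated in a few lines. Below is a minimal independent-cascade run, assuming the p_uv values are already known and stored per directed edge. The function name `simulate_cascade` is hypothetical and not part of the study; estimating p_uv is exactly the open question on the slide.

```python
import random
from typing import Dict, Hashable, List, Set, Tuple

Node = Hashable


def simulate_cascade(
    probs: Dict[Tuple[Node, Node], float],
    seeds: Set[Node],
    rng: random.Random,
) -> Set[Node]:
    """One run of an independent cascade.

    probs maps a directed edge (u, v) to p_uv, the probability that u's
    recommendation to v succeeds. Each newly activated node gets exactly
    one chance to activate each of its not-yet-active out-neighbours.
    """
    # Adjacency list built from the edge-probability map.
    out_edges: Dict[Node, List[Tuple[Node, float]]] = {}
    for (u, v), p in probs.items():
        out_edges.setdefault(u, []).append((v, p))

    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in out_edges.get(u, []):
                # A single Bernoulli trial per (newly active u, inactive v) pair.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active
```

Averaging the size of the returned set over many seeded runs gives a Monte Carlo estimate of expected spread for a given assignment of p_uv.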
Current practice
p_uv is either a constant, or based ONLY on network structure (e.g., in/out-degree):
Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst. Cascading Behavior in Large Blog Graphs. SDM 2007.
Jure Leskovec, Lada Adamic, Bernardo Huberman. The Dynamics of Viral Marketing. ACM Conference on Electronic Commerce (EC) 2006.
Do individuals and the social relationships among them matter?
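For contrast with what this project proposes, the two structure-only rules named above can be sketched as follows. The constant 0.1 and the 1/in-degree rule are illustrative assumptions (the latter resembles the "weighted cascade" setting used by Kempe et al.); neither is claimed to be the exact rule in the cited Leskovec et al. papers, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def constant_probs(edges: Iterable[Edge], p: float = 0.1) -> Dict[Edge, float]:
    """Every edge gets the same success probability p (0.1 is a placeholder)."""
    return {e: p for e in edges}


def in_degree_probs(edges: Iterable[Edge]) -> Dict[Edge, float]:
    """p_uv = 1 / in-degree(v): depends only on network structure."""
    edges = list(edges)
    indeg = Counter(v for _, v in edges)
    return {(u, v): 1.0 / indeg[v] for u, v in edges}
```

Neither rule looks at who u and v are, which is the gap the question on this slide points to.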
Theories from Empirical Diffusion Research
Opinion leaders have greater exposure to mass media than their followers, are more cosmopolite, have greater social participation, have higher socioeconomic status, and are more innovative [Rogers 2003, pp. 316-318].
Heterophily between participants on certain attributes (e.g., education and socioeconomic status) matters in determining the efficiency of diffusion, even though more effective communication occurs when two or more individuals are homophilous [Rogers 2003, p. 19].
This project aims to:
Model the p_uv values for the cascade model;
Identify the most influential factors in determining p_uv;
Predict the success of contagion.
We exploit Facebook data:
A real-world, ongoing diffusion instance;
Rich and (most of the time) trustworthy profile information on individuals and their social connections/activities;
A precisely timestamped diffusion process with a complete log of events.
Status
Launched: Sep 25, 2008. Data used so far: up to Nov 25, 2008.
216 adopters, 375 individuals, 737 edges between 266 pairs of people, 90 successful infections, 178 failed infections.
[Figure: Network evolution in the first month after release]
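The counts above support a quick baseline. Treating the 90 successful and 178 failed invitations as independent Bernoulli trials (an assumption), the maximum-likelihood estimate of a single constant p_uv is 90/268, and a classifier that always predicts failure would be right 178/268 of the time. The helper name `outcome_rates` is hypothetical.

```python
def outcome_rates(successes: int, failures: int) -> dict:
    """Pooled success rate and majority-class accuracy from outcome counts."""
    total = successes + failures
    return {
        "total": total,
        # MLE of p if every invitation shared one constant success probability.
        "success_rate": successes / total,
        # Accuracy of always predicting the more common outcome.
        "majority_baseline_accuracy": max(successes, failures) / total,
    }


rates = outcome_rates(90, 178)
```

That gives roughly 0.336 and 0.664; the second figure is a useful floor when reading the classifier accuracies reported later.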
Predict the success of invitations with SVM
A binary classifier: each invitation is either successful or failed.
Features: individual features; pair features (homophily/heterophily).
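To make "binary classifier" concrete, here is a minimal linear SVM trained with the Pegasos stochastic sub-gradient method in plain Python. This solver is swapped in for illustration and is not SVM-light's algorithm. Labels are assumed to be +1 (successful) / -1 (failed), there is no bias term, and the names `pegasos_train`, `svm_predict`, and `top_weighted` are hypothetical. Ranking by absolute weight is one plausible reading of "top weighted features".

```python
import random
from typing import List, Sequence, Tuple


def pegasos_train(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    iters: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Linear SVM (hinge loss + L2 penalty, no bias) via Pegasos; y in {+1, -1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))           # pick one training example at random
        eta = 1.0 / (lam * t)               # Pegasos step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 term
        if margin < 1:                      # hinge loss active: step toward the example
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (predicted successful) or -1 (predicted failed)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def top_weighted(w: Sequence[float], names: Sequence[str], k: int) -> List[Tuple[str, float]]:
    """The k features with the largest absolute weight, largest first."""
    return sorted(zip(names, w), key=lambda nw: abs(nw[1]), reverse=True)[:k]
```

Appending a constant feature of 1 to every example is a common way to emulate a bias term with this sketch.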
Individual Features
# of events attended/invited, # of photos tagged, # of wall posts, # of networks, # of groups participated in, # of notes, Religion, Political View, Gender, Age, Cultural Background, Relationship Status, Work Info, Education Info
Dimensions covered: Social Activeness, Socioeconomics, Education, Innovativeness
Pair-wise Features
Age difference, Same gender?, Same political view?, Same religion?, Same cultural background?, # of same networks, # of photos both tagged in, # of groups both participated in, # of events both attended, Same education level?, Same high school?, Same college?, Same workplace?, Same current city?
Dimensions covered: Biological Traits, Socioeconomics, Proximity, Belief
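A minimal sketch of how a few of the pair-wise (homophily/heterophily) features above could be computed from two profiles. The `Profile` fields are hypothetical stand-ins for whatever profile data the application collected, not Facebook API field names, and treating a missing value as "not the same" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Profile:
    # Hypothetical profile fields; None means the user did not fill the field in.
    age: Optional[int] = None
    gender: Optional[str] = None
    political_view: Optional[str] = None
    religion: Optional[str] = None
    current_city: Optional[str] = None
    networks: Set[str] = field(default_factory=set)
    groups: Set[str] = field(default_factory=set)
    events: Set[str] = field(default_factory=set)


def same(a: Optional[str], b: Optional[str]) -> int:
    """1 if both values are present and equal, otherwise 0."""
    return int(a is not None and b is not None and a == b)


def pair_features(sender: Profile, receiver: Profile) -> Dict[str, float]:
    ages_known = sender.age is not None and receiver.age is not None
    return {
        "ages_known": float(ages_known),
        # Absolute age gap; 0.0 when either age is missing (flagged by ages_known).
        "age_difference": float(abs(sender.age - receiver.age)) if ages_known else 0.0,
        "same_gender": float(same(sender.gender, receiver.gender)),
        "same_political_view": float(same(sender.political_view, receiver.political_view)),
        "same_religion": float(same(sender.religion, receiver.religion)),
        "same_current_city": float(same(sender.current_city, receiver.current_city)),
        # Overlap counts: "# of same networks", "# of groups both participated in", ...
        "common_network_count": float(len(sender.networks & receiver.networks)),
        "common_group_count": float(len(sender.groups & receiver.groups)),
        "common_event_count": float(len(sender.events & receiver.events)),
    }
```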
Training Data
Each invitation is one training example.
time | sender | receiver | class | features (sender, then receiver, then pair)
2008-09-25 18:25:41 | 589483260 | 3621185 | 1 | 1:22 2:1 3:0 4:0 5:0 6:1 … 35:1 47:0 48:0 49:0 50:0 51:0 … 68:0 69:0 70:0 74:1 76:1 …
2008-09-25 18:25:49 | …
…
2008-11-24 02:40:34 | …
* All numerical features are normalized across examples.
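The feature strings above resemble SVM-light's sparse input format (`<target> <index>:<value> ...`, indices increasing from 1, optional `# comment`). A minimal sketch, assuming min-max scaling as the normalization (the slide does not say which scheme was used) and mapping class 1 to +1 and class 0 to -1; the helper names are hypothetical.

```python
from typing import List, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale each feature column to [0, 1] across all examples.

    Constant columns (max == min) are mapped to 0.0.
    """
    n = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(n)]
    hi = [max(r[j] for r in rows) for j in range(n)]
    return [
        [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(n)]
        for r in rows
    ]


def to_svmlight_line(label: int, features: Sequence[float], comment: str = "") -> str:
    """Format one example as '<+1|-1> index:value ... [# comment]'.

    Indices are 1-based; zero values are omitted, which SVM-light reads as 0.
    """
    target = "+1" if label == 1 else "-1"
    parts = [target] + [f"{j}:{v:g}" for j, v in enumerate(features, start=1) if v != 0]
    line = " ".join(parts)
    return f"{line} # {comment}" if comment else line
```

The example rows above list some zero values explicitly, which the format also accepts; omitting them only shortens the file.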
AdaBoost (with DecisionStump)
A popular way to do feature selection.
Selected features: sender wall post count, sender group count, sender network count, receiver age, receiver group count, sender & receiver common group count
Performance (10-fold cross-validation): accuracy 83.6%
Class | Precision | Recall
0 (failed) | 83.5% | 93.8%
1 (successful) | 83.8% | 63.3%
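"DecisionStump" is presumably Weka's one-level decision tree. The sketch below is a plain-Python discrete AdaBoost with decision stumps, not Weka's implementation. It shows why the method doubles as feature selection: each round picks the single feature and threshold that best split the re-weighted data, so the features appearing in the learned stumps form the "selected" set. Labels are assumed to be +1/-1, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Predict `polarity` when x[feature] > threshold, else -polarity."""
    j, thr, pol = stump
    return pol if x[j] > thr else -pol


def best_stump(X: Sequence[Sequence[float]], y: Sequence[int], w: Sequence[float]) -> Tuple[Stump, float]:
    """Exhaustive search over features, thresholds and polarities for lowest weighted error."""
    best: Optional[Stump] = None
    best_err = float("inf")
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((j, thr, pol), xi) != yi)
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best, best_err


def adaboost(X: Sequence[Sequence[float]], y: Sequence[int], rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost over decision stumps; returns (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    model: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        if err >= 0.5:
            break  # no stump beats chance on the weighted data
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Re-weight: misclassified examples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model: List[Tuple[float, Stump]], x: Sequence[float]) -> int:
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1


def selected_features(model: List[Tuple[float, Stump]]) -> List[int]:
    """Feature indices used by at least one stump, in order of first use."""
    seen: List[int] = []
    for _, (j, _, _) in model:
        if j not in seen:
            seen.append(j)
    return seen
```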
Result: SVM-light performance
209 records split into 5 folds: 4 for training, 1 for testing.
Performance on the testing set: accuracy 71.43% (30 correct, 12 incorrect, 42 total); precision/recall 55.56%/38.46%.
[Figure: feature weights distribution]
Top weighted features (index, name): 8 sender_events_invited; 4 sender_friend_count; 11 sender_gender; 35 receiver_is_It's Complicated; 5 sender_wall_post_count; 9 sender_note_count; 27 sender_is_In a Relationship.
So the story could be: when a sender who has been invited to a greater number of Facebook events, has more friends, has written more Facebook notes (blog entries), is female, has fewer wall posts, and is in a relationship tries to infect a person whose relationship status is "It's Complicated", the infection is more likely to happen than in other cases.
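The reported test-set figures are consistent with a single confusion matrix: 5 true positives, 4 false positives, 8 false negatives and 25 true negatives (5/9 = 55.56% precision, 5/13 = 38.46% recall, 30/42 = 71.43% accuracy). These counts are back-computed here from the percentages, not stated on the slide. A minimal sketch of the metric definitions; the function name `metrics` is hypothetical.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision and recall for the positive (successful) class."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Reported as 0.0 when nothing is predicted positive (undefined otherwise).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Back-computed counts (an inference from the reported percentages).
m = metrics(tp=5, fp=4, fn=8, tn=25)
```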
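SVM with features selected by AdaBoost
Fold | Accuracy (%) | Precision (%) | Recall (%)
1 | 80.77 | 100 | 58.33
2 | 80.77 | 83.33 | 55.56
3 | 88.46 | 100 | 62.5
4 | 73.08 | 0 | 0
5 | 76.92 | 100 | 40
6 | 84.62 | 83.33 | 62.5
7 | 76.92 | 66.67 | 50
8 | 80.77 | 100 | 61.54
9 | 96.15 | 100 | 88.89
10 | 91.18 | 83.33 | 71.43
Average | 82.96 | 81.67 | 55.075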
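The "average" row matches an unweighted (macro) mean of the ten per-fold values, which the sketch below reproduces. Fold 4's 0% recall implies no true positives in that fold. Averaging per-fold precision is not the same as computing precision from counts pooled over all folds. The function name `macro_average` is hypothetical.

```python
from statistics import mean
from typing import List, Tuple

# (accuracy, precision, recall) in percent for folds 1-10, copied from the per-fold results above.
FOLDS: List[Tuple[float, float, float]] = [
    (80.77, 100.0, 58.33),
    (80.77, 83.33, 55.56),
    (88.46, 100.0, 62.5),
    (73.08, 0.0, 0.0),
    (76.92, 100.0, 40.0),
    (84.62, 83.33, 62.5),
    (76.92, 66.67, 50.0),
    (80.77, 100.0, 61.54),
    (96.15, 100.0, 88.89),
    (91.18, 83.33, 71.43),
]


def macro_average(rows: List[Tuple[float, float, float]]) -> Tuple[float, ...]:
    """Unweighted mean of each column across folds."""
    return tuple(mean(col) for col in zip(*rows))
```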
Background: Diffusion of Innovation
Questions: How does it work in large online social networks? What are the key factors in determining the success of infection? Can we predict the propagation path?
Hypothesis
Social influence depends on 5 dimensions of similarity:
Geographical distance: current location (country/state/city), current school, current major, class year, current workplace, courses currently enrolled in;
Background similarity: sex, sexual preference, dating interest, relationship interest, relationship status, birthday, political view, religious view, hometown address, previous school, previous workplace;
Social similarity: number of mutual networks they belong to, number of mutual friends;
Interest similarity: activities, favorite books, favorite music, favorite movies, favorite TV shows, favorite quotes;
Social status distance: difference in numbers of friends, difference in wall post counts, difference in counts of messages sent and received, difference in counts of notes.
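One way to turn the interest-similarity dimension into a number is a Jaccard overlap per profile field, averaged across fields. Jaccard is an assumed choice here (the slide does not say how similarity is measured), and the field keys and function names are hypothetical.

```python
from typing import Iterable, Mapping, Sequence

# Hypothetical keys for the interest fields listed in the hypothesis.
INTEREST_FIELDS = ("activities", "books", "music", "movies", "tv_shows", "quotes")


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """|A & B| / |A | B| after trimming and lower-casing; 0.0 if both are empty."""
    sa = {x.strip().lower() for x in a}
    sb = {x.strip().lower() for x in b}
    if not sa and not sb:
        return 0.0  # assumption: no information counts as no similarity
    return len(sa & sb) / len(sa | sb)


def interest_similarity(
    u: Mapping[str, Sequence[str]],
    v: Mapping[str, Sequence[str]],
    fields: Sequence[str] = INTEREST_FIELDS,
) -> float:
    """Mean per-field Jaccard similarity between two users' interest lists."""
    scores = [jaccard(u.get(f, []), v.get(f, [])) for f in fields]
    return sum(scores) / len(scores)
```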
Project Description
Objectives: identify the key factors for social influence; predict the occurrence of adoption based on those key factors.
Friendship Quiz: a Facebook application we developed. It enables users to make quizzes and send them to their friends (take a peek!). We track the spread of the application.
Highlights
A real-world diffusion of innovation;
Rich and (most of the time) trustworthy profile information on individuals and their social connections/activities;
A precisely timestamped diffusion process with a complete log of events;
An ongoing diffusion process.