Learning Influence Probabilities In Social Networks

By Amit Goyal, Francesco Bonchi & Laks V. S. Lakshmanan
Gal Bar-On, Ben-Gurion University of the Negev, Department of Computer Science, Fall 2017

Table of contents:
Introduction
Related work
Problem definition
Solution framework
Models of Influence
Algorithm, Step 1: Learning the Parameters of the Models
Algorithm, Step 2: Evaluating the Models
Algorithm, Step 3: Predicting Time
Experimental Evaluation
Summary

Notes: Ask the students here: "Have you ever been influenced to do something just because someone else did the same thing?"
Also ask: "What did you do? What made you do it?" "Have you ever caused someone else to do something?" "Are you familiar with this phenomenon?" Then explain that this raised the need to study how users in social networks influence other users, and how that influence diffuses through the network. We want to see whether a social graph and an action log are enough to build models that predict the probability that a user performs an action.

Related work:
Domingos & Richardson:
- Mining the network value of customers
- Mining knowledge-sharing sites for viral marketing
D. Kempe, J. M. Kleinberg and E. Tardos:
- Maximizing the spread of influence through a social network
J. Leskovec et al.:
- Cost-effective outbreak detection in networks

Problem Definition: [Figure: an example social graph over users a to g with numbered edges, next to an action log listing the times at which users performed action a1]

Problem Definition: Given a social graph and an action log, two questions are asked:
1. Can we use them to build models of influence?
2. Can we predict the time by which a user is expected to perform an action?

Problem Definition: When a user performs an action, he may have one of many reasons for doing so:
- He may have heard of it outside of the network.
- The action is very popular.
- He may be genuinely influenced by seeing his social friends perform that action.

Action Propagation:
A tuple $(u, a, t_u)$ indicates that user $u$ performed action $a$ at time $t_u$. The action log contains such a tuple for every action performed by every user.
We say that an action $a \in A$ propagates from user $v_i$ to $v_j$ iff:
1. $(v_i, v_j) \in E$
2. $\exists\, (v_i, a, t_i), (v_j, a, t_j) \in Actions$ with $t_i < t_j$
3. $T(v_i, v_j) \le t_i$, where $T(v_i, v_j)$ is the time at which the social tie between $v_i$ and $v_j$ was created
We write $prop(a, v_i, v_j, \Delta t)$ where $\Delta t = t_j - t_i$.
Notes: Explain what prop is, and what the propagation graph is.
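The three propagation conditions can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the function name `propagates` and the dictionary layout are assumptions made for illustration, and the sample values are taken from the P, Q, R running example that appears later in this deck.

```python
# Hypothetical data layout (an assumption, not from the slides):
#   tie_time[frozenset({x, y})] = time at which the tie between x and y was created
#   actions[(user, action)]     = time at which the user performed the action
def propagates(a, vi, vj, tie_time, actions):
    """Return delta_t if action a propagates from vi to vj, otherwise None."""
    edge = frozenset((vi, vj))
    if edge not in tie_time:                     # condition 1: (vi, vj) in E
        return None
    ti, tj = actions.get((vi, a)), actions.get((vj, a))
    if ti is None or tj is None or not ti < tj:  # condition 2: both acted, ti < tj
        return None
    if tie_time[edge] > ti:                      # condition 3: T(vi, vj) <= ti
        return None
    return tj - ti

tie_time = {frozenset("PR"): 2, frozenset("PQ"): 4, frozenset("QR"): 11}
actions = {("P", "a1"): 5, ("Q", "a1"): 10, ("R", "a1"): 15}

print(propagates("a1", "P", "Q", tie_time, actions))  # tie at 4, P acted at 5
print(propagates("a1", "Q", "R", tie_time, actions))  # tie at 11, after Q acted at 10
```

The second call fails condition 3, which is the reason the running example has no edge from Q to R for action a1.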

Action Propagation: [Figure: the social graph, the action log for action a1, and the resulting propagation graph]
Notes: Explain what prop is. The propagation graph represents the influence between users in performing one specific action. The graph is a DAG. The action log is a table in which each tuple represents an action performed by a user v at a time t (35 million tuples referring to 300 thousand distinct actions).

Some Necessary Definitions:
$A$: the universe of actions
$A_u$: number of actions performed by user $u$
$A_{u\&v}$: number of actions performed by both $u$ and $v$
$A_{u|v}$: number of actions performed by either $u$ or $v$
$A_{v2u}$: number of actions propagated from $v$ to $u$

Solution Framework:
Given an inactive user $u$ and the set $S$ of its activated neighbors, we would like to determine $p_u(S)$, the joint influence probability of $S$ on $u$:
$p_u(S) = 1 - \prod_{v \in S} (1 - p_{v,u})$
If $p_u(S) \ge \theta_u$, where $\theta_u$ is the activation threshold of $u$, we can conclude that $u$ will activate as well.
Notes: Talk here about joint influence and about influenceability.
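The joint influence formula treats each active neighbor as an independent chance to activate $u$. A minimal sketch of the computation follows; the function names and the sample probabilities are illustrative assumptions, not values from the slides.

```python
import math

def joint_influence(probs):
    """p_u(S) = 1 - product over v in S of (1 - p_{v,u})."""
    return 1.0 - math.prod(1.0 - p for p in probs)

def will_activate(probs, theta_u):
    """Predict activation when the joint probability reaches the threshold."""
    return joint_influence(probs) >= theta_u

# Two active neighbors, each with an illustrative influence probability of 0.5
print(joint_influence([0.5, 0.5]))
print(will_activate([0.5, 0.5], theta_u=0.7))
```

With no active neighbors the product is empty, so the joint probability is 0 and the user is never predicted to activate for a positive threshold.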

Solution Framework:
The average time delay is
$\tau_{v,u} = \frac{\sum_{a \in A} (t_u(a) - t_v(a))}{A_{v2u}}$
where the sum runs over the actions that propagated from $v$ to $u$.
User influenceability:
$infl(u) = \frac{|\{a \mid \exists v, \Delta t : prop(a, v, u, \Delta t) \wedge 0 \le \Delta t \le \tau_{v,u}\}|}{A_u}$
Action influenceability:
$infl(a) = \frac{|\{u \mid \exists v, \Delta t : prop(a, v, u, \Delta t) \wedge 0 \le \Delta t \le \tau_{v,u}\}|}{\text{number of users performing } a}$
Notes: For users, infl(u) represents how easily u can be influenced. It is the ratio between the number of actions that someone influenced u to perform and the number of actions that u performed. For actions, infl(a) represents the chance that users are influenced to perform a. It is the ratio between the number of users who were influenced to perform a and the total number of users who performed a. It is used to distinguish actions that are more likely to propagate between users from all other actions. The expectation is that the higher the infl value, the more accurate the prediction.
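A small sketch of how the average delay and the user influenceability could be computed from a list of propagation records. The record layout `(action, v, u, delta_t)` and the function names are assumptions for illustration; the four sample records mirror the propagation edges of the running example later in the deck.

```python
from collections import defaultdict

def average_delays(props):
    """props: iterable of (action, v, u, delta_t). Returns tau[(v, u)]."""
    total, count = defaultdict(float), defaultdict(int)
    for _action, v, u, dt in props:
        total[(v, u)] += dt
        count[(v, u)] += 1          # count[(v, u)] plays the role of A_{v2u}
    return {pair: total[pair] / count[pair] for pair in total}

def user_influenceability(u, props, tau, num_actions_u):
    """infl(u): share of u's actions that some neighbor propagated within tau."""
    influenced = {a for a, v, x, dt in props
                  if x == u and 0 <= dt <= tau[(v, x)]}
    return len(influenced) / num_actions_u

props = [("a1", "P", "Q", 5), ("a1", "P", "R", 10),
         ("a2", "Q", "R", 2), ("a3", "R", "P", 8)]
tau = average_delays(props)
print(tau[("P", "Q")])
print(user_influenceability("R", props, tau, num_actions_u=3))
```

Because infl(u) depends on the delays, it can only be computed once every tau value is known, which is why a second pass over the data is needed.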

Models: Static Models:
- Independent of time
- Easy to learn and test
Bernoulli Distribution: $p_{v,u} = \frac{A_{v2u}}{A_v}$
Jaccard Index: $p_{v,u} = \frac{A_{v2u}}{A_{u|v}}$
Partial Credits: $credit_{v,u}(a) = \frac{1}{\sum_{w \in S} I(t_w(a) < t_u(a))}$
Notes:
- Bernoulli: a sequence of Bernoulli trials, where each trial is a contagious user trying to influence its neighbor. The probability is therefore the ratio between the number of actions that v performed and caused u to perform, and the total number of actions that v performed.
- Jaccard: the denominator is the total number of actions performed by either of the two users.
- Partial credits: when u performs an action, several neighbors who performed the same action earlier may share the influence, so each of them receives part of the credit.
- I is an indicator function that says whether w performed the action before u.
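The two static estimators differ only in their denominator. A minimal sketch, assuming the counts have already been collected; the function names are hypothetical, and the Bernoulli counts are made up for illustration, while the Jaccard counts (1 and 3) match the P, Q pair of the running example.

```python
def bernoulli(a_v2u, a_v):
    """p_{v,u} = A_{v2u} / A_v (0 when v performed no actions)."""
    return a_v2u / a_v if a_v else 0.0

def jaccard(a_v2u, a_u_or_v):
    """p_{v,u} = A_{v2u} / A_{u|v} (0 when neither user performed an action)."""
    return a_v2u / a_u_or_v if a_u_or_v else 0.0

print(bernoulli(a_v2u=3, a_v=10))      # illustrative counts
print(jaccard(a_v2u=1, a_u_or_v=3))    # counts for the pair (P, Q)
```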

Models: Static Models with Partial Credits:
Partial Credits: $credit_{v,u}(a) = \frac{1}{\sum_{w \in S} I(t_w(a) < t_u(a))}$
Bernoulli Model w. Partial Credits: $p_{v,u} = \frac{\sum_{a \in A} credit_{v,u}(a)}{A_v}$
Jaccard Model w. Partial Credits: $p_{v,u} = \frac{\sum_{a \in A} credit_{v,u}(a)}{A_{u|v}}$
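Partial credits split the responsibility for one activation evenly among all neighbors who acted earlier. The sketch below is an illustration under assumed inputs: `neighbor_times` holds the times at which the active neighbors of u performed the action, and all numbers are made up.

```python
def partial_credit(t_u, neighbor_times):
    """credit_{v,u}(a) = 1 / (number of neighbors w with t_w(a) < t_u(a))."""
    earlier = sum(1 for t_w in neighbor_times if t_w < t_u)
    return 1.0 / earlier if earlier else 0.0

def bernoulli_pc(credits, a_v):
    """p_{v,u} = (sum of credits v earned for u's actions) / A_v."""
    return sum(credits) / a_v if a_v else 0.0

# u acts at time 15; two neighbors acted earlier, so each gets half the credit
print(partial_credit(15, [5, 10]))
# v earned credits 0.5 and 1.0 over two actions and performed 3 actions overall
print(bernoulli_pc([0.5, 1.0], a_v=3))
```

The Jaccard variant would use the same credit sum with $A_{u|v}$ in the denominator.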

Models: Continuous Time (CT) Models:
- Assume that influence probabilities are continuous functions of time
- Very accurate
- Expensive to test on large data sets
The probability of $v$ influencing its neighbor $u$ at time $t$:
$p^t_{v,u} = p^0_{v,u} \, e^{-(t - t_v)/\tau_{v,u}}$
The joint probability of $u$ being influenced by a combination of its active neighbors at time $t$:
$p^t_u(S) = 1 - \prod_{v \in S} (1 - p^t_{v,u})$
Notes: Point out that at time 0 after an action is performed, the influence on a user who saw it is at its maximum. From that moment on it decays, and the user becomes less and less likely to perform the action.
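The exponential decay of the CT model can be sketched as follows. The function names are hypothetical, and returning 0 before $t_v$ is an assumption made here to express that v is not yet active; the sample values are illustrative.

```python
import math

def ct_probability(p0, t, t_v, tau):
    """p^t_{v,u} = p^0_{v,u} * exp(-(t - t_v) / tau_{v,u}) for t >= t_v."""
    if t < t_v:
        return 0.0          # assumption: v has not acted yet, so no influence
    return p0 * math.exp(-(t - t_v) / tau)

def ct_joint(t, neighbors):
    """neighbors: iterable of (p0, t_v, tau) for the active neighbors of u."""
    return 1.0 - math.prod(1.0 - ct_probability(p0, t, t_v, tau)
                           for p0, t_v, tau in neighbors)

# One neighbor with p0 = 0.4 who acted at time 5, with an average delay of 10
for t in (5, 10, 15, 25):
    print(t, ct_joint(t, [(0.4, 5, 10)]))
```

The printed values start at the maximum 0.4 at time 5 and shrink as time passes, matching the decay described in the notes.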

Models: Discrete Time (DT) Models:
The influence of $v$ on $u$ remains constant for a time window of $\tau_{v,u}$ and then drops to 0, i.e. $v$ is contagious to $u$ in the time interval $[t_v, t_v + \tau_{v,u}]$. When the window expires, $p_u(S)$ is updated as follows:
$p_u(S \setminus \{v\}) = \frac{p_u(S) - p_{v,u}}{1 - p_{v,u}}$
Partial credit definition for DT:
$credit^{\tau_{v,u}}_{v,u}(a) = \frac{1}{\sum_{w \in S} I(0 < t_u(a) - t_w(a) \le \tau_{v,u})}$
Notes: After the time window, v is no longer contagious, so it is no longer part of S, and the joint influence is updated to the set that does not contain v. For the same reason, the credits only count neighbors whose action time differs from that of u by no more than the time window.
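The DT update is the inverse of adding one neighbor to the product formula, so the joint probability can be maintained incrementally. This is a sketch with hypothetical function names and illustrative probabilities; the add step is derived here from the product form of the joint probability.

```python
def add_neighbor(p_joint, p_vu):
    """Joint probability after v becomes contagious: 1 - (1 - p)(1 - p_vu)."""
    return 1.0 - (1.0 - p_joint) * (1.0 - p_vu)

def remove_neighbor(p_joint, p_vu):
    """p_u(S without v) = (p_u(S) - p_vu) / (1 - p_vu); requires p_vu < 1."""
    return (p_joint - p_vu) / (1.0 - p_vu)

p = add_neighbor(0.0, 0.2)      # first neighbor becomes contagious
p = add_neighbor(p, 0.5)        # second neighbor becomes contagious
print(p)
p = remove_neighbor(p, 0.5)     # second neighbor's time window expires
print(p)
```

Removing a neighbor restores the value the joint probability had before that neighbor was added, up to floating-point rounding, so no rescan of the remaining neighbors is needed.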


Notes: Based on the Jaccard index with partial credits.

Running example (animated slide sequence; every step shows the social graph, the action log, the influence matrix, and one propagation graph per action):
t = 0: A social network with 3 users (P, Q, R). The action log is sorted by action ID.
t = 2: A social tie is created between R and P.
t = 4: A social tie is created between Q and P.
t = 5: P performs action a1 at time 5.
t = 6: R performs action a3 at time 6.
t = 10: Q performs action a1 at time 10. Propagation graph (a1) gains the edge P → Q, labeled 5. Explain that the edges of a propagation graph are labeled with the time difference between the two users' performances of the action.
t = 11: A social tie is created between Q and R.
t = 12: Q performs action a2 at time 12.
t = 14: R performs action a2 at time 14, so propagation graph (a2) gains the edge Q → R, labeled 2. Propagation graph (a3) gains the edge R → P, labeled 8.
t = 15: R performs action a1 at time 15. Propagation graph (a1) gains the edge P → R, labeled 10. No directed edge is created from Q to R, because Q performed a1 before the social tie between Q and R was created. Therefore R cannot have been influenced by Q.

Influence Matrix at t = 15 (row = influencing user $v$, column = influenced user $u$; each entry is $p_{v,u}, \tau_{v,u}$):

      P         Q         R
P     X         1/3, 5    1/3, 10
Q     0, 0      X         1/3, 2
R     1/3, 8    0, 0      X

Notes: The action log has now been scanned, and the required parameters can be computed. For the pair (P, Q):
$p^0_{P,Q} = \frac{A_{P2Q}}{A_{P|Q}} = \frac{1}{3}$
$\tau_{P,Q} = \frac{\sum_{a \in A} (t_Q(a) - t_P(a))}{A_{P2Q}} = \frac{5}{1} = 5$
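The learning step for this example can be reproduced with a short scan of the action log. The sketch below is an illustration rather than the authors' algorithm: it uses plain Jaccard counts (in this example every influenced user has a single valid influencer per action, so partial credits would all equal 1), the function name `learn` and the data layout are assumptions, and the time 14 for P performing a3 is inferred from the edge label 8 rather than read directly from the slides.

```python
from collections import defaultdict
from fractions import Fraction

tie_time = {frozenset("PR"): 2, frozenset("PQ"): 4, frozenset("QR"): 11}
# (user, action, time); P's a3 time is inferred from the R -> P edge label 8
action_log = [("P", "a1", 5), ("Q", "a1", 10), ("R", "a1", 15),
              ("Q", "a2", 12), ("R", "a2", 14),
              ("R", "a3", 6), ("P", "a3", 14)]

def learn(action_log, tie_time):
    performed = defaultdict(set)   # user -> actions performed
    a_v2u = defaultdict(int)       # (v, u) -> number of propagated actions
    delay = defaultdict(int)       # (v, u) -> summed propagation delays
    by_action = defaultdict(list)
    for user, action, t in action_log:
        performed[user].add(action)
        by_action[action].append((t, user))
    for events in by_action.values():
        events.sort()              # chronological order within one action
        for i, (t_u, u) in enumerate(events):
            for t_v, v in events[:i]:
                tie = tie_time.get(frozenset((v, u)))
                # propagation needs an existing tie and a strictly earlier v
                if tie is not None and tie <= t_v < t_u:
                    a_v2u[(v, u)] += 1
                    delay[(v, u)] += t_u - t_v
    p0 = {(v, u): Fraction(n, len(performed[v] | performed[u]))
          for (v, u), n in a_v2u.items()}
    tau = {pair: Fraction(delay[pair], n) for pair, n in a_v2u.items()}
    return p0, tau

p0, tau = learn(action_log, tie_time)
for pair in sorted(p0):
    print(pair, p0[pair], tau[pair])
```

Pairs that never appear in the output, such as (Q, P) and (R, Q), correspond to the 0, 0 entries of the influence matrix.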

Notes: To find the infl(u) values, the action log has to be scanned a second time, because they depend on $\tau_{v,u}$.

Evaluating the Models:
For each action, we maintain a results_table with entries of the form <$u$, $p_u$, $perform_u$>.
The flag $perform_u$ can have one of the following values:
0: if $u$ never performs the action but at least one of its neighbors does
1: if $u$ performs it but at least one of its neighbors has done so before it
2: if $u$ is the initiator of the action in its neighborhood
After scanning all tuples for an action, results_table contains all users who are active or are neighbors of one or more active users.

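The performance is depicted using ROC (Receiver Operating Characteristic) curves, which plot the TPR against the FPR:
$TPR = \frac{TP}{TP + FN}$, $FPR = \frac{FP}{FP + TN}$
Each entry of results_table is classified as follows:
$perform_u = 1$ and $p_u \ge \theta_u$: TP
$perform_u = 1$ and $p_u < \theta_u$: FN
$perform_u = 0$ and $p_u \ge \theta_u$: FP
$perform_u = 0$ and $p_u < \theta_u$: TN
Notes: The ROC curve is a plot that examines the performance of the different models. The closer the bulge of the curve is to the point (0, 1), the better the performance. We want as few FN and FP as possible; the ideal case has none at all. After the values in results_table are updated, we go over the table and label each entry as TN, FN, TP or FP.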
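One point of the ROC curve can be computed from a results table as sketched below. Two assumptions are made here that the slide does not state: a single threshold `theta` is used for all users, and entries with the flag 2 (initiators) are skipped because the classification rules only cover the flags 0 and 1. The function name and the sample table are illustrative.

```python
def roc_point(results, theta):
    """results: iterable of (p_u, perform_u). Returns (TPR, FPR) at theta."""
    tp = fp = tn = fn = 0
    for p_u, perform in results:
        if perform == 2:           # assumption: initiators are not evaluated
            continue
        predicted = p_u >= theta
        if perform == 1:           # user did perform the action
            tp += predicted
            fn += not predicted
        else:                      # perform == 0: user never performed it
            fp += predicted
            tn += not predicted
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

results = [(0.9, 1), (0.2, 1), (0.8, 0), (0.1, 0), (0.0, 2)]
for theta in (0.0, 0.5, 1.1):
    print(theta, roc_point(results, theta))
```

Sweeping the threshold from low to high traces the curve from (1, 1) down to (0, 0) in (TPR, FPR) terms.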

With Continuous Time Models, we also consider the chronological order in which different neighbors of $u$ performed the action.
Notes: Here the different times at which the neighbors of u perform the action matter. Each time a neighbor performs it, the probability that u performs the action rises sharply and then starts to decay. The next example is about R, when its neighbors perform a4. Since R has d = 2 neighbors, the curve has 2 local maxima.

Predicting Time: We are able to predict not only whether a user performs an action, but also the time interval $[b, e]$ in which it is most likely to happen:
$e = b + \tau_{v,u} \cdot \ln 2$
Example: $\tau_{v,u} = 7.5$ and $\ln 2 = 0.693$, so $\tau_{v,u} \cdot \ln 2 = 5.199$ and $e = b + 5.199 = 15.199$.
Notes: The left bound of the interval is the time at which a neighbor of u performs the action and the probability rises above the activation threshold of u. The right bound is b plus $\tau_{v,u} \cdot \ln 2$. The prediction is that u is expected to perform the action within this range.
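The factor ln 2 is the half-life of an exponential decay: after a delay of tau times ln 2, the term exp(-delay / tau) has dropped to one half. A minimal sketch of the interval computation follows; the function name is hypothetical, and the start time b = 10 is an illustrative value chosen so that the end point agrees with the 15.199 shown on the slide.

```python
import math

def predicted_interval(b, tau):
    """Return (b, e) with e = b + tau * ln 2."""
    return b, b + tau * math.log(2)

b, e = predicted_interval(b=10, tau=7.5)
print(round(e, 3))

# Half-life check: the decay factor at the end of the interval is one half
print(math.exp(-(e - b) / 7.5))
```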

Experimental Evaluation:
Out of 6.2 million users and 71 million edges, we took users who are members of at least one group and got a graph of 1.4 million users with 40.5 million edges. The graph has 34,766 connected components, where the largest one consists of million users and edges (99.72%). The rest are ignored. The action log contained a total of million tuples.
Notes: The action we work with is "joining a group".

Notes: These graphs show that Bernoulli is slightly better than Jaccard, and that of the two Bernoulli variants, the one with partial credits (PC) leads by a very small margin. Graph 6 compares all the models and shows that a time-aware model works much better than a static one. CT and DT perform identically.

Notes: Graph 7 gives the intuition that the higher the influenceability, the better influence can be predicted. Graph (a) is with respect to users, graph (b) with respect to actions.

Notes: Our main focus is on the Bernoulli-based CT model, because DT models cannot identify the interval in which the user is expected to perform the action. Graph 11 shows the scalability of our algorithm. During training, the running time is linear in the number of tuples read. Memory consumption is constant and does not depend on the number of tuples. The conclusion is that DT reaches the same solution quality as CT, only more efficiently.

Summary:
- We can build models of influence from a social graph and an action log.
- We can predict whether a user will perform an action, and the predictions for users with high influenceability scores had high precision.
- We were also able to predict the time by which a user will perform the action after its neighbors have done so.
- DT models can be tested much more efficiently and yield accuracy levels very close to those of the CT models.
Notes: Trim the wording here a little.

Thank You!
