Linking Organizational Social Networking Profiles PROJECT ID: H0791030 JEROME CHENG ZHI KAI (A0080860H)



2 Example: Holiday Inn on Twitter and Facebook

3 Motivation: Individuals. Individuals want to find an organization's profiles, but no single place lists them. Profiles are sometimes linked from company websites, but there is no standardized location and not all companies bother.


6 Motivation: Organizations. Track competitors' use of social media; find imposter profiles.

7 Problem Definition: Organization Name → System → Social Profiles, each labelled Official, Affiliate, or Unrelated.

8 Related Work: Prior work focused on deduplicating profiles of individuals. Relevant here: the profile characteristics that work focused on.

9 Related Work: Usernames. Connecting Corresponding Identities across Communities (Zafarani & Liu, 2009); Connecting users across social media sites: a behavioral-modeling approach (Zafarani & Liu, 2013); Studying User Footprints in Different Online Social Networks (Malhotra et al., 2012).

10 Related Work: Created Content. Identifying Users Across Social Tagging Systems (Iofciu, Fankhauser, Abel & Bischoff, 2011).

11 Methodology: System Design
1. Input: organization's name (query)
2. Search Facebook/Twitter APIs, retrieve profiles
3. Convert profiles into feature vectors
4. Classify profiles-as-vectors
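The four steps above can be sketched as a small pipeline. This is only an illustrative outline, not the project's actual code: the `Profile` fields, the `link_profiles` function, and the idea of passing the search, feature-extraction, and classification steps in as callables are all assumptions made for the example. No real Facebook or Twitter API calls are shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass
class Profile:
    # Hypothetical minimal profile record; real API responses carry more fields.
    network: str
    username: str
    display_name: str
    description: str


def link_profiles(
    query: str,
    searchers: Iterable[Callable[[str], Iterable[Profile]]],
    featurize: Callable[[str, Profile], Sequence[float]],
    classify: Callable[[Sequence[float]], str],
) -> List[Tuple[Profile, str]]:
    """Run steps 1-4: query -> candidate profiles -> feature vectors -> labels."""
    results = []
    for search in searchers:               # step 2: one searcher per network
        for profile in search(query):
            vector = featurize(query, profile)            # step 3
            results.append((profile, classify(vector)))   # step 4
    return results
```

Each searcher would wrap one network's search endpoint, and `classify` would wrap a trained model that returns "official", "affiliate", or "unrelated".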

12 Classifier Choice. Evaluated scikit-learn's Decision Tree, Naïve Bayes, Support Vector Machine, Logistic Regression, and Random Forest classifiers. The features aren't independent, so tree-based classifiers are well-suited.

13 Feature Breakdown: Name-based. Normalized edit distance: query to username, query to display name. Edit distance: query to username, query to display name. Lengths: query, username, display name.
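A minimal sketch of the seven name-based features listed above. The slides do not say how the edit distance is normalized or whether strings are case-folded, so dividing by the longer string's length and lowercasing are assumptions here, as are the function names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    # Assumption: normalize by the longer string, giving a value in [0, 1].
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def name_features(query: str, username: str, display_name: str) -> list:
    # Assumption: comparisons are case-insensitive.
    q, u, d = query.lower(), username.lower(), display_name.lower()
    return [
        normalized_edit_distance(q, u),
        normalized_edit_distance(q, d),
        float(edit_distance(q, u)),
        float(edit_distance(q, d)),
        float(len(q)),
        float(len(u)),
        float(len(d)),
    ]
```

Keeping both the raw and the normalized distances, together with the lengths, lets a tree-based classifier choose whichever form separates the classes best.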

14 Feature Breakdown: Name-based Quirks. Need to handle abbreviations and stopwords, e.g. Citigroup versus Citi, General Motors versus GM. Take two edit distances, one on the original string and one on the processed string, and use the better score of the two.
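One way the "two distances, keep the better one" idea could look in code. The slides do not describe the actual processing step, so everything here is an assumption: the `STOPWORDS` list is hypothetical, and `process` simply drops stopwords and reduces multi-word names to their initials.

```python
# Hypothetical stopword list; the project's real list is not given in the slides.
STOPWORDS = {"the", "of", "and", "inc", "ltd", "co", "corp", "company"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (two-row dynamic programme)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def normalized(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def process(name: str) -> str:
    """Assumed processing: drop stopwords, then abbreviate multi-word names."""
    tokens = [t for t in name.lower().split() if t not in STOPWORDS]
    if len(tokens) > 1:
        return "".join(t[0] for t in tokens)  # "general motors" -> "gm"
    return "".join(tokens)


def best_name_distance(query: str, candidate: str) -> float:
    """Score original and processed strings; keep the lower (better) distance."""
    original = normalized(query.lower(), candidate.lower())
    pq, pc = process(query), process(candidate)
    if not pq or not pc:  # nothing left after stopword removal
        return original
    return min(original, normalized(pq, pc))
```

With this processing, "General Motors" against "GM" scores a distance of 0. Initials alone do not help with truncations such as Citigroup versus Citi, which would need a different rule.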

15 Feature Breakdown: Description. Occurrences of the query in the description. Cosine similarity: query and description; DuckDuckGo description and profile description.
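The description features can be illustrated with a plain bag-of-words cosine similarity. The tokenization, the raw term-count weighting, and the function names are assumptions. The slides do not say whether TF-IDF or another weighting was used, and `reference_description` stands in for the text obtained from DuckDuckGo.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Assumption: lowercase alphanumeric tokens, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two texts' term-count vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def description_features(query: str, profile_description: str,
                         reference_description: str) -> list:
    occurrences = profile_description.lower().count(query.lower())
    return [
        float(occurrences),
        cosine_similarity(query, profile_description),
        cosine_similarity(reference_description, profile_description),
    ]
```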

16 Feature Breakdown: Language Models. Construct a bigram language model for each of: official profile descriptions, affiliate profile descriptions, unrelated profile descriptions. Feature: the probability that the candidate description belongs to each.
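A rough sketch of per-class bigram language models. The slides do not mention smoothing or tokenization, so the add-one (Laplace) smoothing, the sentence boundary markers, and the use of log-probabilities are assumptions chosen to keep the example small.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


class BigramModel:
    """Bigram model over one class's descriptions, with add-one smoothing."""

    def __init__(self, descriptions):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.vocab = set()
        for text in descriptions:
            tokens = ["<s>"] + tokenize(text) + ["</s>"]
            self.vocab.update(tokens)
            for prev, word in zip(tokens, tokens[1:]):
                self.bigrams[(prev, word)] += 1
                self.contexts[prev] += 1

    def log_prob(self, text: str) -> float:
        tokens = ["<s>"] + tokenize(text) + ["</s>"]
        v = len(self.vocab) + 1  # one extra slot for unseen words
        return sum(
            math.log((self.bigrams[(p, w)] + 1) / (self.contexts[p] + v))
            for p, w in zip(tokens, tokens[1:])
        )


def language_model_features(candidate: str, models: dict) -> dict:
    """One log-probability per class; these become three classifier features."""
    return {label: model.log_prob(candidate) for label, model in models.items()}
```

In use, `models` would map "official", "affiliate", and "unrelated" to models trained on the labelled descriptions from the training folds only, so that test descriptions never leak into the models.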

17 Evaluation: Ground Truth Creation
1. Retrieved organizations from Freebase
2. Searched for profiles on Twitter/Facebook
3. Manually labelled as official/affiliate/unrelated

18 Evaluation: Ground Truth Breakdown. Twitter classes: 3381 labels. Facebook classes: 3413 labels.

19 Evaluation: Process. Mainly concerned with the official and affiliate classes; not interested in the unrelated class. Uses a modified 10-fold cross-validation.

20 Evaluation: Modified Cross Validation
1. Generate folds as per normal
2. Train classifier on training set as per normal
3. For each affiliate/official profile in test set:
   a. Input organization's name to system
   b. Count number of correct results
4. Calculate precision/recall/F1 from counts
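The counting and scoring in steps 3 and 4 might look like the following. How "correct results" were counted is not spelled out in the slides, so this sketch assumes per-class true positive, false positive, and false negative counts over profile IDs. The function names and the dictionary layout are hypothetical.

```python
def count_for_class(predicted: dict, truth: dict, target: str) -> tuple:
    """Count TP/FP/FN for one class.

    predicted: profile id -> label returned by the system for a query
    truth:     profile id -> manually assigned ground-truth label
    """
    tp = sum(1 for pid, label in predicted.items()
             if label == target and truth.get(pid) == target)
    fp = sum(1 for pid, label in predicted.items()
             if label == target and truth.get(pid) != target)
    fn = sum(1 for pid, label in truth.items()
             if label == target and predicted.get(pid) != target)
    return tp, fp, fn


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Standard definitions, returning 0.0 where a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```

Counts would be accumulated across all test-fold queries before the final precision/recall/F1 is computed, separately for the official and affiliate classes.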

21 Evaluation: Baseline. Normalized edit distance between the username/display name and the query. Emulates searching the networks manually without examining each profile in detail.

22 Results & Discussion: Twitter

23 Results & Discussion: Facebook

24 Discussion. The baseline performs well for the official class on Facebook: username and display name alone are good indicators for this class. Other features still help, but not as much.

25 Discussion: Facebook Characteristics. Many profile types: people, pages, places, etc. Finding official pages is simplified, but finding affiliates requires more effort.

26 Discussion: Facebook Characteristics. Facebook doesn't require that a "username" be specified for pages; such pages just use an ID instead. Auto-generated pages also only have IDs, and take their name from Wikipedia or other sources.

28 Limitations. Ground truth proportions: expand and/or balance. Limited number of profiles retrieved for classification.

29 Future Work. Support additional networks; examine post content; "preferential" classification.

