Location Mining from Online Social Networks Satyen Abrol Advisors: Dr. Latifur Khan Dr. Bhavani Thuraisingham.


1 Location Mining from Online Social Networks Satyen Abrol Advisors: Dr. Latifur Khan Dr. Bhavani Thuraisingham

2 Location Mining in Online Social Networks
What is the city-level home location of a user?

3 Outline
– Introduction and Problem Statement
– Different Approaches
– Social Graph Based: Our Approaches
  – Tweethood: Fuzzy k-Closest Friends with Variable Depth
  – Tweecalization: Label Propagation
  – Tweeque: Graph Partitioning for Spatio-Temporal Analysis
– Experiments and Results
– Future Work

5 Why is Location Important?
– Privacy and Security
– Trustworthiness
– Location-Driven Mining for Business
– Location-based social networking is projected to generate US $21.14 billion [1]
– But only ~14.3% of users provide their location explicitly [2]
[1] According to a report by Global Industry Analysts, Inc. (GIA) (http://www.strategyR.com/)
[2] According to an experiment we performed on 1 million users

6 Twitter Basics
– Tweets: maximum 140 characters
– Profile fields: # of Tweets, # of Following, # of Followers, Location

7 Why is location so important?

8 Privacy and Security
– Losing locational privacy forever → users leave the location field blank because they don’t want strangers to know their locations

9 Trustworthiness
– Companies use social media for better advertising and marketing
– Iran elections of 2009: the US State Department used Twitter as a source
– Trustworthiness is important in such cases
– We need to be able to trust and verify the location stated in a user profile

10 Marketing and Business
– Large corporations such as Walmart, Starbucks, and United Airlines use social media
  – Great tool for inexpensive advertising
  – Getting feedback from users

11 The Problem
Many users:
– Leave the location field blank in their Twitter profiles
– Do not provide valid geographic information: “Justin Biebers heart”, “NON YA BISNESS!!”, “looking down on u people”
– Provide incorrect locations that actually exist in the real world: “Nothing” in Arizona, “Little Heaven” in Connecticut
– Provide several locations, making it difficult to identify the home location: “CALi b0Y $TuCC iN V3Ga$” (California boy stuck in Las Vegas, NV)
– Enter just a country, state, county, etc., and no city-level location (~35%) [1]
[1] B. Hecht, L. Hong, B. Suh, E. H. Chi, “Tweets from justin biebers heart: the dynamics of the location field in user profiles”, In SIGCHI ’11.

13 Location Prediction in Social Networks
Two approaches:
– Content based [1, 2]
– Using the social graph [3, 4, 5]
[1] Z. Cheng, J. Caverlee, and K. Lee, “You are where you tweet: A content-based approach to geo-locating twitter users”. In CIKM ’10.
[2] B. Hecht, L. Hong, B. Suh, E. H. Chi, “Tweets from justin biebers heart: the dynamics of the location field in user profiles”, In SIGCHI ’11.
[3] S. Abrol, L. Khan and B. Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.
[4] S. Abrol, L. Khan and B. Thuraisingham, “Tweecalization: Efficient and intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania.
[5] S. Abrol, L. Khan, “Agglomerative clustering on fuzzy k-closest friends with variable depth for location mining,” The Second IEEE International Conference on Social Computing (SocialCom2010), Aug 20-22, 2010, Minneapolis, Minnesota.

14 Content Based Approach
– Inaccurate: a location in the text is not necessarily the location of the user
– Involves ambiguity: “Paris” can mean
  – Paris Hilton
  – Paris, the capital of France
  – Paris, a town in Texas
– Slow: uses NLP and machine learning techniques, and searches gazetteers

15 Using Social Graphs
– Based on a Japanese proverb: “When the character of a man is not clear to you, look at his friends.”
– Relationship between geospatial proximity and friendship
– Uses classical data mining algorithms for more accurate results
– Faster, and can be used for real-world applications

16 Geospatial Proximity and Friendship
– Form Twitter user pairs and identify the geographic distance between them
– The curve follows a power law of the form $a(x+b)^{-c}$, with an exponent of -0.87
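
The curve form on this slide can be evaluated with a few lines of Python. This is only a sketch: the exponent 0.87 comes from the slide, while the values of `a` and `b` below are placeholder assumptions, since the fitted values are not given in the transcript.

```python
def power_law(distance, a=1.0, b=1.0, c=0.87):
    """Evaluate a * (distance + b) ** (-c).

    c = 0.87 matches the exponent quoted on the slide; a and b are
    placeholder values chosen for illustration only.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return a * (distance + b) ** (-c)


# The curve decays as the distance between two users grows.
near = power_law(1.0)
far = power_law(1000.0)
```

With these placeholder parameters, `near` is larger than `far`, which is the qualitative behavior the slide describes: user pairs that are geographically close are more likely to be connected.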

17 Graph Construction
– Vertices (data points) represent users
– An edge represents the ‘similarity’ between two users
– Deal with special cases
  – Spammers: follow random people
  – Celebrities: followed by random people
  – Edge weight gets reduced in these cases

18 Defining Edge Weight
Consists of two components:
– Trustworthiness (TW)
– Mutual Friends (MF)

19 Trustworthiness
– Fraction of a user’s friends that have the same label as the user
– Intuition: a person who has stayed in the same place all their life will have most friends from the same location, and hence high trustworthiness
[Figure: nodes A to J; example friend with Location: Seattle/WA/USA and Trustworthiness: 0.6]
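
The definition above (the fraction of friends sharing the user's label) can be sketched as follows. The function name and the choice to return 0.0 for a user with no friends are assumptions made for illustration; the slide does not say how that case or friends without a location are handled.

```python
def trustworthiness(user_location, friend_locations):
    """Fraction of friends whose location label equals the user's label.

    Returning 0.0 when there are no friends is an assumption; every
    friend in the list counts toward the denominator.
    """
    if not friend_locations:
        return 0.0
    same = sum(1 for loc in friend_locations if loc == user_location)
    return same / len(friend_locations)


# Hypothetical friend list: 3 of 5 friends share the user's label.
friends = ["Seattle/WA/USA"] * 3 + ["Dallas/TX/USA", "Sydney/AU"]
tw = trustworthiness("Seattle/WA/USA", friends)
```

Here `tw` evaluates to 3/5, the same kind of value (0.6) shown in the slide's figure.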

20 Mutual Friends
Chose the number of common friends as the similarity measure
– Better accuracy
– Low time complexity

21 Defining Edge Weight
Defined as $W_{ij} = \alpha \times \max\{TW(U_i), TW(U_j)\} + (1 - \alpha) \times MF_{ij}$, where $0 < \alpha < 1$, typically chosen to be around 0.7
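
A minimal sketch of the edge-weight formula follows. It assumes that the mutual-friends term has already been normalized to the range [0, 1]; the transcript does not say how that term is scaled, so treat the argument as a hypothetical normalized score.

```python
def edge_weight(tw_i, tw_j, mutual_friends, alpha=0.7):
    """Weight = alpha * max(TW_i, TW_j) + (1 - alpha) * MF_ij.

    alpha defaults to 0.7, the typical value quoted on the slide.
    mutual_friends is assumed to be a normalized score in [0, 1].
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * max(tw_i, tw_j) + (1.0 - alpha) * mutual_friends


# Two users with trustworthiness 0.6 and 0.4 and a mutual-friends score of 0.5.
w = edge_weight(0.6, 0.4, 0.5)
```

For these inputs the weight is 0.7 × 0.6 + 0.3 × 0.5 = 0.57; only the larger of the two trustworthiness values contributes.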

23 Tweethood: Fuzzy k-Closest Friends with Variable Depth
– Choose the k “closest” friends of the user
– If a location is not found, look further (at greater depth) for the answer
– Each node is defined by a vector of locations with their respective probabilities
– Boost and aggregate at each step
Satyen Abrol, Latifur Khan, “TweetHood: Agglomerative Clustering on Fuzzy k-Closest Friends with Variable Depth for Location Mining”. In Proc. of the Second IEEE International Conference on Social Computing (SocialCom-2010), Minneapolis, USA, August 20-22, 2010

24 Agglomerative Clustering
– We don’t want to find just any location
– We want a location or group of locations with some confidence
– Tradeoff between the number of locations, the distance between concepts, and the total confidence
– Construct a matrix at each step using an objective function of the above attributes
– Choose the concepts with maximum values
– Continue until the threshold is crossed

25 Find the location of John Doe
[Figure: John Doe]

26 Social Network of John Doe
[Figure: John Doe linked to Friend 1, Friend 2, Friend 3, ..., Friend n by edges labeled C_B1, C_B2, C_B3, ..., C_Bn]

27 Choose k closest friends of John Doe
[Figure: John Doe linked to Friend 1, Friend 2, Friend 3, ..., Friend k by edges labeled C_B1, C_B2, C_B3, ..., C_Bk]

28 Identify Locations
[Figure: the k closest friends (Friend 1, Friend 2, Friend 3, ..., Friend k) with their profile locations, e.g. Location: NULL and Location: Seattle, USA]
LOW ACCURACY

29 What if we have depth = 2?
[Figure: Friend 1 to Friend k at depth 1; at depth 2, nodes C to J with locations Seattle/WA/USA, NULL, Sydney/AU, Dallas/TX/USA, Richardson/TX/USA, NULL, Dallas/TX/USA, NULL]

30 Location Vector for John Doe’s friends
– Friend 1: Dallas/TX/USA 0.4, Seattle/WA/USA 0.2, Richardson/TX/USA 0.2, Sydney/AU 0.2
– Friend 2: Dallas/TX/USA 0.33, New Delhi/Delhi/India 0.33, Sunnyvale/CA/USA 0.33
– Friend 3: Austin/TX/USA 0.50, Minneapolis/MN/USA 0.50
– Friend k: Plano/TX/USA 0.25, Boulder/CO/USA 0.25, Salt Lake City/UT/USA 0.25, London/London/GB 0.25

31 Location Vector for John Doe
– Dallas/TX/USA
– Seattle/WA/USA 0.05
– Richardson/TX/USA 0.05
– Sydney/AU 0.05
– New Delhi/Delhi/IN
– Sunnyvale/CA/USA
– Austin/TX/USA 0.125
– Minneapolis/MN/USA 0.125
– Plano/TX/USA
– Boulder/CO/USA
– Salt Lake City/UT/USA
– London/GB 0.0625
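
One way to read the step from the friends' vectors to John Doe's vector is a weighted sum of the friends' location probabilities. The sketch below assumes equal weights for the four friends, which reproduces the values that survive on this slide (0.05, 0.125, 0.0625); the boosting step mentioned earlier in the talk is not modeled, and the function name is hypothetical.

```python
from collections import defaultdict


def aggregate_location_vectors(friend_vectors, weights=None):
    """Combine friends' location vectors into one vector for the user.

    Uniform weights are an assumption; any closeness-based weighting
    could be passed in through `weights` instead.
    """
    if not friend_vectors:
        return {}
    if weights is None:
        weights = [1.0 / len(friend_vectors)] * len(friend_vectors)
    combined = defaultdict(float)
    for vector, weight in zip(friend_vectors, weights):
        for location, probability in vector.items():
            combined[location] += weight * probability
    return dict(combined)


# Friends' vectors as listed on the previous slide.
friend_vectors = [
    {"Dallas/TX/USA": 0.4, "Seattle/WA/USA": 0.2,
     "Richardson/TX/USA": 0.2, "Sydney/AU": 0.2},
    {"Dallas/TX/USA": 0.33, "New Delhi/Delhi/India": 0.33,
     "Sunnyvale/CA/USA": 0.33},
    {"Austin/TX/USA": 0.50, "Minneapolis/MN/USA": 0.50},
    {"Plano/TX/USA": 0.25, "Boulder/CO/USA": 0.25,
     "Salt Lake City/UT/USA": 0.25, "London/London/GB": 0.25},
]
john_doe = aggregate_location_vectors(friend_vectors)
```

Under the equal-weight assumption, Seattle comes out at 0.25 × 0.2 = 0.05, Austin at 0.25 × 0.5 = 0.125, and London at 0.25 × 0.25 = 0.0625.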

32 Agglomerative Clustering
[Figure: John Doe’s location vector before merging, with the same entries as on the previous slide]

33 Agglomerative Clustering
John Doe, after merging nearby locations:
– {Dallas, Plano, Richardson}/TX/USA
– Seattle/WA/USA 0.05
– Sydney/AU 0.05
– New Delhi/Delhi/IN
– Sunnyvale/CA/USA
– Austin/TX/USA 0.125
– Minneapolis/MN/USA 0.125
– Boulder/CO/USA
– Salt Lake City/UT/USA
– London/GB
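
The merge of Dallas, Plano, and Richardson into one group can be illustrated with a simple distance-based agglomeration. This is a sketch under stated assumptions, not the objective function from the talk, which the transcript does not give: cities are merged greedily whenever any pair of members lies within a fixed distance, and their probabilities are summed. The coordinates are approximate, and the 50 km threshold and the Dallas and Plano probabilities are hypothetical.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def merge_nearby(vector, coords, max_km=50.0):
    """Greedily merge locations that lie within max_km of each other."""
    clusters = [({name}, prob) for name, prob in vector.items()]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                close = any(haversine_km(coords[a], coords[b]) <= max_km
                            for a in clusters[i][0] for b in clusters[j][0])
                if close:
                    # Merge cluster j into cluster i and sum the confidence.
                    clusters[i] = (clusters[i][0] | clusters[j][0],
                                   clusters[i][1] + clusters[j][1])
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


# Approximate coordinates; Dallas and Plano probabilities are illustrative.
coords = {
    "Dallas/TX/USA": (32.78, -96.80),
    "Plano/TX/USA": (33.02, -96.70),
    "Richardson/TX/USA": (32.95, -96.73),
    "Austin/TX/USA": (30.27, -97.74),
}
vector = {"Dallas/TX/USA": 0.18, "Plano/TX/USA": 0.06,
          "Richardson/TX/USA": 0.05, "Austin/TX/USA": 0.125}
groups = merge_nearby(vector, coords)
```

With these inputs the three cities in the Dallas area end up in one group whose confidence is the sum of its members, while Austin, which is far outside the threshold, stays on its own.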

34 Tweethood: Algorithm

36 Tweecalization: Label Propagation
– The availability of users with a known location is limited
– Most users do not have a location
– We need a method that can learn from unlabeled data
Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweecalization: Efficient and Intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania

37 Tweecalization: Label Propagation
– Ideal scenario for semi-supervised learning: only a few friends have locations (labeled data) [1]
– Use both labeled and unlabeled data for training
– Points that are close to each other are more likely to share a label
[1] Y. Bengio, O. Delalleau, and N. Le Roux, “Label propagation and quadratic criterion,” In O. Chapelle, B. Schölkopf and A. Zien (Eds.), Semi-supervised learning. MIT Press, 2006.

38 Label Propagation: An Illustration
[Figure: a central user (marked “?”) connected to friends with a location, shown as “CLAMPED LOCATIONS”, and friends without a location]
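
The illustration can be turned into a small, generic label propagation sketch: friends with a known location are clamped, and every other node repeatedly takes the weighted average of its neighbors' label distributions. This is a textbook-style variant written for illustration, not the Tweecalization algorithm itself; the graph, edge weights, and names below are hypothetical.

```python
def propagate_labels(edges, clamped, iterations=100):
    """Propagate location labels over a weighted, undirected graph.

    edges:   {(u, v): weight}, each friendship listed once
    clamped: {node: location} for nodes whose location is known
    Returns {node: {location: probability}}.
    """
    if not clamped:
        raise ValueError("at least one labeled node is required")
    labels = sorted(set(clamped.values()))
    neighbors = {}
    for (u, v), weight in edges.items():
        neighbors.setdefault(u, []).append((v, weight))
        neighbors.setdefault(v, []).append((u, weight))

    # Clamped nodes keep probability 1 for their known location;
    # all other nodes start from a uniform distribution.
    dist = {}
    for node in neighbors:
        if node in clamped:
            dist[node] = {lab: float(lab == clamped[node]) for lab in labels}
        else:
            dist[node] = {lab: 1.0 / len(labels) for lab in labels}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in clamped:
                updated[node] = dist[node]
                continue
            total = sum(weight for _, weight in nbrs)
            updated[node] = {
                lab: sum(weight * dist[n][lab] for n, weight in nbrs) / total
                for lab in labels
            }
        dist = updated
    return dist


# Central user with three labeled friends and one unlabeled friend (f4).
edges = {("user", "f1"): 0.8, ("user", "f2"): 0.6, ("user", "f3"): 0.5,
         ("user", "f4"): 0.4, ("f4", "f3"): 0.9}
clamped = {"f1": "Dallas/TX/USA", "f2": "Dallas/TX/USA", "f3": "Seattle/WA/USA"}
result = propagate_labels(edges, clamped)
predicted = max(result["user"], key=result["user"].get)
```

In this toy graph the central user is pulled toward Dallas by two strongly connected labeled friends, while the unlabeled friend `f4`, whose strongest tie is to a Seattle friend, ends up with Seattle as its most likely label.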

39 Tweecalization: Algorithm

41 What About Temporal Analysis?
– None of the existing works perform temporal analysis
– What about migration and geographical mobility?

42 Migration/Geographical Mobility
– 4% to 6% of people move every year, which means 12 to 17 million people each year [1]
[1] United States Census Bureau, Geographical Mobility/Migration Data

43 Migration/Geographical Mobility
– Migration as a function of age [1]
– Some age groups have a higher probability of moving
– High migration rate: college and jobs
– Low migration rate: old age, when people settle down
[1] United States Census Bureau, Geographical Mobility/Migration Data

44 Facebook Users and Mobility
– Let us look at the cumulative effect
– Only 28% to 37% of users currently live in their hometown [1]
[1] Based on our experiments on 300k public Facebook profiles

45 Twitter Users and Mobility
– Linking Twitter users to migration
– 33% of all Twitter users are aged years [1]
[1] Based on findings by ABI Research (online)

46 Tweeque: Graph Partitioning
– How do we know if “this” is the current location of a user?
– How do we perform temporal analysis of friendships?
– We propose a technique that indirectly infers the current location
Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.

47 Observation 1: Social Cliques and Location
– Our definition: a social clique is an inclusive group of people who share friendship
– Apart from friendship, what is the attribute that links members of a clique? Individual locations
– All members of a clique were or are at a particular geographical location at a particular instant of time, such as a college, a school, or a company

48 Observation 2: Migration and Time
– As shown previously, people have a tendency to migrate over the course of time
– Based on these two observations, we hypothesize: if we can divide the social graph of a particular user into cliques and check the location-based purity of the cliques, we can accurately separate the user’s current location from previous locations
– Migration is our latent time factor

49 Tweeque: An example
[Figure: the user’s friends (Friend 1, Friend 2, ..., Friend n), grouped as:]
– Friends from high school in Dallas
– Friends from college in Boston
– Relatives/Cousins
– Friends from job in Seattle

50 Tweeque: An example All Friends of the User

51 Tweeque: An example
– Social Clique #1 (High School)
– Social Clique #2 (College)
– Social Clique #3 (Current Work)
– Social Clique #4 (Relatives)

52 Tweeque: An Example
[Figure: friend locations shown across the four cliques: Dallas/TX/USA, Seattle/WA/USA, Dallas/TX/USA, San Diego/CA/USA, New York/NY/USA, Boston/MA/USA, Portland/OR/USA, Austin/TX/USA, Boston/MA/USA, Dallas/TX/USA, Singapore, Sydney/Australia, Dallas/TX/USA, Ontario/Canada, Seattle/WA/USA, Dallas/TX/USA, Seattle/WA/USA, Redmond/WA/USA]
– High School: Purity (Dallas) = 0.32
– College: Purity (Boston) = 0.45
– Relatives: Purity (Dallas) = 0.18
– Work: Purity (Seattle) = 0.69
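
A simple reading of the purity check is sketched below. It assumes that purity is the share of a clique's members at the clique's most common location, and that the purest clique wins the vote; both are assumptions, since the transcript does not define the purity measure or the voting rule, and the clique memberships here are hypothetical, so the slide's values (0.32, 0.45, 0.18, 0.69) are not reproduced.

```python
from collections import Counter


def clique_purity(locations):
    """Return (most common location, its share among known locations)."""
    known = [loc for loc in locations if loc]
    if not known:
        return None, 0.0
    location, count = Counter(known).most_common(1)[0]
    return location, count / len(known)


def purity_vote(cliques):
    """Pick the clique with the highest purity and report its location."""
    best = (None, None, 0.0)
    for name, locations in cliques.items():
        location, purity = clique_purity(locations)
        if purity > best[2]:
            best = (name, location, purity)
    return best


# Hypothetical clique memberships, loosely modeled on the slide's example.
cliques = {
    "High School": ["Dallas/TX/USA", "Dallas/TX/USA",
                    "Austin/TX/USA", "San Diego/CA/USA"],
    "College": ["Boston/MA/USA", "Boston/MA/USA",
                "New York/NY/USA", "Portland/OR/USA"],
    "Relatives": ["Dallas/TX/USA", "Singapore",
                  "Sydney/Australia", "Ontario/Canada"],
    "Work": ["Seattle/WA/USA", "Seattle/WA/USA", "Redmond/WA/USA"],
}
winner = purity_vote(cliques)
```

As in the slide's example, the work clique is the purest one here, so Seattle would be reported as the user's current location.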

53 Tweeque: Graph Partitioning

54 Tweeque: Graph Partitioning
J. Shi and J. Malik, “Normalized Cuts and Image Segmentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, Aug.
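
For readers unfamiliar with the cited normalized-cut criterion, the sketch below computes its value for a given two-way partition, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V). It only scores a partition; it does not search for the best one, and the toy graph is hypothetical rather than taken from the talk.

```python
def ncut_value(weights, part_a, part_b):
    """Normalized-cut value of the partition (part_a, part_b).

    weights: {(u, v): w} for an undirected graph, each edge listed once.
    """
    def w(u, v):
        return weights.get((u, v), weights.get((v, u), 0.0))

    nodes = part_a | part_b
    # Total weight of edges crossing the partition.
    cut = sum(w(u, v) for u in part_a for v in part_b)
    # Total connection weight from each side to every node in the graph.
    assoc_a = sum(w(u, t) for u in part_a for t in nodes if t != u)
    assoc_b = sum(w(u, t) for u in part_b for t in nodes if t != u)
    if assoc_a == 0 or assoc_b == 0:
        raise ValueError("each side must have at least one incident edge")
    return cut / assoc_a + cut / assoc_b


# Two tightly knit groups of friends joined by a single weak tie.
weights = {(1, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0,
           (4, 5): 1.0, (4, 6): 1.0, (5, 6): 1.0,
           (3, 4): 0.1}
good = ncut_value(weights, {1, 2, 3}, {4, 5, 6})
bad = ncut_value(weights, {1}, {2, 3, 4, 5, 6})
```

Splitting along the weak tie gives a much lower value than cutting a single user away from their group, which is why minimizing this criterion tends to separate cohesive cliques.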

55 Tweeque: Graph Partitioning

56 Tweeque: Algorithm

57 Tweeque: Purity Voting

59 Experiment Data Randomly choose 1000 Twitter users

60 Experiments and Results
– 75.5% accuracy for city-level prediction
– 80.1% accuracy for country-level prediction
– We observe that the accuracy saturates after depth 4
– Six degrees of separation is the idea that everyone is, on average, approximately six steps away, by way of introduction, from any other person in the world
– For Twitter this distance is found to be 4.67

61 Comparison of Different Approaches
Approaches (columns): Tweethood [1], Tweecalization [2], Tweeque [3], Content Based [4]
– Accuracy (City): 72.1%, 75.5%, 76.3%, 35.6% - 51%
– Accuracy (Country): 80.1%, 84.9%, 52.3%
– Complexity: O(n), O(n^3), N/A
– Temporal Analysis: No, Yes
[1] Satyen Abrol, Latifur Khan, “TweetHood: Agglomerative Clustering on Fuzzy k-Closest Friends with Variable Depth for Location Mining”. In Proc. of the Second IEEE International Conference on Social Computing (SocialCom-2010), Minneapolis, USA, August 20-22, 2010 (Nominated for best paper award, Acceptance Rate: 13%)
[2] Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweecalization: Efficient and Intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania
[3] Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.
[4] Z. Cheng, J. Caverlee, and K. Lee, “You are where you tweet: A content-based approach to geo-locating twitter users”. In CIKM ’10.

63 Contributions
– Developed three graph-based location mining algorithms for online social networks
  – Map the location mining problem to k-nearest neighbor, semi-supervised learning, and graph partitioning problems
  – Outperform the content-based approach in time and accuracy
– Relationship between geospatial proximity and friendship
– Effect of geographical mobility on the current location of users

64 Future Work
Combining content-based and graph-based methods
– Score-based geo-tagging technique [1]
– Associating keywords with locations to build a probabilistic model: “cowboys” → Dallas, “casino” → Las Vegas
– Since tweets have timestamps, this leads to more accurate prediction of the current location
[1] Satyen Abrol, Latifur Khan, Tahseen Al-khateeb, “MapIt: Smarter Searches using Location Driven Knowledge Discovery and Mining”, In Proc. of 1st SIGSPATIAL ACM GIS 2009 International Workshop on Querying and Mining Uncertain Spatio-Temporal Data (QUeST), Nov 2009, Seattle.

65 Future Work
Improve scalability of the current algorithms using a cloud computing framework
– Each of the friends of a user is handled by a separate node in the distributed environment
Micro-level location identification
– Identify specific points of interest (POIs) such as restaurants, place of work, etc. from tweets
– Identify the comfort zone of a user
– Use the Foursquare check-in dataset: over 30 million POIs all over the world

66 Publications
– Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweeque: Spatio-Temporal Analysis of Social Networks for Location Mining Using Graph Partitioning,” The First ASE/IEEE International Conference on Social Informatics, December 14-16, 2012, Washington D.C., USA.
– Satyen Abrol, Latifur Khan and Bhavani Thuraisingham, “Tweecalization: Efficient and Intelligent location mining in Twitter using semi-supervised learning,” 8th IEEE International Conference on Collaborative Computing, October 14–17, 2012, Pittsburgh, Pennsylvania
– Satyen Abrol, Latifur Khan, “TweetHood: Agglomerative Clustering on Fuzzy k-Closest Friends with Variable Depth for Location Mining”. In Proc. of the Second IEEE International Conference on Social Computing (SocialCom-2010), Minneapolis, USA, August 20-22, 2010 (Nominated for best paper award, Acceptance Rate: 13%)

67 Publications
– Satyen Abrol and Latifur Khan, “TWinner: Understanding News Queries With Geo-Content Using Twitter”. In Proc. of 6th Workshop on Geographic Information Retrieval (GIR'10) at Zurich, Switzerland.
– Satyen Abrol, Latifur Khan, Tahseen Al-khateeb, “MapIt: Smarter Searches using Location Driven Knowledge Discovery and Mining”, In Proc. of 1st SIGSPATIAL ACM GIS 2009 International Workshop on Querying and Mining Uncertain Spatio-Temporal Data (QUeST), Nov 2009, Seattle.
– Satyen Abrol, Latifur Khan, Vaibhav Khadilkar, Bhavani M. Thuraisingham, Tyrone Cadenhead, “Design and implementation of SNODSOC: Novel class detection for social network analysis”, ISI 2012

68 Thank You! Questions?

