Chen Cheng, Haiqin Yang, Irwin King and Michael R. Lyu


1 Fused Matrix Factorization with Geographical and Social Influence in Location-based Social Networks
Chen Cheng (1), Haiqin Yang (1), Irwin King (1,2) and Michael R. Lyu (1)
(1) Department of Computer Science and Engineering, The Chinese University of Hong Kong; (2) AT&T Labs Research
AAAI 2012, Toronto, Canada

2 Check-in becomes a lifestyle…
In recent years, location-based social networks (LBSNs) such as Foursquare, Gowalla and Facebook Places have attracted millions of users. We can easily share our experiences about locations with our friends through the apps on our mobile phones. For example, this figure shows that I have just checked in at the engineering building through the Foursquare app.

3 Check-in becomes a lifestyle…
This figure shows the growth trend of Foursquare users by Jan [...]. The user base has grown very fast in the past few years. The number of users now surpasses 20 million, corresponding to 2 billion check-ins, which is quite a large number (source: http://statspotting.com/2012/04/foursquare-statistics-20-million-users-2-billion-check-ins/).

4 Graph illustration of Location-based Social Networks (LBSNs)
Figure labels: checked-in POI (lat, lng); friend link; check-in; "?" on a candidate check-in.
Applications: community detection, link prediction, POI recommendation, next place prediction, travel sequence detection, trip recommendation.
This is the graph illustration of LBSNs. In LBSNs, we have millions of users and points of interest (POIs). We can obtain social information between users, and for each POI we have its latitude and longitude, from which we can calculate the distance between two POIs. Users and POIs are connected through check-ins. An interesting problem we want to focus on is this: given a new place, will the user be interested in this POI? Can we provide accurate POI recommendation for users in LBSNs?
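The distance between two POIs mentioned above can be computed from their latitude and longitude. The slides do not say which distance formula was used, so the following is only a sketch that assumes the haversine great-circle distance on a spherical Earth; the function name `haversine_km` and the mean radius constant are illustrative choices, not taken from the talk.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1 = math.radians(p[0]), math.radians(p[1])
    lat2, lng2 = math.radians(q[0]), math.radians(q[1])
    dlat, dlng = lat2 - lat1, lng2 - lng1
    # Haversine term; clamp before asin to guard against rounding above 1.
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


if __name__ == "__main__":
    # Two hypothetical POIs one degree of longitude apart on the equator.
    print(round(haversine_km((0.0, 0.0), (0.0, 1.0)), 1))
```

For city-scale distances a planar approximation would also work; the haversine form is used here only because it stays reasonable for users who check in across countries.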

5 Our focus: POI recommendation
Help users explore their surroundings and provide personalized travel recommendations.
Help 3rd-party developers provide personalized services: advertisements, coupons, traffic statistics.
POI recommendation is a very significant task. First, it can help users explore new places and know their city better. For example, if we want to find somewhere new to eat, it would certainly help, and Foursquare already offers this kind of service. Second, it can help 3rd-party developers provide personalized services. For example, if we know a user would like to check in at a restaurant, the advertiser can show that restaurant's advertisement to the user.

6 Challenges
Large dataset: crawled from Gowalla from Feb to Sep. 2011; 4,128,714 check-ins from 53,944 users on 367,149 locations.
Only positive data is seen.
Sparsity: the density of our dataset is only [...]%.
There are several challenges for POI recommendation in LBSNs. First, the dataset is very large; recall that there are millions of users in LBSNs. Second, only positive data is seen: we can infer that a user likes a location from his check-ins, but we do not know which locations he dislikes. Third, the dataset is very sparse, which makes POI recommendation very tough.
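To make the sparsity point concrete, here is a small sketch of how the density of a user-location matrix could be computed from raw check-in records. It assumes that density means the fraction of (user, location) cells with at least one check-in; the slides do not define the term, and the function name and toy records are hypothetical.

```python
def matrix_density(checkins):
    """Fraction of (user, location) cells that contain at least one check-in.

    checkins: iterable of (user_id, location_id) pairs, repeats allowed.
    """
    pairs = set(checkins)  # repeated visits collapse into one cell
    if not pairs:
        return 0.0
    users = {user for user, _ in pairs}
    locations = {location for _, location in pairs}
    return len(pairs) / (len(users) * len(locations))


if __name__ == "__main__":
    # Toy records: u1 visits "a" twice, u2 visits "b" once.
    records = [("u1", "a"), ("u1", "a"), ("u2", "b")]
    print(matrix_density(records))
```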

7 POI recommendation in LBSNs
Matrix factorization is a promising tool because of its success in traditional recommender systems. However, it ignores geographical influence.

8 POI recommendation in LBSNs
Er… a little far.
For example, Foursquare recommends this restaurant to me; however, because of the distance, we would sometimes rather choose a nearby place.

9 Multi-centers and normal distribution
We further explore users' check-in behavior and find that users tend to check in around several centers. This differs from [Cho et al 2011], which assumes only two centers, home and office; we found that other centers account for at least 10% of all check-ins. These centers can be branches of large companies or airports.
Figure panels: two centers (home & office) in [Cho et al 2011]; several centers proposed in our paper.

10 Multi-centers and normal distribution
Many previous papers, such as [Brockmann 2006, Gonzalez 2008], have used normal distributions to model human movement around a particular point. We adopt this and assume that each center follows a normal distribution.

11 Inverse distance rule
We also plot the relationship between the check-in probability and the distance to the nearest center. Although each user has a personalized taste for locations, the probability that he will visit a location is inversely proportional to the distance between the location and its nearest center.

12 Social influence
On average, the overlap between a user's check-ins and his friends' check-ins is only about 9.6%. We plot the CCDF of the fraction of a user's check-ins that are also visited by his friends and find that almost 90% of users have only 20% of their check-ins in common with their friends. This indicates that social influence on POI recommendation in LBSNs is limited, which our experiments also illustrate.
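The overlap statistic described here can be sketched as follows. This is an illustration under assumptions: each user is represented by the set of locations he has visited, friends' visits are pooled into one set, and the CCDF is evaluated at a single threshold. The function names and toy data are hypothetical, and the exact counting used for the 9.6% figure is not given in the slides.

```python
def friend_overlap(user_locations, friends_locations):
    """Fraction of a user's visited locations that some friend also visited.

    user_locations: set of location ids visited by the user.
    friends_locations: list of sets, one per friend.
    """
    user_set = set(user_locations)
    if not user_set:
        return 0.0
    visited_by_friends = set().union(*friends_locations)
    return len(user_set & visited_by_friends) / len(user_set)


def ccdf(values, threshold):
    """Fraction of values strictly greater than the threshold."""
    if not values:
        return 0.0
    return sum(1 for v in values if v > threshold) / len(values)


if __name__ == "__main__":
    overlaps = [
        friend_overlap({"a", "b", "c", "d", "e"}, [{"a", "x"}, {"y"}]),
        friend_overlap({"a", "b"}, [{"a", "b"}]),
    ]
    # Share of users whose overlap with friends exceeds 20%.
    print(ccdf(overlaps, 0.2))
```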

13 Our proposal
Based on the above observations, we first propose a Multi-center Gaussian Model (MGM) to capture geographical influence. Next, we propose a generalized fused matrix factorization framework that includes social and geographical influence. Finally, we conduct thorough experiments on the large-scale Gowalla dataset.

14 Multi-center Gaussian model
Recall that check-in locations are located around several centers, and that the probability of a user visiting a location is inversely proportional to the distance from its nearest center. MGM is proposed to model users' check-in behavior.

15 Multi-center Gaussian model
Notation:
$C_u$: multi-center set for user u
$f_{c_u}$: total frequency at center $c_u$ for user u
$\mathcal{N}(l \mid \mu_{c_u}, \Sigma_{c_u})$: the pdf of the Gaussian distribution, where $\mu_{c_u}$ and $\Sigma_{c_u}$ denote the mean and covariance matrix of the region around center $c_u$
The probability of user u visiting a location l given $C_u$ is defined as:
$$P(l \mid C_u) = \sum_{c_u=1}^{|C_u|} P(l \in c_u)\,\frac{f_{c_u}}{\sum_{i \in C_u} f_i}\,\frac{\mathcal{N}(l \mid \mu_{c_u}, \Sigma_{c_u})}{\sum_{i \in C_u} \mathcal{N}(l \mid \mu_i, \Sigma_i)}$$
Here is the notation list. $C_u$ is the multi-center set, $f_{c_u}$ is the frequency at a certain center, and $\mathcal{N}$ is the pdf of the Gaussian distribution, with $\mu$ and $\Sigma$ as the mean and covariance respectively. The first term, $P(l \in c_u)$, denotes the probability that l belongs to a certain center. The second term is the normalized effect of the check-in frequency at center $c_u$: for a center with a large total frequency, such as home, the probability should be higher than for other centers. The third term is the normalized probability that l will be checked in by the user. The whole product denotes the probability that l belongs to center $c_u$ and is also visited by the user. Summing this probability over all centers gives the probability that user u will visit l.
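The three terms described in the notes can be sketched in Python. This is only an illustration under assumptions that are not in the slides: locations are treated as planar 2-D points, the membership term is the inverse distance to each center normalized over the user's centers, frequencies are used unweighted, and the function names and the center dictionaries (`mean`, `cov`, `freq`) are hypothetical.

```python
import math


def gaussian_pdf_2d(x, mean, cov):
    """Density of a 2-D Gaussian at x; cov is ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean) via the closed-form 2x2 inverse.
    quad = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


def mgm_probability(loc, centers, eps=1e-9):
    """Sum over centers of membership * normalized frequency * normalized pdf."""
    inv_dist = [1.0 / (math.dist(loc, c["mean"]) + eps) for c in centers]
    pdfs = [gaussian_pdf_2d(loc, c["mean"], c["cov"]) for c in centers]
    total_inv = sum(inv_dist)
    total_freq = sum(c["freq"] for c in centers)
    total_pdf = sum(pdfs)
    if total_pdf == 0.0:  # location is too far from every center
        return 0.0
    prob = 0.0
    for c, w, p in zip(centers, inv_dist, pdfs):
        membership = w / total_inv           # assumed normalization of 1 / distance
        freq_term = c["freq"] / total_freq   # busier centers weigh more
        gauss_term = p / total_pdf           # normalized Gaussian density
        prob += membership * freq_term * gauss_term
    return prob


if __name__ == "__main__":
    cov = ((0.01, 0.0), (0.0, 0.01))
    centers = [
        {"mean": (0.0, 0.0), "cov": cov, "freq": 8},  # e.g. home
        {"mean": (1.0, 1.0), "cov": cov, "freq": 2},  # e.g. office
    ]
    print(mgm_probability((0.1, 0.0), centers) > mgm_probability((0.9, 1.0), centers))
```

With these toy centers, a point near the high-frequency center scores higher than an equally close point near the low-frequency one, which is the behavior the second term is meant to produce.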

16 Multi-center discovering algorithm
A greedy clustering algorithm is proposed, motivated by the Pareto principle (top 20 locations cover about 80% check-ins).
Next, we just need to find the centers for each user, and we propose a greedy clustering algorithm for this. We find that the top 20 locations cover about 80% of the check-ins, which is known as the Pareto principle. We first rank all locations by check-in frequency. Then we scan them in that order to find centers. If a location does not belong to any center, we search for locations within d km of it that have also not been added to other centers and group them into a candidate region. If the total frequency of that region is larger than a threshold, a new center region is formed.
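The scan described above can be sketched as follows. Several details are assumptions rather than facts from the talk: the threshold is expressed as a fraction `theta` of the user's total check-ins, locations are marked as assigned only when their region is accepted, distance is the haversine distance, and the default values of `d_km` and `theta` are placeholders.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1 = math.radians(p[0]), math.radians(p[1])
    lat2, lng2 = math.radians(q[0]), math.radians(q[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0088 * math.asin(min(1.0, math.sqrt(h)))


def discover_centers(checkins, d_km=15.0, theta=0.02):
    """Greedy multi-center discovery for one user.

    checkins: dict mapping (lat, lng) -> check-in frequency.
    Returns a list of regions, most frequent seed first.
    """
    total = sum(checkins.values())
    # Step 1: rank locations by frequency, most visited first.
    ranked = sorted(checkins, key=checkins.get, reverse=True)
    assigned = set()
    centers = []
    # Step 2: scan in rank order; each unassigned location seeds a candidate region.
    for seed in ranked:
        if seed in assigned:
            continue
        region = [loc for loc in ranked
                  if loc not in assigned and haversine_km(seed, loc) <= d_km]
        freq = sum(checkins[loc] for loc in region)
        # Step 3: keep the region only if it holds enough of the user's check-ins.
        if freq >= theta * total:
            assigned.update(region)
            centers.append({"seed": seed, "locations": region, "freq": freq})
    return centers


if __name__ == "__main__":
    toy = {(0.0, 0.0): 10, (0.0, 0.01): 5, (10.0, 10.0): 8, (50.0, 50.0): 1}
    for center in discover_centers(toy, d_km=15.0, theta=0.1):
        print(center["seed"], center["freq"])
```

Because the scan starts from the most visited locations, the largest regions are claimed first, which is the sense in which the procedure is greedy.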

17 Fused framework
Traditional matrix factorization (MF) only models users' preference on locations, which we denote as $P(F_{ul})$, and our proposed MGM only models geographical influence. In fact, the probability that a user will visit a location is controlled both by his personalized taste for it and by the geographical constraint of whether it is close to his centers. So we can fuse them together to get the fused framework:
$$P_{ul} = P(F_{ul}) \cdot P(l \mid C_u)$$
Here $P_{ul}$ is the probability that user u visits location l, $P(F_{ul})$ encodes user preference based on MF, and $P(l \mid C_u)$ is calculated by MGM.
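As a rough illustration of fusing the two signals, the sketch below assumes that the fused score is the product of an MF-based preference probability and an MGM-based geographical probability, and that both have already been computed per candidate location. The function name, the dictionaries and the toy numbers are hypothetical.

```python
def fuse_and_rank(preference, geo, visited, top_n):
    """Rank unvisited locations by preference * geographical probability.

    preference: dict location -> probability from matrix factorization.
    geo: dict location -> probability from the geographical model.
    visited: set of locations already in the user's training data.
    """
    scores = {
        loc: preference[loc] * geo.get(loc, 0.0)
        for loc in preference
        if loc not in visited
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    preference = {"cafe": 0.9, "gym": 0.6, "museum": 0.8, "home": 0.99}
    geo = {"cafe": 0.1, "gym": 0.5, "museum": 0.3, "home": 0.9}
    # The cafe is the user's favourite by taste alone but is far from his centers.
    print(fuse_and_rank(preference, geo, visited={"home"}, top_n=2))
```

In the toy data the location with the highest preference drops out of the top 2 once distance is taken into account, which mirrors the "a little far" example earlier in the talk.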

18 Setup and metric
Split the dataset into 2 non-overlapping sets: randomly select x% for each user as training data and the remaining (100 - x)% as test data, with x set to 70 and 80. The split is carried out 5 times independently, and we report the average.
POI recommendation: return the top-N POIs for each user and count how many locations in the test set are recovered.
Metric: we use the traditional precision and recall metrics.
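The evaluation protocol can be sketched as below. It assumes that precision at N is the number of recovered test locations divided by the number of recommendations, and that recall at N is the same count divided by the number of test locations; the slides only name the metrics. Function names, the seed and the toy data are hypothetical.

```python
import random


def split_user_locations(locations, train_fraction, rng):
    """Randomly split one user's locations into training and test lists."""
    shuffled = list(locations)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def precision_recall_at_n(recommended, held_out):
    """Precision and recall of a top-N list against a user's test locations."""
    hits = len(set(recommended) & set(held_out))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(held_out) if held_out else 0.0
    return precision, recall


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the split is repeatable
    train, test = split_user_locations(range(10), 0.7, rng)
    print(len(train), len(test))
    print(precision_recall_at_n(["a", "b", "c", "d"], ["b", "d", "e"]))
```

Repeating the split with five different seeds and averaging the per-user metrics would follow the "carried out 5 times independently" setup.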

19 Comparison Methods
MGM
PMF [Salakhutdinov and Mnih 2007]: assumes a Gaussian distribution on the observed data and a Gaussian prior on the latent feature vectors
PMF with Social Regularization (PMFSR) [Ma et al. 2011b]: a social regularization term added to PMF
Probabilistic Factor Model (PFM) [Ma et al. 2011a]: models frequency data, with a Gamma prior on the latent feature vectors and a Poisson distribution on the frequency data
Fused MF with MGM (FMFMGM): our proposed method
Here is the list of the comparison methods. The next three are well-known matrix factorization methods we introduced before, and the last is our fused method.

20 Results
Figure panels: precision and recall, with 70% and 80% of the data used for training.
Here is the comparison result. From the figure we can see that, first, MGM and our fused framework consistently outperform the other MF methods, which do not consider geographical influence (GI). This indicates that GI plays a significant role in POI recommendation. Second, our fused framework performs at least 50% better than MGM, which also verifies that the probability that a user visits a location is controlled by both user preference and GI. Last, PMFSR performs only a little better than PMF, which agrees with the conclusion that social influence is limited.

21 User check-in distribution
One challenge for POI recommendation is that it is difficult to provide recommendations for users with very few check-ins. In order to compare our methods thoroughly with others, we group the users into 6 classes according to their number of check-in locations in the training dataset. This figure shows the distribution of users over the different ranges of check-in locations.

22 Performance on different users
These two figures show the results. We can see that we…

23 Conclusion
Extract characteristics of a large dataset crawled from Gowalla
Propose a novel Multi-center Gaussian Model (MGM) to model geographical influence
Propose a fused MF framework which outperforms state-of-the-art methods

24 Future work
To better model one-class frequency data
To include other information: location category, activity, etc.
To incorporate temporal effect

25 Thanks
Q&A
Chen Cheng, ccheng@cse.cuhk.edu.hk

