Extracting Mobile Behavioral Patterns with the Distant N-Gram Topic Model Lingzi Hong Feb 10th.


1 Extracting Mobile Behavioral Patterns with the Distant N-Gram Topic Model
Lingzi Hong Feb 10th

2 Research Question
Problem: modeling activity sequences for large-scale discovery of human routines from cellphone sensor data.
Fundamental difficulty: the basic unit of time for the activities in question is unknown (hourly? daily?) => the model must handle multiple unknown time durations effectively.

3 Focus on Probabilistic Topic Models
Unsupervised => mine the structure of the data
Handle uncertainty
Extended in various ways to integrate multiple data types => sensor activity sequences

4 Contributions
Propose the distant n-gram topic model (DNTM) for sequence modeling
Derive the inference process using Markov Chain Monte Carlo (MCMC) sampling
Apply the model to two real large-scale datasets
Comparative analysis with Latent Dirichlet Allocation (LDA)

5 Related Work
Topic models as a useful tool
1. T. Huynh, M. Fritz, and B. Schiele. Discovery of activity patterns using topic models.
2. K. Farrahi and D. Gatica-Perez. Probabilistic mining of socio-geographic routines from mobile phone data.
3. T. Bao, H. Cao, E. Chen, J. Tian, and H. Xiong. An unsupervised approach to modeling personalized contexts of mobile users.
4. K. Farrahi and D. Gatica-Perez. Discovering routines from large-scale human locations using probabilistic topic models.
Topic models in terms of text
1. LDA: determines the probability of each word given each topic and the probability of each topic given each document
N-gram discovery
1. Bigram topic model
2. Topic n-gram model

6 Distant N-Gram Topic Model
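Notation: corpus of m documents; Sm sequences per document; sequence q = (w1, w2, …, wN)
Label w = (t, l), where t is the time coordinate of the day and l is the location
The distribution of w1 given topics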
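To make the notation concrete, here is a minimal Python sketch of how one day of location labels could be turned into labels w = (t, l) and then into sequences q of length N. The helper names `day_to_labels` and `split_into_sequences` are hypothetical, and the choice of non-overlapping sequences that drop an incomplete tail is an assumption for illustration, not something the slides specify.

```python
from typing import List, Tuple

Label = Tuple[int, str]  # w = (t, l): time slot index, location label

def day_to_labels(locations: List[str]) -> List[Label]:
    """Attach the time slot index t to each location label l."""
    return [(t, loc) for t, loc in enumerate(locations)]

def split_into_sequences(labels: List[Label], n: int) -> List[Tuple[Label, ...]]:
    """Cut a day into non-overlapping sequences q of length n.

    Assumption: an incomplete trailing sequence is dropped.
    """
    return [tuple(labels[i:i + n]) for i in range(0, len(labels) - n + 1, n)]

# One hypothetical day with 48 half-hour slots: Home, Work, Other, Home.
day = ["H"] * 16 + ["W"] * 18 + ["O"] * 4 + ["H"] * 10
labels = day_to_labels(day)
sequences = split_into_sequences(labels, 3)

print(len(sequences))  # 16
print(sequences[5])    # ((15, 'H'), (16, 'W'), (17, 'W'))
```

With N = 3 the sixth sequence straddles the transition from home to work, which is the kind of pattern a sequence-based topic is meant to capture.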

7 Distant N-Gram Topic Model
General process:
1. Initialization (document topic, distribution over labels)
2. Sequence generation procedure (estimate parameters)
Model parameters are derived with the MCMC approach of Gibbs sampling
Estimation of parameters: ?
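The slides do not give the DNTM sampling equations, so the sketch below shows the same initialize-then-resample pattern for plain LDA with collapsed Gibbs sampling instead. It is not the DNTM sampler: it treats every label as an independent word and ignores the sequence structure. The function name `lda_gibbs` and the default hyperparameter values are assumptions chosen for illustration.

```python
import random
from typing import List, Tuple

def lda_gibbs(docs: List[List[int]], n_topics: int, vocab_size: int,
              alpha: float = 0.1, beta: float = 0.01,
              n_iter: int = 50, seed: int = 0
              ) -> Tuple[List[List[float]], List[List[float]]]:
    """Collapsed Gibbs sampling for plain LDA (not the DNTM)."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total words per topic
    z = []                                              # topic of every token

    # Step 1: initialization with random topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    # Step 2: resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta)
                    / (n_k[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Step 3: estimate parameters from the final counts.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
            for w in range(vocab_size)] for k in range(n_topics)]
    return theta, phi

# Toy corpus: three "days" over a vocabulary of four label ids.
docs = [[0, 0, 1, 1], [2, 3, 3, 2], [0, 1, 0, 1]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=4)
```

Here `theta` holds the topic distribution of each document and `phi` the word distribution of each topic; every row sums to 1.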

8 Distant N-Gram Topic Model
Code is available that helps to implement this process

9 Experiments and Results
Nokia Smartphone Data
Tricks?: days with topic distribution => the 10 most probable days for each topic, ranked from top to bottom

10 Experiments and Results
MIT Reality Mining Data
L = {'H', 'W', 'O', 'N'}, tt = 48
Most probable days given topics

11 Experiments and Results
most probable sequence components for topics

12 Evaluation
Splitting the data into training and testing sets
A test set is a collection of unseen documents wd; the model is described by the topic matrix Φ and the hyperparameter α for the topic distribution of documents.
Log-likelihood: the probability of unseen held-out documents given some training documents. A higher likelihood implies a better model.
Perplexity: the lower the perplexity, the better the model.
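The link between the two measures can be sketched in a few lines of Python. This assumes the definition of perplexity commonly used for LDA, exp(-total log-likelihood / total number of words), which the slides do not spell out; the function name `perplexity` and the toy numbers are illustrative only.

```python
import math
from typing import List

def perplexity(doc_log_likelihoods: List[float], doc_lengths: List[int]) -> float:
    """Perplexity of held-out documents.

    Assumed definition: exp(-sum_d log p(w_d) / sum_d N_d).
    """
    total_ll = sum(doc_log_likelihoods)
    total_words = sum(doc_lengths)
    return math.exp(-total_ll / total_words)

# Sanity check with a model that assigns probability 1/4 to every label
# (a uniform guess over 'H', 'W', 'O', 'N') on two held-out days of 48 slots.
lengths = [48, 48]
log_likelihoods = [n * math.log(0.25) for n in lengths]
print(perplexity(log_likelihoods, lengths))  # approximately 4
```

A uniform guess over four labels gives a perplexity of about 4, so a useful model should score below that on the same held-out days.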

13 Evaluation perplexity of the DNTM over number of 20% unseen days
Average log-likelihood of the DNTM versus LDA on 20% unseen days.

14 Discussion: generalization of the model
The model assumes every topic has a distribution over sequences q, whose elements w are labeled with time and location, so each w is tied to a general topic distribution. But with many users, a workplace for A might be a leisure place for B. In topic models, if a word is associated with a topic distribution, that distribution applies equally to all documents. However, we can't assume a place has the same topic distribution of daily activities for different people. Could we?
Nokia Smartphone data: 2 users, each with many places in two different cities. Few overlapping places with mixed functions. Results are reported separately for user 1 and user 2.
MIT data: many users, but the places have been labeled, so the result is only the identification of topics.
A real data set will include many users and unlabeled places.

15 Discussion
How to choose N?
Segmentation of sequences according to activities or according to time?
What if the last sequence q is not complete?

16 Discussion
Could we simply cluster the sequences to detect activity patterns?
48 intervals a day, each interval as a feature; the value of the feature is the label ('H', 'W', 'O', 'N')
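As one way to picture the clustering alternative raised above, the sketch below treats each day as a vector of 48 categorical features and groups days with a small k-modes style loop based on Hamming distance. The choice of k-modes, the helper names `hamming` and `k_modes`, and the toy days are all assumptions for illustration; the slides do not propose a specific clustering method.

```python
from collections import Counter
from typing import List, Tuple

def hamming(a: List[str], b: List[str]) -> int:
    """Number of time slots in which two days have different labels."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(days: List[List[str]], k: int, n_iter: int = 10
            ) -> Tuple[List[int], List[List[str]]]:
    """Cluster days of categorical labels (k-modes style sketch)."""
    # Initialize the modes with the first k distinct days.
    modes: List[List[str]] = []
    for day in days:
        candidate = list(day)
        if candidate not in modes:
            modes.append(candidate)
        if len(modes) == k:
            break

    assignments = [0] * len(days)
    for _ in range(n_iter):
        # Assign each day to the closest mode.
        assignments = [
            min(range(len(modes)), key=lambda j: hamming(day, modes[j]))
            for day in days
        ]
        # Update each mode to the most common label in every time slot.
        for j in range(len(modes)):
            members = [day for day, a in zip(days, assignments) if a == j]
            if members:
                modes[j] = [Counter(slot).most_common(1)[0][0]
                            for slot in zip(*members)]
    return assignments, modes

# Hypothetical days with 48 half-hour slots each.
work_day = ["H"] * 16 + ["W"] * 18 + ["H"] * 14
home_day = ["H"] * 48
noisy_work_day = work_day[:20] + ["O"] + work_day[21:]

days = [work_day, home_day, noisy_work_day, list(home_day)]
assignments, modes = k_modes(days, k=2)
print(assignments)  # [0, 1, 0, 1]
```

Unlike a topic model, this assigns each day to exactly one cluster, so a day that mixes several routines cannot be described as a mixture.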

