Understanding Human Mobility from Twitter

Slides:



Advertisements
Similar presentations
By Venkata Sai Pulluri ( ) Narendra Muppavarapu ( )
Advertisements

1 Using GIS to Understand Behavior Patterns of Twitter Users Yue Li M.S. Civil/Geomatics Engineering Purdue University Committee: Dr.Jie Shan (Chair),
Sampling Distributions (§ )
Empirical Investigations of WWW Surfing Paths Jim Pitkow User Interface Research Xerox Palo Alto Research Center.
Descriptive statistics Experiment  Data  Sample Statistics Experiment  Data  Sample Statistics Sample mean Sample mean Sample variance Sample variance.
PROBABILITY AND SAMPLES: THE DISTRIBUTION OF SAMPLE MEANS.
Continuous Probability Distributions
Alain Bertaud Urbanist Module 2: Spatial Analysis and Urban Land Planning The Spatial Structure of Cities: International Examples of the Interaction of.
6 am 11 am 5 pm Fig. 5: Population density estimates using the aggregated Markov chains. Colour scale represents people per km. Population Activity Estimation.
ECE 8443 – Pattern Recognition LECTURE 06: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Bias in ML Estimates Bayesian Estimation Example Resources:
Spatial Statistics Applied to point data.
Y. Yuan and M. Raubal1 Investigating the distribution of human activity space from mobile phone usage Yihong Yuan 1,2 and Martin Raubal 1 1 Institute.
Learning Geographical Preferences for Point-of-Interest Recommendation Author(s): Bin Liu Yanjie Fu, Zijun Yao, Hui Xiong [KDD-2013]
Chapter 15 – CTRW Continuous Time Random Walks. Random Walks So far we have been looking at random walks with the following Langevin equation  is a.
Roghayeh parsaee  These approaches assume that the study sample arises from a homogeneous population  focus is on relationships among variables 
SUBDIFFUSION OF BEAMS THROUGH INTERPLANETARY AND INTERSTELLAR MEDIA Aleksander Stanislavsky Institute of Radio Astronomy, 4 Chervonopraporna St., Kharkov.
Predicting the Location and Time of Mobile Phone Users by Using Sequential Pattern Mining Techniques Mert Özer, Ilkcan Keles, Ismail Hakki Toroslu, Pinar.
Kevin Stevenson AST 4762/5765. What is MCMC?  Random sampling algorithm  Estimates model parameters and their uncertainty  Only samples regions of.
Kobe Boussauw – 15/12/2011 – Spatial Planning in Flanders: political challenges and social opportunities – Leuven Spatial proximity and distance travelled:
Migration Modelling using Global Population Projections
Public Health February 2017
Induced Travel: Definition, Forecasting Process, and A Case Study in the Metropolitan Washington Region A Briefing Paper for the National Capital Region.
Data analysis is one of the first steps toward determining whether an observed pattern has validity. Data analysis also helps distinguish among multiple.
Mobility Trajectory Mining Human Mobility Modeling at Metropolitan Scales Sibren Isaacman 2012 Mobisys Jie Feng 2016 THU FIBLab.
SUR-2250 Error Theory.
Data Quality Data Exploration
Manuel Gomez Rodriguez
How may bike-sharing choice be affected by air pollution
An analytical framework to nowcast well-being using Big Data
Practice & Communication of Science From Distributions to Confidence
Carina Omoeva, FHI 360 Wael Moussa, FHI 360
Analysis of Economic Data
Spatial analysis Measurements - Points: centroid, clustering, density
A tale of many cities: universal patterns in human urban mobility
AMPO Annual Conference October 22, 2014
Marron Institute of Urban Management,
Parameter Estimation 主講人:虞台文.
Summary of Prev. Lecture
PCB 3043L - General Ecology Data Analysis.
A Statistical and GIS Approach to Analyzing a Museum’s Customer Base
Probablity Density Functions
Summary Presented by : Aishwarya Deep Shukla
Numerical Measures: Centrality and Variability
Macroscopic Speed Characteristics
From Distributions to Confidence
The Practice of Statistics in the Life Sciences Fourth Edition
Outline Parameter estimation – continued Non-parametric methods.
Bayesian Models in Machine Learning
Data Collection and Sampling
Twitter as a novel source of mobility indicators
Warmup To check the accuracy of a scale, a weight is weighed repeatedly. The scale readings are normally distributed with a standard deviation of
Sibren Isaacman Loyola University Maryland
The free-energy principle: a rough guide to the brain? K Friston
Sampling Distribution of a Sample Proportion
2009 Minnesota MPO Conference August 11, 2009
Norman Washington Garrick CE 2710 Spring 2016 Lecture 07
University of Washington, Autumn 2018
Probabilistic Latent Preference Analysis
Data Quality Data Exploration
Volume 20, Issue 3, Pages (February 2010)
Information Processing by Neuronal Populations Chapter 5 Measuring distributed properties of neural representations beyond the decoding of local variables:
Sampling Distributions (§ )
CHAPTER – 1.2 UNCERTAINTIES IN MEASUREMENTS.
Timescales of Inference in Visual Adaptation
Volume 32, Issue 1, Pages (October 2001)
Topological Signatures For Fast Mobility Analysis
Fall Final Topics by “Notecard”.
Population Density & Distribution
Probabilistic Surrogate Models
CHAPTER – 1.2 UNCERTAINTIES IN MEASUREMENTS.
Presentation transcript:

Understanding Human Mobility from Twitter Raja Jurdak, Kun Zhao, Jiajun Liu, Maurice Abou Jaoude, Mark Cameron, David Newth PLoS ONE 10(7): e0131469. doi:10.1371/journal.pone.0131469 http://127.0.0.1:8081/plosone/article?id=info:doi/10.1371/journal.pone.0131469

Understanding Human Mobility Significance Urban planning Transport Disease spread Issues with current data sources Census: coarse-grained GPS/Call data records/Wifi: proprietary/private Geo-tagged tweets as a proxy 500 Million users 340 Million tweets/day Up to 10m accuracy

Open Issues with Geo-Tagged Tweets as Mobility Proxy Potential sampling bias Demographic Geographic Content limit on tweets 140 characters/tweet Unknown effect of this limit on tweet locations Tweet location preference Unclear how it can affect observed movement patterns

Overview of this work Determine how representative are Twitter-based mobility patterns of population and individual-level movement Analyse a large dataset with 7,811,004 tweets from 156,607 Twitter users Compare the mobility patterns observed through Twitter with the patterns observed through other technologies, such as call data records

Displacement Distribution Displacement distribution, namely spatial dispersal kernel P(d), where d is the distance between a user’s two consecutive reported locations. Mixed exponential-stretched exponential fitting Previously observed distributions: Power-law (banknotes) Truncated power-law (mobile phones & travel surveys) Exponential and log-normal (GPS from cars/taxis) Multimodal nature: (a) Mixture of exponential and stretched exponential; (b) double power law

What could be different here? May stem from multiplicative processes, i.e. the displacement d is determined by the product of k random variables These random variables can be transportation cost, lifestyle aspects such as the preference on commute distance, or socio-economic status such as personal income. The number of these variables k, namely the number of levels in the multiplicative cascade, is indicated by the exponent β in the above equation. When k is small, P(d) converges to a stretched-exponential asymptotically, and k → +∞ leads to the classic log-normal distribution. In particular, if these random variables are Gaussian distributed, we have k ≈ 2/β, and the value of k is around 3 or 4 for our data (β ≈ 0.55).

Another fitting possibility 2 separate power laws Differences between short and long distance travel patterns

Radius of Gyration Quantifies the spatial stretch of an individual trajectory or the traveling scale of an individual where is the individual’s i-th location, is the geometric center of the trajectory and n is the number of locations in the trajectory.

Rg Distribution indicates that there is strong individual heterogeneity of traveling scale over the whole population. In other words, there may be geographical constraints that shape the movement depending on how far people move from their home location. Indeed, it has been suggested that the distribution of rg should be asymptotically equivalent to P(d), if each individual trajectory is formed by displacements randomly drawn from P(d) [32]. The general shape of the rg distribution looks similar to some previously reported rg distributions in other regions such as Santo Domingo in the Dominican Republic [33], suggesting highly regional specific dynamics at play, such as population density and urban layout. However, our geotagged Twitter data provides position resolution at up to 10m, compared to typical resolutions of 1km in previous studies [7, 32], allowing more fine-grained validation of these dynamics.

First passage time Fpt(t), i.e. the probability of finding a user at the same location after a period of t Similar to CDR trends Periodic behavior with daily cycles Content limit does not appear to affect Fpt

Preferential return to visited locations The probability function P(L) of finding an individual at his/her L-th most visited location. Sort visited locations and perform spatial clustering (250m) Generally follow Zipf law of preferential return People are 50% likely to tweet from most popular location, higher than other data sources

Predictability of Tweet Locations Study the randomness (entropy) and predictability of the sequence of tweeting locations for each user Bi-modal distribution for users with >20 locations suggests 2 types of users Entropy Predictability

Probabilistic Modeling of Human Movement Rg: 1-10 km Rg: 10-100 km Rg: 100-500 km Rg: 500-1000 km

Isotropy of Motion Patterns Isotropy ratio σ = δy/δx, where δy is the standard deviation of P(x, y) along the y-axis and δx is the standard deviation of P(x, y) along the x-axis, to characterise the orbit of each rg group Second peak at ~1000 km differentiates results from previous studies

Preferential return decreases with larger orbits Likelihood to tweet from home location drops with increasing rg Exponent α also decreases with increasing rg

Twitter-based Mobility Patterns Rg: 1-10 km Rg: 10-100 km Rg: 100-500 km Rg: 500-1000 km

Discussion Three observed modes of mobility Intrasite Metropolitan Intercity Two apparent groups of tweeters Highly predictable group where geo-tags are not highly useful for mobility prediction Less predictable group where geo-tagged tweets can be representative of movement pattterns

…Discussion Long distance movers more diffusive in their movement than intermediate distance movers, most likely as a reflection of a switch in transportation mode towards air travel and local circulation around destination cities. Preferential return strongly dependent on a person’s orbit of movement, with long distance movers less likely to return to previously visited locations. Population-level mobility patterns are well-represented by geo-tagged tweets, while individual-level patterns are more sensitive to contextual factors

Implications and Future Work Develop agent-based model for disease spread based on rg group features Use tweets to better understand drivers for movement Use location inference algorithms on tweet content to increase data sample with higher uncertainty per location