1 Using GIS to Understand Behavior Patterns of Twitter Users Yue Li M.S. Civil/Geomatics Engineering Purdue University Committee: Dr. Jie Shan (Chair), Dr. Nicole Kong, Dr. James Bethel
2 Introduction Volunteered Geographic Information (VGI) [1] −Emergency management, event detection, tourist behavior, knowledge discovery, … Twitter −The most popular micro-blogging site −Tweets with longitude and latitude −A gold mine for scholars in geography, linguistics, sociology, economics, health, and psychology [2] −Marketing, advertising, regulation, …
3 Research Goal To discover the spatio-temporal pattern of tweets To infer the human mobility patterns behind the tweets To understand the lifestyle of college students
4 Study Area College town/city, Big Ten Universities West Lafayette, IN −Most densely populated city in IN −Home of Purdue University Ann Arbor, MI −University of Michigan Bloomington, IN −Indiana University, Bloomington Columbus, OH −Ohio State University
5 Data Geo-tagged tweets downloaded with the Twitter Streaming API With longitude and latitude at time of posting Nov 18, 2013 to April 2, 2014 −West Lafayette: 59,238 −Ann Arbor: 220,117 −Bloomington: 247,202 −Columbus: 1,936,238
6 Methods Pure Spatial −Point density analysis Pure Temporal Spatio-Temporal −Tweets in Land Use −Event/Anomaly detection −Individual Twitter user patterns
15 Tweets and Land Use 1 - Commercial; 2 - Industrial; 3 - Mixed Use; 4 - Office; 5 - Public/Education; 6 - Recreation; 7 - Residence; 8 - Transportation; 9 - Vacant/River
16 Event Detection Spatially and temporally aggregated −Football game, concert, festival, … Use the Purdue shooting on Jan 21, 2014 as an example −Lockdown from around 12:00 to 14:00 Temporally −710 tweets between 12:00 and 14:00 on Jan 21, from 231 unique users −7,443 tweets between 12:00 and 14:00 in the whole dataset, from 1,080 unique users Spatially −How to measure spatial anomaly?
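The temporal comparison on this slide can be made concrete with a little arithmetic. The sketch below is not from the presentation: it assumes the 7,443 figure covers every day of the collection period stated on the Data slide (Nov 18, 2013 to April 2, 2014, inclusive) and that the event day itself is included in that total. The function name `event_ratio` is hypothetical.

```python
from datetime import date

def event_ratio(event_count, window_total, start, end):
    """Compare one day's tweet count in a time window with the daily mean.

    Assumption (not stated in the slides): window_total is spread over
    every calendar day from start to end, inclusive.
    """
    n_days = (end - start).days + 1          # inclusive day count
    mean_per_day = window_total / n_days     # average tweets per day in the window
    return n_days, mean_per_day, event_count / mean_per_day

# Figures taken from the slides; the interpretation is an assumption.
n_days, mean_per_day, ratio = event_ratio(
    event_count=710,                 # tweets between 12:00 and 14:00 on Jan 21, 2014
    window_total=7443,               # tweets between 12:00 and 14:00, whole dataset
    start=date(2013, 11, 18),
    end=date(2014, 4, 2),
)
print(f"{n_days} days, {mean_per_day:.1f} tweets/day on average, ratio {ratio:.1f}x")
```

Under these assumptions the lockdown window holds roughly an order of magnitude more tweets than an average day, which is the temporal anomaly the slide points to. The spatial question needs a different tool.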
17 Hypotheses Challenge: Inhomogeneous/clustered process even outside lockdown period −Were tweets more significantly clustered during lockdown than average? Intensity of tweets is correlated with distance to campus buildings Extent of clustering is positively correlated with chi-square value
18 Covariate: Purdue Buildings Purdue Building Shapefile converted to tessellation R libraries: maptools, sp, spatstat Functions: as.mask → im → tess
19 Randomization Test Algorithm (by Ken Kellner): 1. Select 710 random tweets from dates 1/16/14 - 1/26/14 and hours 12:00 - 14:00 without replacement 2. Call quadratcount() and quadrat.test() on the new random dataset 3. Save chi-square value 4. Repeat 1000 times to obtain distribution of chi-square values 5. Compare actual chi-square value obtained on 1/21/14 with distribution 6. Quasi-p value: proportion of values more extreme than obtained value Assumption: greater chi-square value = more inhomogeneous/clustered Tested with simulation
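The six steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the slides use spatstat's quadratcount() and quadrat.test() in R over a tessellation built from Purdue buildings, whereas this Python sketch swaps in a plain rectangular grid with a uniform expected count per quadrat. The function names, the grid size, and the `bounds` argument are all hypothetical, and points are assumed to lie inside `bounds`.

```python
import random

def quadrat_chi_square(points, bounds, nx=4, ny=4):
    """Pearson chi-square of grid quadrat counts against a uniform expectation.

    Assumption: a regular nx-by-ny grid stands in for the building-based
    tessellation used in the slides; all points lie within bounds.
    """
    xmin, ymin, xmax, ymax = bounds
    counts = [0] * (nx * ny)
    for x, y in points:
        # Clamp so points on the upper edge fall into the last quadrat.
        col = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        row = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        counts[row * nx + col] += 1
    expected = len(points) / (nx * ny)
    return sum((c - expected) ** 2 / expected for c in counts)

def randomization_test(event_points, pool_points, bounds, n_iter=1000, seed=0):
    """Return (observed chi-square, quasi-p value).

    event_points: tweets from the event window (710 in the slides).
    pool_points:  tweets from the comparison dates and hours.
    """
    rng = random.Random(seed)
    observed = quadrat_chi_square(event_points, bounds)
    n = len(event_points)
    simulated = [
        # Steps 1-3: sample without replacement, count quadrats, keep chi-square.
        quadrat_chi_square(rng.sample(pool_points, n), bounds)
        for _ in range(n_iter)  # Step 4: repeat to build the distribution.
    ]
    # Steps 5-6: share of simulated values at least as large as the observed one.
    more_extreme = sum(1 for s in simulated if s >= observed)
    return observed, more_extreme / n_iter
```

Calling `randomization_test(event, pool, (xmin, ymin, xmax, ymax))` with lists of `(x, y)` tuples returns the observed statistic and the quasi-p value. Treating larger chi-square values as "more extreme" follows the slide's stated assumption that a greater chi-square value means a more clustered pattern.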
20 Randomization Test Result Chi-square: 85162.85 Quasi-p value: 0.038 We were able to detect a change in the pattern of tweets during the lockdown, when presumably more people were stuck in Purdue buildings than average.
21 Event Detection We can see anomalies in Twitter data both temporally and spatially However, we are still looking for a complete and integrated algorithm that can be applied to other events To be continued
22 Frequent Twitter Users Top 10 Twitter users with the most tweets in Ann Arbor Plot the tweets of individual Twitter user Four typical patterns −Work-Home −Work-Road-Home −Work-Home-Short Visit −Multiple Clusters
25 Future Work Ongoing research Complete analysis in all 4 study areas, and compare the patterns Develop/Find an algorithm for event detection … Any suggestions are welcome!
26 References 1. Goodchild, M. F., 2007. Citizens as sensors: The world of volunteered geography, GeoJournal, 69, 211-221. 2. Ghosh, D., and R. Guha, 2013. What are we ‘tweeting’ about obesity? Mapping tweets with topic modeling and Geographic Information System, Cartography and Geographic Information Science, 40(2), 90-102.