Presentation on theme: "Human Mobility Modeling at Metropolitan Scales Sibren Isaacman, Richard Becker, Ramón Cáceres, Margaret Martonosi, James Rowland, Alexander Varshavsky,"— Presentation transcript:
1 Human Mobility Modeling at Metropolitan Scales Sibren Isaacman, Richard Becker, Ramón Cáceres, Margaret Martonosi, James Rowland, Alexander Varshavsky, Walter Willinger Princeton University, Princeton AT&T Labs, Florham Park, NJ, USAMobisys 2012Junction
2 Introduction Human mobility model Mobile sensing, opportunistic networking, urban planning, ecology, and epidemiologyAim to produce accurate models of how large populations move within different metropolitan areasGenerate sequences of locations and associated times that capture how individuals move between important places in their lives, such as home and work.Aggregate the movements of many such individuals to reproduce human densities over time at the geographic scale of metropolitan areas.Take into account how different metropolitan areas exhibit distinct mobility patterns due to differences in geographic distributions of homes and jobs, transportation infrastructures and other factors
3 Shortages of previous works Random motion -> unrealistic motionLack of memory about recurring movement patternsLack of spatiotemporal realism about population densitiesWithout diverse population and geographic concernsSmall geographic area (e.g. campus)Too universal -> fail to adopt to different geographic areas
4 Sources Call Detail Records (CDRs) Creating human mobility models from such dataMaintained by cellular network operatorsInformation -> Sporadic samples of the approximate locations of the phone’s ownerThe time a voice call was placed or a text message was receivedThe identity of the call tower with which the phone was associatedDifficulties:Whether associated locations corresponding to home, work, or other important places for particular cellphone users.Both the spatial and temporal granularity of CDR data is quite coarseSpatially, the granularity of cell-tower spacingTemporally, only generated when phones are actively involved
5 Overall view WHERE modeling approach (Work and Home Extracted Regions) Human movement (important locations, commute distances .. )=> probability distributions=> generate synthetic CDRs for an arbitrary number of synthetic people
6 Parameters for mobility modeling Human mobility is tightly coupled to the geography of the city people live.=> should take into account both the area geography and individual user mobility patterns.Spatial information: important locationsSpatiotemporal information: hourly population densitiesTemporal Information: calling patterns
7 Spatial: Important Locations A full 60% of mobility can be accounted for just the top two cellphone towers with which a user is associated.Important locations: home and workProbability distributionhome locations: HomeCommuteDistance : dWork locations: Work
8 Spatiotemporal: Hourly population densities Heavily residential area is likely to be more populated at nightCommercial district is likely to be more populated during the dayHourly population density: simply reflects the probability of people being at a particular location at a particular time (Hourly)Assumption: the spatial densities of telephone calls is approximately equivalent to the spatial density of people7 ~ 8 p.m. on weekdaysOver a 3-month period
9 Temporal: calling patterns A user’s daily call volume characteristics from distribution:PerUserCallsPerDay ~ N(mean, sd)Temporal patterns of when those calls are made:2 classes (clustered by k-mean method)How many calls each user makes during each hour of the day24-dimensional vectorHourly call probability distribution: CallTime
10 Algorithm for model generation WHERE2 => Two-place model: Work and HomeMakes use of the fact that most people spend the majority of their time either at home or at work.
11 WHERE2: Work and HomeUsers’ movement: occurs based on synthetic CDRs representing calls made at different locations at different times
12 Extension to Additional Places WHERE3: increasing the number of important locationsTradeoff: model complexity <-> fidelity of the synthetic trace
13 “distance”, the linear mapping used in EMD. EvaluationEarth Mover’s Distance (EMD)A “good” synthetic trace has the synthetic user population distributed in a very similar way in space as the real trace, at any point during the day.A measure for comparing two spatial probability distributionsAttempt to find the minimum amount of energy required to transform one probability distribution into another.This energy is given by the “amount” of probability to be moved and the “distance” to move it.“distance”, the linear mapping used in EMD.
14 Evaluation Comparison models Random Waypoint (RWP) Each user selects a random destination from all possible destinations in the area to be simulated.Once a destination is selected, the user moves at a random velocity toward the destination.When the selected destination is reached, the user waits for a random amount of time and then selects a new destination and new velocity to begin the procedure.Weighted Random Waypoint (WRWP)The destination is chosen from a location probability distributionHere, use the distribution “Hourly”
15 Evaluation Sources for input probability distributions Real call detail records (CDRs)ZIP codes within a 50-mile radius of the LA and NY centersBilling addresses lie within the metropolitan regions of interestPhone id, starting time, duration, cell towers locationsData Validation (CDR can accurately represent the mobility patterns)The number of sampled phones in each ZIP code is proportional to the population of that ZIP codeThe maximum pairwise distance between any 2 cell towers contacted by a phone in one day is a close approximation for how far the phone’s owner traveled that day.Applying certain clustering and regression techniques produces accurate estimates of important locations in people’s lives, in particular home and work.
16 Evaluation Sources for input probability distributions Census Data Need to buy CDRs!Publicly available data regarding home and work locations, as well as commute distances, for large populationsProvide little or no information about the hourly probabilities of a given locationMake an assumption about the hourly distributionsCombination of public data and CDRshome, work, and commute distances come from the censusOther distributions are drawn from CDR data
17 Evaluation Artificial Test Cases To reason about the expected behavior of the model and its strengths and weaknessesTwo locations:The model must be able to place synthetic users at home and work locations and move them according to time of daysince we emulate CDRs with discrete call locations, the synthetic users must only exist at these locationsThe entire world is populated by users that move predictably between two locations at highly regimented intervals.From 7am to 7pm on weekdays, all of the probability is concentrated in a single “work” location. At all other times, the probability is clustered in a second “home” location.
18 Validation Two locations test case WHERE2 is able to correctly model the call distribution information both during “home” and “work” phases.WRWP doesn’t have sufficient temporal input to distinguish how 9am mobility probabilities should differ from 6am ones.Idea result, in which probability is a spike at the home location at 6am, and a spike at work location at 9am.RWP clearly differs greatly, because it is given so little input information regarding spatial or temporal patterns to model.Two locations test case
19 Evaluation Artificial Test Cases To reason about the expected behavior of the model and its strengths and weaknessesIf multiple possible works exist for a home, the model must select a realistic one.has a single user travel to each of the four possible locations, resulting in a signifﬁcantly worse EMD.WHERE2 detects that users make calls only from one of two locations and thus correctly positions users.
20 Validation: Large Scale Real Data Modeling based on real CDRsWRWP improves over original RWP by 4 timesWHERE out performs WRWP by additional 20%WHERE3 improves further (NY by additional 3 miles accuracy)
21 Validation: Large Scale Real Data Modeling based on real CDRs
22 Validation: Large Scale Real Data Modeling based on Census DataUsing all-public data from the US CensusWith average error of 8 miles in NYUsing hybrid of census data with some CDR informationAn average error of 6.8 milesUsing all-CDR information in WHERE3Reduces average error to some 3 miles
23 *. WHERE2: median is within 0.8 miles of the true value. Example UsesDaily range: maximum distance a person travels in one dayserves as an important metric for verifying correctness of the generated modelsThe EMD metric measures the aggregate behavior of the users, but daily range displays the results at a per-user granularity.Comparing NY and LA:LA with only 1 mile errorapplicability across cities with very different mobility patterns and geographic characteristics.(2, 25, 50, 75, 98) percentile*. WHERE2: median is within 0.8 miles of the true value.*. WHERE3 0.7 miles error=> Adding a third point to the model provides very little benefit for this use.
24 Example Uses Message Propagation: relevant in opportunistic networking Social contacts, epidemiology and data carryingSimulator for epidemic routingAs user meet they exchange all messages (0.5 radius)Delivery percentage and message delay rate
25 Example UsesHypothetical Cities: what-if scenario regarding mobility in NYCreate parameterized model of cities and user behavior patternsCity planner to experiment with the effects of modifications that are being considered10% of the people whose original work location were in the borough of Manhattan were instead given work locations that matched their home location.
26 Conclusion Human mobility modeling Refinements Capture the motion of individuals among important places in their livesAggregating that motion to reproduce human densities over time at the scale of a metropolitan area.Accounting for differences between metropolitan areas.RefinementsTo produce not only sequences of locations with associated times, but also routes taken between those locations.CDRs and census tables are too coarse to provide route informationSequences of Cell towers passing by