
Population Activity Estimation from Incomplete Data Using Probabilistic Inference
D. Kelly, J. Doyle and R. Farrell

Fig. 5 (panels: 6 am, 11 am, 5 pm): Population density estimates using the aggregated Markov chains. Colour scale represents people per km².

System Architecture
Depicted below is the system architecture used in the analysis of mobile telephony call detail records (CDR) from the Irish mobile network operator, Meteor.

Call Detail Records
Call Detail Records (CDR) are event-driven call logs from a mobile phone network operator.

Available Information: user-based trajectory proxies; user activities; user social network; cell activity loads and communication flows.

Trajectory Information: CDR trajectories are proxies for the actual trajectories travelled by users, and carry both spatial and temporal uncertainty. CDR trajectory accuracy is affected by network cell density and by user interaction with the network. Partial trajectory information is available for Meteor users outside the Republic of Ireland, and CDR trajectories are available for roaming customers on the network.

Cell Activities: Activity loads can be proxies for the presence of people. They can highlight the locations of large social events and show the daily rhythms of activity.

Social / Cell Network: CDR records can reveal the social network behaviour of over 5 million users. Cell connections can be a proxy for the connections between different spatial regions.

1. Overview
The availability of Call Detail Records (CDR) provides an unprecedented opportunity to rapidly interpret and respond to the movements of the entire population. In the past, population movement estimates have been generated using accurate GPS tracking of a very small subset of the population, or using alternative cellular network readings such as Erlang, handover and location area data. Erlang data allows for real-time population estimates but is heavily biased by the tendency of people in a given location to make calls.
Handover data provides individual velocity estimation at irregular intervals. In contrast, location area data provides extremely coarse location estimates at regular intervals. CDR data can be considered a compromise between the three data types: it has higher location resolution than location area data, and higher population visibility than Erlang and handover data. However, due to the temporal uncertainty of individual sampling intervals, intelligent processing must be employed to obtain representative population estimates.

Erlang Data: indicates the volume of calls at a given cell tower.
Handover Data: can indicate the movement trajectory of a person, but only during calls.
Location Area: a coarse location estimate which indicates the set of cell towers that are ready to service an individual.

2. Static Population Snapshots
Using rule-based reasoning over a significant period of CDR data, it is possible to identify the home and work locations of individuals. Figure 1 illustrates the population estimates derived from the weekly activities of cellular network customers. Using these home and work location estimates, Figure 2 illustrates the distribution of median work travel distances.

Fig. 1: Population estimates of cellular network customers.
Fig. 2: Estimates of network customers’ median work travel distances.

Static population snapshots require large amounts of data to generate a long-term consensus of an individual’s locations of interest. These techniques clearly do not scale to real-time population movement estimation, where high temporal resolution would be necessary. Figure 3 illustrates the proportion of the population whose location is detectable, even after interpolation between their known locations takes place.

Fig. 3: The hourly proportion of the population whose location is detectable after location interpolation. The best-case population visibility of 80% falls to less than 20% at night.

3. Dynamic Population Modelling
Since it is impossible to observe the location of all people at all intervals during the day using CDR data, it is necessary to implement a unified framework for representing people’s discrete locations at both known and unknown times.

4. Markov Chains
A first-order Markov chain allows the estimation of the next state as a function of the current state only, that is,

$$P(q_t = l_j \mid q_{t-1} = l_i, q_{t-2} = l_k, \dots) = P(q_t = l_j \mid q_{t-1} = l_i),$$

where $q_t$ represents the state at time $t$, and $l_i$ and $l_j$ represent locations $i$ and $j$ respectively. Hence, the likelihood of a user being in location $l_j$ at time $t$ can be estimated as

$$P(q_t = l_j) = \sum_i P(q_t = l_j \mid q_{t-1} = l_i)\, P(q_{t-1} = l_i).$$

A transition probability matrix, $A$, can be formed from the individual transition probabilities, $a_{ij} = P(q_t = l_j \mid q_{t-1} = l_i)$, which can be empirically estimated from each user’s data, and t-step-ahead prediction can proceed recursively using the expression

$$L_t = A^{\top} L_{t-1},$$

where $L_t$ is a column vector of individual location probabilities.

(a) Individual Location Prediction
When a person’s location is detected by the cellular network, the probability of that location is set to $P(L_i) = 1$ and all other probabilities are set to zero. t-step-ahead prediction then estimates the user’s location until they become visible again, as illustrated in Figure 4.

Fig. 4 (panels: 3.32 pm, 3.38 pm, 3.46 pm, 3.54 pm): Evolving location probability predictions. Green indicates a detected user location, blue indicates the intensity of location probability and red indicates the location of maximum probability.

(b) Population Movement Prediction
By aggregating the motion models ($A$) for all users and by initialising the location probability vector ($L$) to represent the proportion of the population in each cell, it is possible to estimate the combined movements of the entire population (Figure 5), despite the location uncertainty of a significant portion of the population.

(c) Traffic Parameter Estimation
Accurately predicting traffic parameters, such as traffic speed and travel times, has typically required accurate handover data. By using the aggregated motion models, it is possible to analyse the generalised inter-cell transition probabilities to generate traffic volume (Figure 6) and speed estimates.

Fig. 6 (panels: 6 am, 11 am, 5 pm): Traffic volumes estimated from total inter-cell transition probabilities. Thicker lines indicate higher volumes.

Research is funded by a Strategic Research Cluster grant (07/SRC/I1168) from Science Foundation Ireland under the National Development Plan. The authors gratefully acknowledge this support. We would also like to gratefully acknowledge the support of Meteor for providing data used in this poster, in particular Mr. John Bathe and Mr. Adrian Whitwham.
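
As an illustration of the Markov chain approach described in this poster, the following is a minimal Python sketch, not the authors' implementation. It estimates a transition matrix from a user's sequence of visited cells, resets the location vector when the user is detected, and predicts t steps ahead. All function names and the toy sequences are hypothetical. Three choices are assumptions that the poster does not specify: unobserved cells are treated as "stay put", the per-user models are aggregated by pooling transition counts, and inter-cell flow is taken as the population share in a cell multiplied by the transition probability.

```python
def count_transitions(sequence, n_locations):
    """Count observed i -> j moves in one user's sequence of cell indices."""
    counts = [[0] * n_locations for _ in range(n_locations)]
    for i, j in zip(sequence, sequence[1:]):
        counts[i][j] += 1
    return counts


def normalise(counts):
    """Turn a count matrix into row-stochastic transition probabilities a_ij."""
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a cell with no observed departures is "stay put",
            # so that every row still sums to 1.
            matrix.append([1.0 if j == i else 0.0 for j in range(len(row))])
        else:
            matrix.append([c / total for c in row])
    return matrix


def step(A, L):
    """One-step prediction: new L[j] = sum over i of a_ij * L[i]."""
    n = len(L)
    return [sum(A[i][j] * L[i] for i in range(n)) for j in range(n)]


def predict(A, L0, steps):
    """Apply the one-step prediction recursively `steps` times."""
    L = list(L0)
    for _ in range(steps):
        L = step(A, L)
    return L


def detected_at(location, n_locations):
    """Location vector for a detected user: probability 1 at that cell."""
    return [1.0 if i == location else 0.0 for i in range(n_locations)]


def pool_counts(all_counts):
    """Assumption: aggregate users by summing their transition counts."""
    n = len(all_counts[0])
    return [[sum(c[i][j] for c in all_counts) for j in range(n)]
            for i in range(n)]


def inter_cell_flows(A, L):
    """Assumption: expected share of the population moving from cell i to j."""
    n = len(L)
    return {(i, j): L[i] * A[i][j]
            for i in range(n) for j in range(n) if i != j}


if __name__ == "__main__":
    # Hypothetical toy data: two users moving between three cells.
    user_a = [0, 0, 1, 2, 1, 0, 0, 1]
    user_b = [2, 2, 1, 1, 2]
    counts_a = count_transitions(user_a, 3)
    counts_b = count_transitions(user_b, 3)

    # (a) Individual prediction: user A detected in cell 0, two steps ahead.
    A_user = normalise(counts_a)
    print(predict(A_user, detected_at(0, 3), 2))

    # (b) Population prediction from the pooled model and population shares.
    A_all = normalise(pool_counts([counts_a, counts_b]))
    population = [0.5, 0.3, 0.2]
    print(predict(A_all, population, 1))

    # (c) Flows between cells as a rough proxy for traffic volume.
    print(inter_cell_flows(A_all, population))
```

Plain lists are used to keep the sketch dependency-free. With many cells, a sparse matrix representation would be the more practical choice, since most cell pairs have no observed transitions.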
