Constructing Individual Level Population Data for Social Simulation Models Andy Turner Presentation as part.


1 Constructing Individual Level Population Data for Social Simulation Models Andy Turner http://www.geog.leeds.ac.uk/people/a.turner/ Presentation as part of the Social Simulation Tutorial at the International Symposium on Grid Computing in Taipei, Taiwan 2010-03-07

2 Outline
–Introduction
–Contemporary population data
–Developing Population Data Expertise
–Confidentiality and Disclosure Control
–Population Reconstruction is an Art
–3 Population Data Integrations
–Integrating survey data
–Other Data for Social Simulation
–What next?

3 Introduction
Individual Agents representing people can be generated for a region
–Using entirely made-up data
–Based on existing aggregate data measured by a census or survey
Agents' attributes may be enriched using data from other sources
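As a rough illustration of generating Agents from aggregate data, the sketch below expands a table of counts into individual records. The counts, the age bands, the region label "R1" and the function name generate_agents are all made up for this example; ages are drawn uniformly within each band, which is an assumption and not something a census would guarantee.

```python
import random

def generate_agents(aggregate_counts, region, seed=0):
    """Expand aggregate counts into one record per person.

    aggregate_counts maps (sex, (min_age, max_age)) to a head count.
    Ages are drawn uniformly within each band (an illustrative assumption).
    """
    rng = random.Random(seed)
    agents = []
    for (sex, (low, high)), count in aggregate_counts.items():
        for _ in range(count):
            agents.append({
                "id": len(agents),          # sequential identifier
                "region": region,
                "sex": sex,
                "age": rng.randint(low, high),  # inclusive at both ends
            })
    return agents

# Entirely made-up aggregate table for a hypothetical region "R1".
counts = {
    ("F", (0, 15)): 3,
    ("F", (16, 64)): 5,
    ("M", (0, 15)): 2,
    ("M", (16, 64)): 4,
}
agents = generate_agents(counts, region="R1")
print(len(agents), agents[0])
```

The generated Agents reproduce the aggregate totals exactly, but any relationship between attributes that the aggregate table does not record is lost, which is why enrichment from other sources matters.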

4 Agents can be collected into groups sharing common characteristics
Agents can be geographically located in sub-regions and initialised with various attributes
As a Social Simulation Model (SSM) is run, Agents may become more complex as they interact and more detailed in terms of their history

5 Agents output from an SSM can be input into another SSM
–This can be viewed as an enrichment process
As simulation proceeds, it can be hoped that a model becomes more realistic and representative of a population
The “Garden of Eden” configuration we started with today is very unrealistic

6 After several generations it settles down into something more normal that changes gradually
Initially, the age distribution of the population is quite odd and no females are pregnant, but after a number of generations things balance out
–With no intervention, randomness should even out the distribution of birthdays over time
»However, this could take a very long time (a very large number of iterations) if fertility were high, miscarriage were not modelled and gestation were of a fixed duration

7 Contemporary population data
Most countries provide some form of aggregate statistics about population to the research community
–In many cases this is publicly available
–It tends to be derived from census surveys and/or registration data
Most countries have a system for registering births, deaths and marriages. Some also have mandatory systems for recording people's changes of residential address

8 Data at a very disaggregate or individual level is available for some countries
–In most cases this is a sample of records
These are sometimes anonymised, in that identifying variables such as a person's name and sometimes also their residential location are removed
In some cases this data is removed but replaced with a unique identifier that is otherwise meaningless, but can be used to link back to other data
–Pseudo-anonymisation
In addition, in many countries there are large and small scale social surveys
There are also very detailed lifestyle data collected by businesses that observe their customers and also directly survey the population
–Population data is very useful and very valuable if it is good!
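Pseudo-anonymisation as described on this slide can be sketched as follows. This is only a minimal illustration: the field names ("name", "address", "pid"), the example records and the function name pseudonymise are invented here, and real disclosure control involves far more than dropping two fields.

```python
import secrets

def pseudonymise(records, identifying_fields=("name", "address")):
    """Strip identifying fields and attach a meaningless identifier.

    Returns (released, lookup). 'released' is safe to pass on; 'lookup'
    maps each identifier back to the removed values and must be held
    securely by whoever is trusted to link the data.
    """
    key_for_identity = {}  # identifying values -> identifier
    lookup = {}            # identifier -> identifying values
    released = []
    for record in records:
        identity = tuple(record[field] for field in identifying_fields)
        if identity not in key_for_identity:
            key = secrets.token_hex(8)
            while key in lookup:  # regenerate in the unlikely event of a clash
                key = secrets.token_hex(8)
            key_for_identity[identity] = key
            lookup[key] = dict(zip(identifying_fields, identity))
        row = {k: v for k, v in record.items() if k not in identifying_fields}
        row["pid"] = key_for_identity[identity]
        released.append(row)
    return released, lookup

# Made-up records; the first and third describe the same person.
records = [
    {"name": "A. Example", "address": "1 High Street", "age": 34},
    {"name": "B. Sample", "address": "2 Low Road", "age": 51},
    {"name": "A. Example", "address": "1 High Street", "age": 35},
]
released, lookup = pseudonymise(records)
print(released)
```

Because the identifier is random, it carries no information on its own; the same person still receives the same identifier, so records can be linked without revealing who they describe.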

9 Additionally, data is collected by public service authorities
–Health
–Education
–Utilities
All these data can be integrated and used to create and enhance individual level population data
The process of creating these data is sometimes referred to as population reconstruction

10 Probably all countries have a unique set of available population data
–The people represented are different, so the data captured about them is often different
–Even when the same attributes are captured, they can be measured or stored in different ways, e.g. Age versus DateOfBirth
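The Age versus DateOfBirth example can be made concrete with a small harmonisation sketch. The record layout, the function names and the choice of reference date (here the date of the talk) are assumptions made for illustration only.

```python
from datetime import date

def age_on(date_of_birth, reference_date):
    """Age in completed years on reference_date."""
    birthday_passed = (reference_date.month, reference_date.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return reference_date.year - date_of_birth.year - (0 if birthday_passed else 1)

def harmonise(record, reference_date):
    """Return a copy of record that stores 'Age' rather than 'DateOfBirth'."""
    if "Age" in record:
        return dict(record)
    harmonised = {k: v for k, v in record.items() if k != "DateOfBirth"}
    harmonised["Age"] = age_on(record["DateOfBirth"], reference_date)
    return harmonised

reference = date(2010, 3, 7)  # hypothetical reference date
a = harmonise({"id": 1, "Age": 29}, reference)
b = harmonise({"id": 2, "DateOfBirth": date(1980, 3, 8)}, reference)
print(a, b)
```

The conversion only works in one direction: an age can be derived from a date of birth, but a date of birth can only be narrowed to a one-year window from an age, so harmonising usually means moving to the coarser representation.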

11 Developing Population Data Expertise
Most countries have population data experts
Becoming one of these and getting to grips with the data is a considerable effort
–It is key to learn the details of what is available and what the restrictions on its use are
This is getting easier as metadata improves

12 There are generally useful ways to combine and enhance population data whilst preserving confidentiality
How best to do this depends on what variables there are and how detailed they are

13 Confidentiality and Disclosure Control
Confidentiality is a big issue in many countries
It is such a big issue in some countries that people have voted to get rid of their data!
–There is no population census in Germany or in the Netherlands

14 Disclosure control
–E.g. anonymisation by removal of names and addresses
–Helps to keep some people happy that the data exists
–Security is a big issue with population data
We need to be trusted with the data if we are to put it to good use
People rightly worry, as the data can also be put to bad use

15 Population Reconstruction is an Art
Because the types of available population data can be so different
–There is little point in detailing specific data integrations now
However, it can be argued that most population data, from whatever type of survey, can be integrated
–Whether it is useful to do so depends on many things

16 3 Population Data Integrations
1. Using a representative sample survey and integrating this with aggregate data from a comprehensive census to produce individual level census data estimates
2. Linking survey data with additional variables to individual level census data
3. Linking two different sets of survey data using common variables to form an integrated survey with a set of desired variables
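The first integration in the list above can be sketched very simply: for each zone, draw survey records with replacement until the census count for each category is met. This is one naive approach chosen for illustration, not necessarily the method used in the tutorial; the constraint variable "age_band", the example data and the function name synthesise_zone are all invented here.

```python
import random
from collections import Counter

def synthesise_zone(census_counts, survey, constraint, seed=0):
    """Build individual records for one zone from a sample survey.

    census_counts maps each category of the constraint variable to the
    number of people the census reports for the zone. Survey records in
    the same category are sampled with replacement to fill that count.
    """
    rng = random.Random(seed)
    donors_by_category = {}
    for record in survey:
        donors_by_category.setdefault(record[constraint], []).append(record)
    population = []
    for category, count in census_counts.items():
        donors = donors_by_category.get(category)
        if not donors:
            raise ValueError(f"no survey record for category {category!r}")
        for _ in range(count):
            population.append(dict(rng.choice(donors)))  # copy the donor record
    return population

# Made-up survey sample and census table for a single zone.
survey = [
    {"age_band": "0-15", "employed": False},
    {"age_band": "16-64", "employed": True},
    {"age_band": "16-64", "employed": False},
    {"age_band": "65+", "employed": False},
]
census = {"0-15": 2, "16-64": 5, "65+": 1}
zone = synthesise_zone(census, survey, "age_band")
print(Counter(person["age_band"] for person in zone))
```

The result matches the census on the constraint variable by construction, while the other variables (here "employed") are only estimates inherited from whichever survey records happened to be drawn.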

17 Integrating survey data
Sometimes the term data fusion is used
Most survey data is biased, but some attempts to be, or can be reduced to be, generally representative, in that the proportions of each type of person reflect those of the overall population

18 Representative survey data can be linked using probabilities and random assignments of characteristics
Surveys known to be biased require data fusion, and the hope that most types of people that exist in the total population are represented in the sample survey
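Linking by random assignment can be sketched as a simple form of statistical matching: each record in one survey receives the extra variables of a randomly chosen record from the other survey that shares the same values on the common variables. The variable names, the example data and the function name fuse are hypothetical, and this sketch makes no attempt to correct for bias in either survey.

```python
import random

def fuse(recipients, donors, common, extra, seed=0):
    """Copy the 'extra' variables from a random matching donor to each recipient.

    Records match when they agree on every variable named in 'common'.
    Recipients with no matching donor get None for the extra variables.
    """
    rng = random.Random(seed)
    pools = {}
    for donor in donors:
        pools.setdefault(tuple(donor[v] for v in common), []).append(donor)
    fused = []
    for recipient in recipients:
        pool = pools.get(tuple(recipient[v] for v in common))
        record = dict(recipient)  # leave the input untouched
        chosen = rng.choice(pool) if pool else None
        for variable in extra:
            record[variable] = chosen[variable] if chosen else None
        fused.append(record)
    return fused

# Two made-up surveys sharing the variables "sex" and "age_band".
survey_a = [
    {"sex": "F", "age_band": "16-64", "income": "high"},
    {"sex": "M", "age_band": "65+", "income": "low"},
    {"sex": "M", "age_band": "0-15", "income": "none"},
]
survey_b = [
    {"sex": "F", "age_band": "16-64", "travel": "bus"},
    {"sex": "M", "age_band": "65+", "travel": "car"},
]
fused = fuse(survey_a, survey_b, common=("sex", "age_band"), extra=("travel",))
print(fused)
```

The third recipient has no counterpart in the donor survey, which illustrates the hope expressed on the slide: fusion only works for the types of people who are actually represented in the sample.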

19 Usually it is partial survey data that has an extra variable that might be of interest for a simulation or for comparing with another variable
–It might be that, after fusing the data, two variables can be tested for correlation for the first time

20 Other Data for Social Simulation
As SSMs are part of this, we should consider other data used to drive the models
Especially the probabilities for the major processes being modelled
–Mortality (death)
–Fertility (birth)
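How mortality and fertility probabilities drive a model can be sketched as a single yearly step. The rate functions below use invented numbers purely for illustration; real rates would come from published life tables and fertility statistics. The function name step_year and the record layout are also assumptions of this sketch.

```python
import random

def step_year(agents, death_prob, birth_prob, rng):
    """Advance the population by one year.

    death_prob(agent) and birth_prob(agent) return annual probabilities.
    Each agent may die; survivors age by one year; surviving females
    may give birth to a newborn of random sex.
    """
    survivors, newborns = [], []
    for agent in agents:
        if rng.random() < death_prob(agent):
            continue  # the agent dies this year
        survivors.append(dict(agent, age=agent["age"] + 1))
        if agent["sex"] == "F" and rng.random() < birth_prob(agent):
            newborns.append({"sex": rng.choice("FM"), "age": 0})
    return survivors + newborns

def example_death_prob(agent):
    # Invented rates: low for the young, higher for the old.
    return 0.001 if agent["age"] < 60 else 0.05

def example_birth_prob(agent):
    # Invented rate: only ages 15 to 45 can give birth.
    return 0.1 if 15 <= agent["age"] <= 45 else 0.0

rng = random.Random(0)
population = [{"sex": rng.choice("FM"), "age": rng.randint(0, 80)} for _ in range(1000)]
for _ in range(10):
    population = step_year(population, example_death_prob, example_birth_prob, rng)
print(len(population))
```

Passing the probabilities in as functions keeps the model step separate from the data that drives it, so rates from different sources or countries can be swapped without changing the simulation code.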

21 What next?
12:30 Lunch
14:00 Infrastructures for Social Simulation (Rob Procter)
14:30 Introduction to Grids and Cloud Computing
15:00 Coffee break

