Geodemographic modelling collaboration Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research 2009-04-21.

Slides:



Advertisements
Similar presentations
NGS computation services: API's,
Advertisements

Rolls-Royce supported University Technology Centre in Control and Systems Engineering UK e-Science DAME Project Alex Shenfield
The methodology used for the 2001 SARs Special Uniques Analysis Mark Elliot Anna Manning Confidentiality And Privacy Group ( University.
Data Monitoring Confidentiality and the Grid Mark Elliot Confidentiality And Privacy Group ( University of Manchester.
DAMES - Data Management through e-Social Science 1 DAMES: Data Management through e-Social Science NCeSS Research Node University of Stirling / University.
SICSA student induction day, 2009Slide 1 Multithreading RePast Models Alex Voss 1, Jing-Ya You 2, Eric Yen 2, Simon Lin 2, Ji-Ping Lin 3, Andy Turner 4.
E-Social Science: scaling up social scientific investigations Alex Voss, Andy Turner (ESRC National Centre for e-Social Science) Gabor Terstyanszky, Gabor.
The micro-geography of UK demographic change Paul Norman School of Geography, University of Leeds Understanding Population Trends and Processes.
WS-PGRADE: Supporting parameter sweep applications in workflows Péter Kacsuk, Krisztián Karóczkai, Gábor Hermann, Gergely Sipos, and József Kovács MTA.
Sample of Anonymised Records: User Meeting Propensity to migrate by ethnic group: 1991 & 2001 Paul Norman 1, John Stillwell 2 & Serena Hussain 2 School.
The e-Social Science Research Agenda Peter Halfpenny and Rob Procter School of Social Sciences - University of Manchester UK e-Science All Hands Meeting.
Visual Solution to High Performance Computing Computer and Automation Research Institute Laboratory of Parallel and Distributed Systems
GENESIS Web 2.0 Agent City Simulation: Establishing a user community and enabling collaborators to manipulate simulations and develop models Andy Turner.
Modelling and Simulation for e-Social Science Mark Birkin School of Geography University of Leeds.
MoSeS meets NEC 10 th March 2008 MoSeSMoSeS Andy Turner
Alternative Futures – ASAP Research Cluster Seminar 16 th November 2005 MoSeS Starts for the Promised Land Andy Turner Outline –Introduction –Population.
4 th International Conference on e-Social Science: Workshop 5: Agent-Based Modelling for the Spatial-Social Sciences Reconstruction of the.
School of something FACULTY OF OTHER School of Geography FACULTY OF ENVIRONMENT Modelling Individual Consumer Behaviour
International Symposium on Grid Computing 2010 Applications on Humanities & Social Sciences I Taipei, Taiwan ( ) GENESIS Social Simulation Modelling.
Oxford eResearch Conference 2008 Paper Session 4A: NCeSS Oxford, UK, ( ) Experience of e-Social Science: A Case of Andy Turner and MoSeS Andy.
The NGS Roadshow Bath Geodemographic Modelling on the NGS Andy Turner
Individual and Household Level Estimates Based on 2001 UK Human Population Census Data Andy Turner CSAP Seminar on Microsimulation: Problems and Solutions.
CCG 1 MoSeS Introduction and Progress Report Andy Turner
School of Geography CENTRE FOR SPATIAL ANALYSIS AND POLICY e-Infrastructure for Large-Scale Social Simulation Mark Birkin Andy Turner.
E-Social Science: scaling up social scientific investigations Alex Voss, Andy Turner, Rob Procter National Centre for e-Social Science Gabor Terstyanszky,
Modelling and Simulation for e-Social Science (MoSeS) Mark Birkin, Martin Clarke, Phil Rees, Andy Turner, Belinda Wu (School of Geography) Haibo Chen (Institute.
Shirley Crompton Source: Rob Allan. Institutional Repository Subject Repository Data Producer Repository share resources solve bigger problems integrate.
Synthetic data, microsimulation and socio-demographic forecasting Mark Birkin School of Geography Professor of Spatial Analysis and Policy.
An Introduction to Social Simulation Andy Turner Presentation as part of Social Simulation Tutorial at the.
Andy Turner On MoSeS 28 th March 2007 Andy Turner On MoSeS Andy Turner
MOSES: Modelling and Simulation for e-Social Science Mark Birkin, Martin Clarke, Phil Rees School of Geography, University of Leeds Haibo Chen, Institute.
Census.ac.uk Census Area Statistics and Casweb David Rawnsley Census Dissemination Unit (CDU) Mimas University of Manchester.
Constructing Individual Level Population Data for Social Simulation Models Andy Turner Presentation as part.
1 portal.p-grade.hu További lehetőségek a P-GRADE Portállal Gergely Sipos MTA SZTAKI Hungarian Academy of Sciences.
Supercomputing, Visualization & eScience1 e-Social Science Grid technologies for Social Science: the Seamless Access to Multiple Datasets (SAMD) project.
Estimating Individual Behaviour from Massive Social Data for An Urban Agent-Based Model Nick Malleson & Mark Birkin School of Geography, University ESSA.
Exploring Metropolitan Dynamics with an Agent- Based Model Calibrated using Social Network Data Nick Malleson & Mark Birkin School of Geography, University.
1 Statistical Disclosure Control for Communal Establishments in the UK 2011 Census Joe Frend Office for National Statistics.
Grid Enabled Data Fusion for Calculating Poverty Measures Social Science Applications UK e-Science ALL HANDS MEETING 2006.
School of something FACULTY OF OTHER School of Geography FACULTY OF EARTH AND ENVIRONMENT MOSES: A Synthetic Spatial Model of UK Cities and Regions Mark.
School of Geography FACULTY OF ENVIRONMENT The Elements of a Computational Infrastructure for Social Simulation Mark Birkin 1, Rob Allan 2, Sean Beckhofer.
Longitudinal Data Recent Experience and Future Direction August 2012.
EpiFast: A Fast Algorithm for Large Scale Realistic Epidemic Simulations on Distributed Memory Systems Keith R. Bisset, Jiangzhuo Chen, Xizhou Feng, V.S.
Innovations in Data Dissemination Thomas L. Mesenbourg, Jr. Acting Director U.S. Census Bureau United Nations Seminar on Innovations in Official Statistics.
INFSO-RI Enabling Grids for E-sciencE Supporting legacy code applications on EGEE VOs by GEMLCA and the P-GRADE portal P. Kacsuk*,
New and easier ways of working with aggregate data and geographies from UK censuses Justin Hayes UK Data Service Census Support.
Introduction to Spatial Microsimulation Dr Kirk Harland.
Some aspects concerning analytical validity and disclosure risk of CART generated synthetic data Hans-Peter Hafner and Rainer Lenz Research Data Centre.
AgINFRA science gateway for workflows and integrated services 07/02/2012 Robert Lovas MTA SZTAKI.
P-GRADE and GEMLCA.
Infrastructures for Social Simulation Rob Procter National e-Infrastructure for Social Simulation ISGC 2010 Social Simulation Tutorial.
SICSA student induction day, 2009Slide 1 Social Simulation Tutorial International Symposium on Grid Computing Taipei, Taiwan, 7 th March 2010.
Using Targeted Perturbation of Microdata to Protect Against Intelligent Linkage Mark Elliot, University of Manchester Cathie.
Disclosure Risk and Grid Computing Mark Elliot, Kingsley Purdam, Duncan Smith and Stephan Pickles CCSR, University of Manchester
Exploring Microsimulation Methodologies for the Estimation of Household Attributes Dimitris Ballas, Graham Clarke, and Ian Turton School of Geography University.
13-Jul-07 Item 1 – Introduction. 13-Jul-07WG Core variables in social surveys Name of the presentation 16 Core Variables… 1.Geographic data I (linked.
EGEE-II INFSO-RI Enabling Grids for E-sciencE P-GRADE overview and introduction: workflows & parameter sweeps (Advanced features)
1 Support for Parameter Study applications in the P-GRADE Portal Cevat Şener Dept. Of Computer Engineering, METU.
The National Grid Service Mike Mineter.
OGC/OGF usage in UK e-Social Science OGF 21, Seattle, USA Paul Townend School of Computing, University of Leeds.
The National Grid Service User Accounting System Katie Weeks Science and Technology Facilities Council.
1 Porting applications to the NGS, using the P-GRADE portal and GEMLCA Peter Kacsuk MTA SZTAKI Hungarian Academy of Sciences Centre for.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid is a Bazaar of Resource Providers and.
Implementing Dynamic Data Assimilation in the Social Sciences Andy Evans Centre for Spatial Analysis and Policy With: Jon Ward, Mathematics; Nick Malleson,
SCI-BUS project Pre-kick-off meeting University of Westminster Centre for Parallel Computing Tamas Kiss, Stephen Winter, Gabor.
P-GRADE and GEMLCA.
Structural dynamic microsimulation modelling at the National Institute
Developing a daily time step individual level demographic simulation model Andy Turner Presentation slides prepared for a CSAP research cluster meeting,
Supporting precise data analysis without releasing patient records: the Simulacrum in action Cong Chen, Paul Clarke, Lora Frayling, Sally Vernon, Brian.
Presentation transcript:

Geodemographic modelling collaboration Alex Voss, Andy Turner Presentation to Academia Sinica Centre for Survey Research

Overview Background MoSeS –Population Reconstruction –Dynamic Simulation Summary of on-going projects/effort Future Work Next Steps Acknowledgements

Background Social science does not traditionally use advanced ICTs but emergence of new analytical methods is driven by: –Increased availability of data about social phenomena –Issues with data management and integration –Challenges to analyse social phenomena at scale –Challenges to inform practical policy and decision making (e.g., evidence-based policy making) National Centre for e-Social Science (NCeSS) in the UK is investigating ways to respond to these challenges. EUAsiaGrid is supporting e-Social Science amongst other application domains

MoSeS MoSeS develops modeling and simulation approaches for social science –First phase research node of NCeSS, now continued through second round node GENeSIS Contemporary demographic modeling of the UK based on UK census data and other datasets Using agent-based simulation to project population forward in time by 25 years Simulate the impact of distinct demographic processes such as mortality, fertility, health status, household formation, migration E.g., to inform policy making – what impact do policy decisions have

Population Reconstruction Generation of an individual level population data for the UK –Based on 2001 census data –Works with ‘public release’ versions of census that are restricted, Census Aggregate Statistics at Output Area Level 1% of population (anonymisation) –Reconstructed data has same attributes as real population and same number of individuals but is still anonymised –Uses a genetic algorithm to select a well fitting set of sample of anonymised records to assign to an output area –Need for attributes in the SAR to be matched with those in the CAS This is often complicated because of different categories –Aggregation to a lowest common categorisation

Population Reconstruction (II) Output Generation Raw Data Population Reconstruction Data Output Simulation Synthetic Population Disclosure Control Validation* 

Population Reconstruction (III) If  was 0, we would have a dataset that happens to match the raw census We are looking for a synthetic population with a ‘reasonable’  How can we check if  is ‘reasonable’?

Input Data Example: Individual-level Samples of Anonymised Records (ISARs) Downloadable for UK academics from Click-through End-User License Disclosure control through: –Selection –Aggregation –Permutation No additional output checking required

Output Variables Demographics Employment Health status Household composition Two types of constraints: –Control constraints (have to be met) –Optimisation constraints (used in fitness function)

Control Constraint

Optimisation constraint

Running Pop. Reconstruction Can use geographic segmentation to compute each output area in parallel Good example of high-throughput computing using master-slave architecture Each output area takes about 5 minutes to compute (on average) UK contains about 223,060 output areas, so a complete run would take about 775 days to complete Different control constraints lead to different runtime behaviour for each output area This can be reduced by applying, e.g., 128 processors, reducing the runtime to 6 days More than likely to have errors in such a large compute job, need robust error-recovery Realistically, it takes 2 weeks to compute a complete output

Application Porting Step 0 Code runs through PBS UK Census Data Local Compute Resource Step 1 Code on NGS UK Census Data Any NGS resource Step 2 Code on NGS UK Census Data Run through p-Grade portal Step 2 Code on EGEE UK Census Data GridPP Resources (Step 2.5) Code on EGEE UK Census Data ASGC Resource Step 3 Code on EGEE Using TW Census Data ASGC Resource Currently porting population reconstruction code to EGEE, investigating TW data assets and exploring other links, e.g., with healthcare research Setting up e-Social Science VO

Demo…

p-Grade Portal

Experiences Integrating existing code into grid environment required some changes to source code –management of input arguments –code scalability –log management –error handling Finding the right input size and parameters for testing to keep execution times low Making sense of execution failures –lack of ways to debug code in distributed environments

Experiences II Step-wise process works well, –ensures we encounter problems piece by piece –allows us to comply with data protection / licensing Population reconstruction is resource intensive –may run up against limits on wall clock time Importance of ‘at elbow’ support –but hindered by data protection/licensing issues Licensing means we need to limit execution to UK resources No e-Social Science VO for EGEE porting (yet) –Needed to get support from other VOs in the meantime

Dynamic Modelling Daily activity modelling –Commuting –Retail modelling –Transportation Population Forecasting –Annual time step –Birth –Death –Migration

Future Work Next steps until code runs in Taiwan with Taiwanese data –Proof of concept execution on Quanta cluster at ASGC –Definition of data outputs from Modularising computation so it can exploit multiple NGS nodes or EGEE CEs Improving data and code staging Moving from population reconstruction to supporting the simulation process Integration into ‘science gateway’ for the social sciences and developing a repository for models

Acknowledgements National Centre for e-Social Science –MoSeS Node: Mark Birkin (PI) –GENeSIS Node: Mike Batty (PI) –NCeSS Hub: Peter Halfpenny and Rob Procter EUAsiaGrid Consortium –Marco Paganoni (Project Director) CPC at Westminster University –Gabor Szmetanko –Gabor Terstyanszky –Tamas Kiss GridPP –Jens Jensen and Jeremy Coles National Grid Service –Jason Lander and Shiv Kaushal (Leeds), Steven Young (Oxford), Mike Jones (Manchester)