Data Collection, Harmonisation and Storage (An international perspective)
Jon Johnson (CLS, Senior Database Manager)


1 Data Collection, Harmonisation and Storage (An international perspective)
Jon Johnson (CLS, Senior Database Manager)
CLS is an ESRC Resource Centre based at the Institute of Education

2 Contents
1. Introduction
2. Survey Data ‘production line’
3. Data Management Compared
4. National Longitudinal Surveys
5. PSID and HRS (USA)
6. MCS, NCDS and BCS70 (UK)
7. LISS Panel (Netherlands)
8. Management strategies compared
9. Storage, maintenance and output
10. Meta Data Standards
11. New Requirements

3 Introduction
In November 2008 CLS (MCS, NCDS, BCS70) and ULSC (BHPS, Understanding Society) were commissioned by the ESRC, as part of Objective 5 of the Survey Resources Network, to:
- Examine potential efficiencies in data management processes, particularly in relation to data management software;
- Examine the use of cutting-edge data collection methods for longitudinal surveys carried out at CLS/ULSC.
A wide-ranging review of the Survey Data Process was completed and submitted to the ESRC in November 2009.
www.cls.ioe.ac.uk

4 Survey Data ‘production line’

5 Data Management Compared
Various strategies are used to cope with the complex data flows of survey collection, management and dissemination:
- Highly Integrated: National Longitudinal Surveys (USA)
- Partnership: PSID and HRS (USA)
- Contracted: MCS, NCDS and BCS70, BHPS, USoc (UK)
- Loosely Integrated: LISS Panel (Netherlands)
Final report will be available from http://surveynet.ac.uk/sms/introduction.asp

6 National Longitudinal Surveys (USA)
Over more than two decades the NLS has developed in-house software to capture the survey. More recently they have integrated this into a turnkey solution in which the storage of the survey is itself a mirror of the data collection instrument. The system is based on a highly normalised Oracle database: a snapshot of the data is auto-processed and made available to researchers on a “create your own dataset” basis, and is then turned into standard flat datasets for use by researchers.
Ref: http://www.chrr.ohio-state.edu/
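The step from a normalised store to a flat, researcher-selected dataset can be illustrated with a minimal sketch. This is not the NLS software: SQLite stands in for Oracle, and the table name `response`, its columns and the function `build_flat_dataset` are all hypothetical, chosen only to show the pivot from one row per answer to one row per respondent.

```python
import sqlite3


def build_flat_dataset(conn, variables):
    """Pivot normalised (respondent, variable, value) rows into one flat
    record per respondent, limited to the variables a researcher selects."""
    placeholders = ",".join("?" for _ in variables)
    rows = conn.execute(
        "SELECT respondent_id, variable, value FROM response "
        f"WHERE variable IN ({placeholders}) ORDER BY respondent_id",
        list(variables),
    )
    flat = {}
    for respondent_id, variable, value in rows:
        # Start each record with every requested variable set to None,
        # so unanswered questions appear as missing values.
        record = flat.setdefault(
            respondent_id,
            {"respondent_id": respondent_id, **{v: None for v in variables}},
        )
        record[variable] = value
    return list(flat.values())


# Hypothetical example data: one row per answer (the normalised form).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE response (respondent_id INTEGER, variable TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO response VALUES (?, ?, ?)",
    [(1, "sex", "2"), (1, "region", "North"), (2, "sex", "1")],
)
flat = build_flat_dataset(conn, ["sex", "region"])
```

Here respondent 2 has no `region` answer, so the flat record carries `None` for that variable rather than dropping the column.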

7 PSID and HRS (USA)
Both the Panel Study of Income Dynamics (PSID) and the Health and Retirement Study (HRS) use the in-house resources of the Survey Research Center, which provides survey data collection resources primarily to studies based at the University of Michigan. Survey instrument design, using Blaise for data collection, is closely linked to both the PI and data management teams. Data is prepared internally using SAS and made available for download as packaged datasets from PSID and also from ICPSR.
Ref: http://psidonline.isr.umich.edu/ and http://hrsonline.isr.umich.edu/

8 MCS, NCDS and BCS70 (UK)
CLS is responsible for the specification of the instruments and data output, which is implemented by a third-party survey organisation. Data is further processed within CLS using SIR and provided to researchers as packaged datasets for download from the ESDS Data Archive. Meta-data is harvested from the CAI instrumentation and held in an SQL database for generation of HTML web pages directly from DDI 2.0 XML.
Ref: http://www.cls.ioe.ac.uk and http://www.cls.ioe.ac.uk/datadictionary
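As an illustration of generating web pages from DDI-style XML, the sketch below turns variable descriptions into HTML. It is not the CLS pipeline: the fragment is a heavily simplified, hand-written example using DDI Codebook element names (`var`, `labl`, `catgry`, `catValu`), the SQL storage step is omitted, and the function `variables_to_html` and the variable shown are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Simplified, hypothetical DDI Codebook-style fragment for one variable.
DDI_FRAGMENT = """
<codeBook>
  <dataDscr>
    <var name="SEX">
      <labl>Sex of cohort member</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
  </dataDscr>
</codeBook>
"""


def variables_to_html(ddi_xml):
    """Render each <var> element as an HTML heading plus a list of codes."""
    root = ET.fromstring(ddi_xml)
    parts = []
    for var in root.iter("var"):
        name = var.get("name", "")
        label = var.findtext("labl", default="")
        parts.append(f"<h2>{html.escape(name)}: {html.escape(label)}</h2>")
        parts.append("<ul>")
        for category in var.findall("catgry"):
            value = category.findtext("catValu", default="")
            value_label = category.findtext("labl", default="")
            parts.append(
                f"<li>{html.escape(value)} = {html.escape(value_label)}</li>"
            )
        parts.append("</ul>")
    return "\n".join(parts)


page = variables_to_html(DDI_FRAGMENT)
```

Escaping every label with `html.escape` matters in practice, because question text harvested from instruments often contains characters such as `<` and `&`.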

9 LISS Panel (Netherlands)
The LISS Panel is primarily a web-based survey, which uses a layer over Blaise, with a dedicated survey instrument programming section closely linked to the survey design team. Data is produced from Blaise, managed in SPSS and provided as prepared datasets that researchers can download from LISS. A separate SQL metadata database, based on DDI 3.0, is used to provide navigation and to generate the codebook etc.
Ref: http://www.lissdata.nl/lissdata/Home

10 Management strategies compared
All studies face the same challenges:
1. Complex data
2. Data description handling
3. Management of meta-data
4. Myriad audiences
5. Longitudinal consistency
6. Resource constraints
7. Re-purposing of data

11 All in one basket approach
NLS / NHANES

12 Data and Meta-data separated
LISS / PSID / HRS
MCS / NCDS / BCS / BHPS / USoc

13 Storage, maintenance, output
Cleaning your data
- Cohort data continually evolves
- 2-3% of people mis-report sex
- Interviewers mis-key data
- Data entry clerks mis-key data
- Respondents misunderstand questions
Outputting and deriving data
- Synchronizing changes, derivations and internal consistency (e.g. geographical identifiers) and outputting in the best format for research are functions best done by DB staff
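One kind of cleaning check implied above is a longitudinal consistency test: a cohort member's recorded sex should not change between sweeps. The sketch below is only an illustration under assumed inputs; the record layout `(member_id, sweep, sex)` and the function name `flag_inconsistent_sex` are hypothetical, and flagged cases would still need manual review to decide which sweep is wrong.

```python
from collections import defaultdict


def flag_inconsistent_sex(records):
    """Return the ids of cohort members whose recorded sex differs
    between sweeps. Missing values (None) are ignored."""
    values_seen = defaultdict(set)
    for member_id, sweep, sex in records:
        if sex is not None:
            values_seen[member_id].add(sex)
    return sorted(
        member_id
        for member_id, values in values_seen.items()
        if len(values) > 1
    )


# Hypothetical records: (member_id, sweep, recorded sex code).
records = [
    ("A001", 1, 1), ("A001", 2, 1),
    ("A002", 1, 2), ("A002", 2, 1),     # conflicting codes: flagged
    ("A003", 1, 2), ("A003", 2, None),  # missing in sweep 2: not flagged
]
flagged = flag_inconsistent_sex(records)
```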

14 Meta Data Standards
The Data Documentation Initiative has emerged as the front runner as the basis for an international standard:
1. Existing foothold is limited
2. Lacks sufficient support for longitudinal studies
3. Provides at least a minimum of data which would enable international cross-cohort data discovery
Can we establish a ‘Dublin Core’ for longitudinal / birth cohort surveys?

15 New Requirements
- Video / audio
- Genetics
- Web capture, e.g. social networks
- Paper Archives
- Record Linkage
- Biological measures
- Data security (ISO27001)
- Disclosure control

16 Any questions?
Institute of Education, University of London
20 Bedford Way, London WC1H 0AL
Tel +44 (0)20 7612 6000
Fax +44 (0)20 7612 6126
Email info@ioe.ac.uk
Web www.ioe.ac.uk

