Dealing with variables: Resources and topics in enhancing secondary survey data. Paul Lambert, University of Stirling, DAMES research Node, www.dames.org.uk


1 Dealing with variables: Resources and topics in enhancing secondary survey data. Paul Lambert, University of Stirling, DAMES research Node, www.dames.org.uk. Part of session 17 ‘Resources (i): Resources for data management’, 6 July 2010, 4th ESRC Research Methods Festival, St Catherine’s College, Oxford, 5-8 July 2010

2 Dealing with variables: Resources and topics in enhancing secondary survey data
1) ‘Rigorous and vigorous’ approaches to dealing with variables
2) Three specialist topics: The GESDE services for data on occupations, ethnicity and educational qualifications

3 …Survey research and variable analysis…

4 ‘Data management’ applied to variables refers to…
• ‘the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis’ [DAMES Node]
• Usually performed by social scientists themselves
  - Pre-analysis tasks (though often revised/updated)
  - Inputs also from data providers
• Usually a substantial component of the work process
  - But may not be explicitly rewarded (sometimes even penalised)
• A little different from archiving / controlling data itself

5 Some components in secondary survey research…
• Manipulating data
  - Recoding categories / ‘operationalising’ variables
• Linking data
  - Linking related data (e.g. longitudinal studies)
  - Combining / enhancing data (e.g. linking micro- and macro-data)
• Secure access to data
  - Linking data with different levels of access permission
  - Full or restricted access to detailed micro-data
• Harmonisation standards
  - Approaches to linking ‘concepts’ and ‘measures’ (‘indicators’)
  - Recommendations on particular ‘variable constructions’
• Cleaning data
  - ‘missing values’; implausible responses; extreme values
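The ‘cleaning data’ component can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the missing-value codes (-9, -8, -1) and the plausible age range are hypothetical, and real surveys document their own codes.

```python
# Hypothetical missing-value codes; real surveys document their own.
MISSING_CODES = {-9, -8, -1}


def clean_age(value, low=0, high=120):
    """Return (cleaned_value, reason) so every change stays traceable."""
    if value in MISSING_CODES:
        return None, "missing code"
    if not (low <= value <= high):
        return None, "implausible"
    return value, "ok"


raw_ages = [34, -9, 150, 67]
cleaned = [clean_age(a) for a in raw_ages]
for original, (value, reason) in zip(raw_ages, cleaned):
    print(original, "->", value, f"({reason})")
```

Returning the reason alongside the cleaned value keeps a record of why each response was changed, which supports the paper trail discussed later in the talk.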

6 Example – recoding data [use a ‘recode’ or file matching routine]
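A recode of this kind might be sketched in Python as a simple lookup. The slide itself refers to SPSS/Stata style ‘recode’ routines; the category labels and the three-level mapping below are hypothetical and are not an agreed standard.

```python
# Hypothetical mapping from detailed qualification codes to three levels.
RECODE_QUAL = {
    "degree": "high",
    "a_level": "medium",
    "gcse": "medium",
    "none": "low",
}


def recode(values, mapping, missing=None):
    """Recode each value; codes absent from the mapping become `missing`."""
    return [mapping.get(v, missing) for v in values]


raw = ["degree", "gcse", "none", "refused"]
recoded = recode(raw, RECODE_QUAL)
print(recoded)
```

Keeping the mapping as a separate, named object (rather than burying it in a chain of if-statements) makes the recode easy to inspect, document and reuse.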

7 ..plus the centrality of keeping clear records of DM activities
• Reproducible (for self)
• Replicable (for all)
• Paper trail for whole lifecycle
• Cf. Dale 2006; Freese 2007
• In survey research, this means using clearly annotated syntax files (e.g. SPSS/Stata)
• Syntax examples: www.dames.org.uk/workshops/ and www.longitudinal.stir.ac.uk

8 Some provocative examples for the UK…
• Social mobility is increasing, not decreasing!!
  - Popularity of controversial findings associated with Blanden et al (2004)
  - Contradicted by wider ranging datasets and/or better measures of stratification position
  - DM: researchers ought to be able to more easily access wider data and better variables
• Degrees, MScs and PhDs are getting easier
  - {or at least, more people are getting such qualifications}
  - Correlates with measures of education are changing over time
  - DM: facility in identifying qualification categories and standardising their relative value within age/cohort/gender distributions isn’t widespread, but should and could be
• ‘Black-Caribbeans’ are not disappearing
  - As the 1948-70 immigrant cohort ages, the ‘Black-Caribbean’ group is decreasingly prominent due to return migration and social integration of immigrant descendants
  - Data collectors are under pressure to measure large groups only
  - DM: It ought to be possible to harmonise measures of ethnicity over time, and to build richer data resources with more cases (e.g. by merging survey data)
• People interpreted the RAE wrongly!
  - Most responses to the RAE 2008 involved comparing GPA scores between subject areas within and/or across institutions; but standardising relative to subject area distribution, or scaling by subject area, often gives very different results
  - DM: see Lambert and Gayle (2008) for a demo of alternative uses of RAE data

9 What might a rigorous and vigorous variable analysis look like? ..open to debate but I’d nominate:
• Replicability
• Features a pro-active review of variables
  - Review a full set of alternative measures
  - Review alternative functional forms
  - Attention to distribution/standardisation
  - Attention to harmonisation

10 How should I make my work replicable?
• The concept of a ‘workflow’ is a useful device for documenting a survey research project
• Workflows involve organising materials as a series of interrelated but distinctive components
• In survey research, software syntax files make excellent templates for documenting our work in component elements [Long, 2009; Treiman, 2009; Altman & Franklin, 2010; Kulas, 2008]
• Computer science researchers have developed workflow depositories [e.g. MyExperiment] and workflow capture tools [e.g. Taverna]

11 Ad hoc organisation of a workflow as a ‘master file’ in Stata
Forthcoming workshop: ‘Documentation and workflows for social survey research’, University of Stirling, 1-2 September 2010, see www.dames.org.uk
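The ‘master file’ idea is shown on the slide in Stata, where one file typically calls a series of component syntax files in order. As a rough analogue only, the Python sketch below runs named component steps in sequence and keeps a log of what was run. The step names, the toy data and the convention that negative ages mean ‘missing’ are all hypothetical.

```python
def run_workflow(steps):
    """Run (name, function) steps in order, passing the data along.

    Returns the final result and a log of the step names that were run.
    """
    data, log = None, []
    for name, step in steps:
        data = step(data)
        log.append(name)
    return data, log


# Hypothetical component steps, each standing in for one syntax file.
def load(_):
    return [{"id": 1, "age": 34}, {"id": 2, "age": -9}]


def clean(rows):
    # Assumption for this sketch: negative ages are missing-value codes.
    return [{**r, "age": r["age"] if r["age"] >= 0 else None} for r in rows]


def analyse(rows):
    ages = [r["age"] for r in rows if r["age"] is not None]
    return sum(ages) / len(ages)


result, log = run_workflow(
    [("01_load", load), ("02_clean", clean), ("03_analyse", analyse)]
)
print(log, result)
```

Numbering the components makes the intended order explicit, so the whole project can be rerun from raw data to results with one call.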

12 A workflow summary in Excel (following Long, 2009)

13 How should I review variables/functional forms/distributions/harmonisations?
• We tend to rely on personal expertise in particular subject domains
  - Expertise of the depositor of the data
  - Expertise of the analyst
• Some textbooks and other capacity building events cover these topics generically [e.g. Treiman 2009], but by and large they are unduly neglected in methodological training
…Something called ‘e-Science’ can help with both variable reviews and replication…

14 The ‘e-Social Science’ endeavour (see http://www.merc.ac.uk/ for up-to-date links)
• A number of UK projects seeking to improve social science research by capitalising on emerging computer science techniques
  - Handling distributed data; collaborative technologies; large and complex data; secure data
• The ‘Grid’ embodies these technologies, but more generic terms like ‘e-Social Science’ & ‘Digital Social Research’ are increasingly preferred
• GESDE: ‘Grid Enabled Specialist Data Environments’

15 Example: Understanding New Forms of Digital Records (DReSS) http://web.mac.com/andy.crabtree/NCeSS_Digital_Records_Node/DReSS.html
• transcribed talk
• audio
• video
• digital records
• system logs
• location
[Figure: screenshot with panels labelled transcript, code tree, video and system log]

16 This session part-organised by the ‘Data Management through e-Social Science’ node
• DAMES: www.dames.org.uk
• ESRC Node funded 2008-2011
• Aim: Useful social science provisions by exploiting tools for data management developed in computer science. Core components are:
  - Data curation tool
  - Data fusion tool
  - Portals for access to data and data resources

17 Data curation tool collects metadata and allows data resources of different formats to be organised in an accessible depository

18 Data fusion tool supports merging of data files through shared variables (e.g. for recodes, aggregations, pooling data, linking related data, probabilistic linkages)
[Figure: an external user’s micro-social data (columns id, oug, sex) is linked through oug to an aggregate occupational information index file (columns oug, CS-M, CS-F, EGP), giving the user’s output micro-social data (columns id, oug, CS)]
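The kind of merge this slide describes can be sketched in plain Python. This is not the DAMES data fusion tool itself; it only illustrates linking individual records to an aggregate index file through a shared occupational unit group code (oug). All values are illustrative, and two details are assumptions: that sex is coded 1 = male and 2 = female, and that CS-M and CS-F are male and female versions of an occupational scale score.

```python
# Illustrative micro-data: one record per person.
micro = [
    {"id": 1, "oug": 110, "sex": 1},
    {"id": 2, "oug": 320, "sex": 1},
    {"id": 3, "oug": 320, "sex": 2},
    {"id": 4, "oug": 874, "sex": 1},
    {"id": 5, "oug": 874, "sex": 2},
    {"id": 6, "oug": 999, "sex": 2},  # no match in the index file
]

# Illustrative index file: one entry per occupational unit group.
occ_index = {
    110: {"cs_m": 60, "cs_f": 58, "egp": "I"},
    320: {"cs_m": 69, "cs_f": 71, "egp": "II"},
    874: {"cs_m": 39, "cs_f": 51, "egp": "VIIa"},
}


def attach_scores(records, index):
    """Attach a sex-specific scale score to each record via its oug code."""
    linked = []
    for r in records:
        info = index.get(r["oug"])
        if info is None:
            cs = None  # unmatched codes are kept, with a missing score
        else:
            cs = info["cs_m"] if r["sex"] == 1 else info["cs_f"]
        linked.append({"id": r["id"], "oug": r["oug"], "cs": cs})
    return linked


linked = attach_scores(micro, occ_index)
for row in linked:
    print(row)
```

Unmatched records are retained with a missing score rather than dropped, so the analyst can see how many cases the index file failed to cover.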

19 GEMDE – Example of a ‘portal’ for distributing and accessing supplementary data related to ethnicity

20 2) Special Topics: The GESDE services for sociological classifications
• ‘Key variables’ in social science research are not just for sociology, but are much debated there
  - Complex categorical measures and ‘variable operationalisation’ recommendations/debates
  - Individual level measures of social positioning…
• ‘GESDE’ = 3 related online services which are “Grid Enabled Specialist Data Environments”
  - GEODE: the ‘o’ is for data on Occupations
  - GEEDE: the ‘e’ is for data on Educational qualifications
  - GEMDE: the ‘m’ is for data on ethnic Minorities

21 Our contribution in GESDE..
• Many existing resources on these topics [See app.]
  - Academic reviews and projects [e.g. Rose & Harrison 2010; Ganzeboom, 2008; Schneider, 2008; Guveli, 2006]
  - Service providers [e.g. ESDS variable guides; CESSDA-PPP]
  - National Statistics Institutes’ guidelines [e.g. www.ons.gov.uk/about-statistics/harmonisation/]
• It’d be good if more people were engaging with and exploiting these resources to enhance their own data..!

22 At the centre of this are problems of standardizing categorical data
• ‘Measurement equivalence’ (e.g. van Deth, 2003) is often not feasible for complex categorical measures
• For categorical data, equivalence for comparisons is often best approached in terms of meaning equivalence (because of non-linear relations between categories and shifting underlying distributions), even if measurement equivalence seems possible
• Arithmetic standardisation offers a convenient form of meaning equivalence by indicating relative position within the structure defined by the current context
• For categorical data, this can be achieved/approximated by scaling categories in one or more dimensions of difference
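One simple form of arithmetic standardisation is to express each score relative to the mean and spread of its own context. The Python sketch below does this within birth cohorts; it is one possible reading of the idea rather than the method used in the talk, and the variable names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardise_within(records, value_key, group_key):
    """Add a z-score for `value_key`, computed within each `group_key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r[value_key])
    stats = {g: (mean(v), pstdev(v)) for g, v in groups.items()}

    scored = []
    for r in records:
        m, s = stats[r[group_key]]
        z = (r[value_key] - m) / s if s > 0 else 0.0
        scored.append({**r, "z": z})
    return scored


# Hypothetical data: years of education in two birth cohorts.
records = [
    {"cohort": 1950, "educ_years": 9},
    {"cohort": 1950, "educ_years": 11},
    {"cohort": 1950, "educ_years": 13},
    {"cohort": 1980, "educ_years": 13},
    {"cohort": 1980, "educ_years": 15},
    {"cohort": 1980, "educ_years": 17},
]
scored = standardise_within(records, "educ_years", "cohort")
for row in scored:
    print(row["cohort"], row["educ_years"], round(row["z"], 2))
```

In this toy data, 13 years of education is well above average in the 1950 cohort but well below average in the 1980 cohort, which is the sense in which the same category can carry a different relative position in different contexts.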

23 ‘Effect proportional scaling’ using parents’ occupational advantage

24 What was that then?
• We can represent categories through positions on a scale
• In turn, we can use position in the dimension as a category score which then plugs into a further analysis (e.g. regression main and interaction effects)
E.g. some options for data on ethnicity..
• Stereotyped Ordered Logistic Regression (SOR) models summarize dimensions of difference according to regression predictor values [e.g. Lambert and Penn, 2001]
• Geometric data analysis for distances between people, or things [cf. Prandy, 1979; Bennett et al., 2009]
• Assign category scores by hand (a priori or by selected average)
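Assigning category scores ‘by selected average’ can be sketched as follows: each category is scored with the mean of a chosen criterion variable among its members, and the score then replaces the category in later analysis. This is a minimal illustration of the idea, not the procedure used for the slides; the group labels and the parental occupational advantage values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def category_scores(records, category_key, criterion_key):
    """Score each category by the mean of the criterion among its members."""
    by_category = defaultdict(list)
    for r in records:
        by_category[r[category_key]].append(r[criterion_key])
    return {c: mean(v) for c, v in by_category.items()}


# Hypothetical data: a categorical measure and a parental advantage score.
records = [
    {"group": "A", "parent_adv": 40},
    {"group": "A", "parent_adv": 50},
    {"group": "B", "parent_adv": 60},
    {"group": "B", "parent_adv": 70},
    {"group": "C", "parent_adv": 55},
]
scores = category_scores(records, "group", "parent_adv")

# Replace each category with its score, ready for use as a metric predictor.
scaled = [{**r, "group_score": scores[r["group"]]} for r in records]
print(scores)
```

The resulting single metric variable uses far fewer degrees of freedom than a full set of category dummies, which helps when some categories are sparse.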

25

26 2(a) Data on occupations
• Occupational unit groups = standardised lists of occupational titles
• E.g. via CASCOT, www2.warwick.ac.uk/fac/soc/ier/publications/software/cascot/

27 ..data on occupations..
• find ways of attaching summary information about occupations to occupational unit groups

28 Comparability problems => value of documenting methods & comparing alternatives

29 GEODE: Our contribution
• GEODE acts as a library style service for access to ‘occupational information resources’
• We encourage people to supply data they’ve produced, and we upload data ourselves
• Researchers are encouraged to use the portal to find and exploit suitable data
• Services: search, browse, deposit data, link data, user ratings

30 GEODE (v1) – Occupational data

31 Using occupational data: Example as a measure of marked social disadvantage, Lambert & Gayle (2009)

32 [Example: Occupational not geographical inequality]

33 2(b) Data on educational qualifications
• Similar issues arise with the use of educational data
• Specialist resources exist which can enhance measures of educational data
• Many users aren’t aware of alternative coding schemes or harmonised approaches
• GEEDE acts as a service for bringing together and disseminating relevant data resources on educational measures

34 Example – recoding data

35 Family and Working Lives Survey (54 vars per educ record)

36 2(c) Data on ethnicity
• We can conceive of similar information resources and data analysis requirements for measures of ethnicity
• There are generally fewer published resources / agreed standards in this domain
• GEMDE publishes resources but puts more emphasis on understanding complex ethnicity data

37 …working with ethnicity data in surveys is hard…!
- It’s sparse
- It’s collinear (e.g. to age, location)
- It’s dynamic (cf. comparative research)

38 EFFNATIS sample (1999): Subjective ethnic identity [Heckmann et al., 2001]

39 A ‘data management’ contribution
• Preserve information on what was done with categorical data
• Communicate information on what should/could be done

40 GEMDE seeks to promote replicability / transparency…
• Document your own recodes
• Access somebody else’s recodes
• Identify commonly used recodes (& use them..!)
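One way to make a recode documentable and shareable is to store it as data rather than as code. The Python sketch below writes a recode specification to JSON and applies it after reloading. It does not show GEMDE’s actual file format; the field names, variable names and category labels are all hypothetical.

```python
import json

# Hypothetical recode specification, with documentation kept alongside it.
recode_spec = {
    "name": "example_3_category_recode",
    "source_variable": "detailed_code",
    "description": "Collapses four detailed codes into three groups.",
    "mapping": {"1": "group_a", "2": "group_a", "3": "group_b", "4": "group_c"},
}


def apply_recode(values, spec, missing=None):
    """Apply a stored recode; JSON keys are strings, so codes are converted."""
    mapping = spec["mapping"]
    return [mapping.get(str(v), missing) for v in values]


# Serialise for sharing, then reload as another researcher might.
text = json.dumps(recode_spec, indent=2, sort_keys=True)
restored = json.loads(text)

recoded = apply_recode([1, 3, 4, 99], restored)
print(recoded)
```

Because the specification is plain text, it can be deposited, versioned and reused by others without needing the original analyst’s software.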

41 ..and making complex analysis of ethnicity data easier..
• Organising complex categorical data
  - Labelling, recoding, etc
• Effect proportional scaling
• Standardisation
• Interaction terms

42 The GEODE model for GEMDE?
• ….A service for MUGs and MIRs…
  - Define/register ‘Minority Unit Groups’
  - Define/register ‘Minority Information Resources’
  - Explore data resources and obtain help in approaching analysis of complex, sparse data

43

44 What's a MIR?
• 'Minority Information Resource'
  - This is our own terminology. By a MIR, we mean any piece of information which supplies systematic data on a minority unit group (MUG) classification. We've used this term to be deliberately similar to the phrase 'Occupational Information Resources' that we used on GEODE
• E.g. summary statistical data about the categories, and documentation or information
• E.g. recodings which have been used in a particular study
  - Social scientists are not in general aware of the existence of MIRs (cf. wide use of popular Occupational Information Resources). In GEMDE we seek to publicise little-known resources and promote their uptake: we argue that better communication and dissemination of MIRs is an important step towards better scientific practice of replication and standardisation of research
• In our terms, every MIR necessarily links to a MUG (but not every MUG has a MIR)

45 The GEMDE portal
‘Liferay portal’ with access to MUGs and MIRs, first release Jan 2010, now available for general use (www.dames.org.uk/gemde)
• Shibboleth access for registered users
• Guest level access
• Deposit MUGs/MIRs
• Search/browse deposited resources
• Feedback on resources (user ratings)
• Review live data (e.g. pooled LFS records)
• Expert and user quality ratings

46 Screenshot here!

47 Summary: Remind me how these topics enhance survey data..?
• Variable operationalisations can ordinarily be improved by more ‘rigour and vigour’
  - More transparent operationalisation/documentation
  - Better use of detailed data
  - Better ability to include measures in suitably complex models/analysis
• The GESDE approach has been to seek technological solutions to the organisation and distribution of complex variable-related information

48 Data used
• Department for Education and Employment. (1997). Family and Working Lives Survey, 1994-1995 [computer file]. Colchester, Essex: UK Data Archive [distributor], SN: 3704.
• Heckmann, F., Penn, R. D., & Schnapper, D. (Eds.). (2001). Effectiveness of National Integration Strategies Towards Second Generation Migrant Youth in a Comparative Perspective - EFFNATIS. Bamberg: European Forum for Migration Studies, University of Bamberg.
• Li, Y., & Heath, A. F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, 1972-2005 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], SN: 5666.
• Office for National Statistics. Social and Vital Statistics Division and Northern Ireland Statistics and Research Agency. Central Survey Unit, Quarterly Labour Force Survey, January - March, 2008 [computer file]. 4th Edition. Colchester, Essex: UK Data Archive [distributor], March 2010. SN: 5851.
• University of Essex, & Institute for Social and Economic Research. (2009). British Household Panel Survey: Waves 1-17, 1991-2008 [computer file], 5th Edition. Colchester, Essex: UK Data Archive [distributor], March 2009, SN: 5151.

49 References
• Altman, M., & Franklin, C. H. (2010). Managing Social Science Research Data. London: Chapman and Hall.
• Bennett, T., Savage, M., Silva, E. B., Warde, A., Gayo-Cal, M., Wright, D., et al. (2009). Culture, Class, Distinction. London: Routledge.
• Blanden, J., Goodman, A., Gregg, P., & Machin, S. (2004). Changes in generational mobility in Britain. In M. Corak (Ed.), Generational Income Mobility in North America and Europe (pp. 147-189). Cambridge: Cambridge University Press.
• Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.
• Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods and Research, 36(2), 153-171.
• Ganzeboom, H. B. G. (2008). Tools for deriving status measures from ISCO-88 and ISCO-68. Retrieved 1 March, 2008, from http://home.fsw.vu.nl/~ganzeboom/PISA/
• Guveli, A. (2006). New Social Classes within the Service Class in the Netherlands and Britain: Adjusting the EGP class schema for the technocrats and the social and cultural specialists. Nijmegen: Radboud University Nijmegen.
• Harkness, J., van de Vijver, F. J. R., & Mohler, P. P. (Eds.). (2003). Cross-Cultural Survey Methods. NY: Wiley.
• Hoffmeyer-Zlotnik, J. H. P., & Wolf, C. (Eds.). (2003). Advances in Cross-national Comparison: A European Working Book for Demographic and Socio-economic Variables. Berlin: Kluwer Academic / Plenum Publishers.
• Jowell, R., Roberts, C., Fitzgerald, R., & Eva, G. (2007). Measuring Attitudes Cross-Nationally. London: Sage.
• Kulas, J. T. (2008). SPSS Essentials: Managing and Analyzing Social Sciences Data. New York: Jossey Bass.
• Lambert, P. S., & Gayle, V. (2009). Data management and standardisation: A methodological comment on using results from the UK Research Assessment Exercise 2008. Stirling: University of Stirling, Technical paper 2008-3 of the Data Management through e-Social Science research Node (www.dames.org.uk).
• Lambert, P. S., & Gayle, V. (2009). 'Escape from Poverty' and Occupations. Colchester, Essex: BHPS Research Conference, 9-11 July 2009, and www.iser.essex.ac.uk/events/conferences/bhps-2009-conference/overview
• Lambert, P. S., & Penn, R. D. (2001). SOR models and Ethnicity data in LIS and LES: Country by Country Report. Syracuse University, Syracuse, New York 13244-1020: Luxembourg Income Study Paper No. 260.
• Levesque, R., & SPSS Inc. (2010). Programming and Data Management for IBM SPSS Statistics 18: A Guide for PASW Statistics and SAS users. Chicago: SPSS Inc.
• Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
• Penn, R. D., & Lambert, P. S. (2009). Children of International Migrants in Europe: Comparative Perspectives. Basingstoke: Palgrave.
• Prandy, K. (1979). Ethnic discrimination in employment and housing. Ethnic and Racial Studies, 2(1), 66-79.
• Rose, D., & Harrison, E. (Eds.). (2010). Social Class in Europe: An Introduction to the European Socio-economic Classification. London: Routledge.
• Schneider, S. L. (2008). The International Standard Classification of Education (ISCED-97). An Evaluation of Content and Criterion Validity for 15 European Countries. Mannheim: MZES.
• Simpson, L., & Akinwale, B. (2006). Quantifying Stability and Change in Ethnic Group. Manchester: University of Manchester, CCSR Working Paper 2006-05.
• Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.
• van Deth, J. W. (2003). Using Published Survey Data. In J. Harkness et al. (Eds.) (2003) (pp. 329-346).

50 Appendix
Existing resources – sources and types of support for data management in the social sciences:

51 Existing resources (i): Data providers
a) Documentation and metadata files

52 Existing resources (i): Data providers
b) Resources for variables
• CESSDA PPP on key variables, http://www.nsd.uib.no/cessda/project/
• UK Question Bank, http://surveynet.ac.uk/sqb/datacollection/resources.asp
• ONS Harmonisation, www.ons.gov.uk/about-statistics/harmonisation/
c) Resources for datasets
• UK Census data portal, http://census.ac.uk/
• IPUMS international census data facilities, www.ipums.org
• European Social Survey, www.europeansocialsurvey.org
d) Data manipulations prior to data release
• Missing data imputation / documentation
• Survey design / weighting information
• Influential: most analysts use ‘the archive version’

53 Existing resources (ii): Resource projects / infrastructures
- UK ESDS, www.esds.ac.uk (ESDS International | ESDS Government | ESDS Longitudinal | ESDS Qualidata)
- Helpdesks; online instructions; user support..
- UK ESRC NCRM / NCeSS / RDI initiatives
- Longitudinal data: www.longitudinal.stir.ac.uk
- Linking micro/macro: www.mimas.ac.uk/limmd/
- Other resources / projects / initiatives
- EDACwowe: http://recwowe.vitamib.com/datacentre

54 Existing resources (iii): Analytical and software support
• Textbooks featuring data management
  - [Levesque & SPSS Inc, 2010] [Altman & Franklin, 2010] [Long, 2009] [Kulas, 2008]
• Software training covering DM
  - Stata’s ‘data management’ manual
  - SPSS user group course on syntax and data management, www.spssusers.co.uk
But generally, sustained marginalisation of DM as a topic
• Advanced methods texts use simplistic data
• Advanced software for analysis isn’t usually combined with extended DM requirements

55 Existing resources (iv): Data analysts’ contributions
• Academic researchers often generate and publish their own DM resources
  - E.g. Harry Ganzeboom on education and occupations, http://home.fsw.vu.nl/~ganzeboom/pisa/
  - Provision of whole or partial syntax programming examples
• Analysts often drive wider resource provisions related to DM
  - CAMSIS project on occupational scales, www.camsis.stir.ac.uk
  - CASMIN project on education and social class

56 Existing resources (v): Literatures on harmonisation and standardisation
• National Statistics Institutes’ principles and practices
  - E.g. ONS, www.ons.gov.uk/about-statistics/harmonisation/
• Cross-national organisations
  - E.g. UNSTATS, http://unstats.un.org/unsd/class/
• Academic studies
  - E.g. [Harkness et al. 2003] [Hoffmeyer-Zlotnik & Wolf 2003] [Jowell et al. 2007] [Schneider, 2008] [Rose and Harrison 2010]

