
Cross-national data in DAMES and GE*DE
Paul Lambert, University of Stirling
Prepared for the Workshop on Cross-Nationally comparative social survey research, Fifth International Conference on e-Social Science, Cologne, 24th June 2009
This talk presents materials from the DAMES Node, an ESRC-funded research Node of the National Centre for e-Social Science: www.dames.org.uk

2 Some recent history – Atkinson (1996: 47)

3 Stewart et al. (2009: 5)

4 Today's workshop: Where next?
Problems / challenges with cross-national survey analysis:
- Quantity of data (and metadata)
- Debates on harmonisation, equivalence, data quality
- Access to data
- The contribution of e-social science

5 Why is e-Science relevant?
- e-Science models cover distributed computing & the enabling of collaborations [e.g. Foster et al., 2001]
- e-Social Science is directed to research infrastructures for collaboration, and for supporting the lifecycle of data-oriented research [e.g. Halfpenny & Procter, 2009]
- Cross-national survey projects involve complex distributed data & a clear need for collaborations…
- Hitherto, cross-national survey projects have not generally made use of e-Science initiatives

6 Part 1: What is e-Social Science doing for cross-national survey research?
- Projects on the research lifecycle: data collection; data management [DAMES]; data analysis
- Projects on a national scale
- Projects on data, but not necessarily survey data [e.g. digital records; aggregate data; metadata]

7 The example of DAMES and GE*DE (www.dames.org.uk)
1.1) Grid Enabled Specialist Data Environments (GE*DE)
1.2) Data resources for micro-simulation on social care data
1.3) Linking e-Health and social science databases
1.4) Training and interfaces for management of complex survey data
2.1) Description, discovery & service use through metadata and data abstraction
2.2) Techniques to handle data from multiple sources
2.3) Workflow modelling for social science
2.4) Security driven data management

8 Data management means…
the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis […DAMES Node..]
- Usually performed by social scientists themselves
- Most overt in quantitative survey data analysis
- Preparing or enabling survey analysis
- Usually a substantial component of the work process
- But not explicitly rewarded (and sometimes penalised)
- Here we differentiate from archiving / controlling data itself
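The first task in that definition, linking related data resources, often means attaching values from a specialist lookup table (for example, an occupational scale keyed by occupation code) to each survey record. The sketch below is a minimal illustration under assumed conditions: the codes, the scores and the function name `link_occupational_scores` are hypothetical, not real GE*DE data or services.

```python
from typing import Dict, List

# Hypothetical lookup resource: occupation code -> occupational score.
# The codes and scores are illustrative only, not published values.
OCC_SCORES: Dict[str, float] = {
    "2221": 78.0,
    "5122": 35.0,
    "9132": 20.0,
}

def link_occupational_scores(records: List[dict], lookup: Dict[str, float],
                             code_field: str = "occ_code",
                             out_field: str = "occ_score") -> List[dict]:
    """Return copies of the records with a score attached; unmatched codes get None."""
    linked = []
    for rec in records:
        new_rec = dict(rec)  # copy, so the caller's data is not modified in place
        new_rec[out_field] = lookup.get(rec.get(code_field))
        linked.append(new_rec)
    return linked

survey = [
    {"id": 1, "occ_code": "2221"},
    {"id": 2, "occ_code": "9132"},
    {"id": 3, "occ_code": "0000"},  # code absent from the lookup table
]
linked = link_occupational_scores(survey, OCC_SCORES)
unmatched = [r["id"] for r in linked if r["occ_score"] is None]
print(unmatched)  # [3]
```

Reporting unmatched records, rather than silently dropping them, keeps the linkage step checkable by whoever replicates the analysis.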

9 The significance of data management for social survey research (see http://www.esds.ac.uk/news/eventdetail.asp?id=2151)
The data management studied across the DAMES Node is a major component of the social survey research workload:
- Pre-release manipulations performed by distributors / archivists
  - Coding measures into standard categories
  - Dealing with missing records
- Post-release manipulations performed by researchers
  - Re-coding measures into simple categories
We do have existing tools, facilities and expert experience to help us… but we don't make a good job of using them efficiently or consistently.
So the significance of data management (DM) is about how much better research might be if we did things more effectively…
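A typical post-release manipulation combines two of the tasks listed above: treating missing-value codes consistently and collapsing a detailed measure into simple categories. The following is a minimal sketch, assuming a hypothetical coding frame in which -9 and -8 mark missing responses and detailed codes 1 to 9 collapse into three broad levels; real surveys use their own schemes.

```python
from typing import Dict, Optional

# Hypothetical missing-value codes (e.g. -9 = refused, -8 = don't know)
MISSING_CODES = {-9, -8}

# Hypothetical collapse of detailed codes 1-9 into three broad categories
COLLAPSE: Dict[int, str] = {
    1: "low", 2: "low", 3: "low",
    4: "medium", 5: "medium", 6: "medium",
    7: "high", 8: "high", 9: "high",
}

def recode(value: int) -> Optional[str]:
    """Map a detailed code to a broad category.

    Missing-value codes return None; any other unexpected code raises an
    error, so gaps in the recode rules are noticed rather than hidden.
    """
    if value in MISSING_CODES:
        return None
    if value not in COLLAPSE:
        raise ValueError(f"unexpected code {value}; update the recode rules")
    return COLLAPSE[value]

raw = [1, 5, 9, -9, 7, -8]
recoded = [recode(v) for v in raw]
print(recoded)  # ['low', 'medium', 'high', None, 'high', None]
```

Raising on unexpected codes is one design choice among several; it trades convenience for consistency, which is the property the slide argues is often lacking.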

10 In GE*DE, we're developing:
- Services for accessing and depositing specialist data
  - Occupations, educational qualifications, ethnicity
  - UK administrative data (with ADLS)
- Materials specifically oriented to comparative analytical approaches
  - Data resources often from major cross-national studies
  - Producing new cross-national data resources (see also talk on standardization of categorical data in session 4a)

11 GEODE v1: Organising and distributing specialist data resources (on occupations)

12 Cross-national data in DAMES and GE*DE
1. New specialist data on occupations, education and ethnicity
   a. Curation and re-release of existing data
   b. Generation of new data (and/or metadata), with focus on standardisation / harmonisation
2. Conduit to existing resources
3. Generic resources for workflow documentation and replication
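Workflow documentation for replication (item 3) can be as simple as recording every data-management step together with its parameters and a fingerprint of the resulting data. This sketch is illustrative only: the `WorkflowLog` class and the two example steps are hypothetical and do not represent a DAMES or GE*DE service. It assumes records are JSON-serialisable dictionaries.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List

Records = List[Dict[str, Any]]

class WorkflowLog:
    """Hypothetical step log: name, parameters and output hash for each step."""

    def __init__(self) -> None:
        self.steps: List[Dict[str, Any]] = []

    def apply(self, name: str, func: Callable[..., Records],
              data: Records, **params: Any) -> Records:
        """Run one step, then log its name, parameters and an output fingerprint."""
        result = func(data, **params)
        payload = json.dumps(result, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        self.steps.append({"step": name, "params": params, "output_sha256": digest})
        return result

    def to_json(self) -> str:
        return json.dumps(self.steps, indent=2, sort_keys=True)

def drop_missing(data: Records, field: str) -> Records:
    """Keep only records where the field is present and not None."""
    return [r for r in data if r.get(field) is not None]

def top_code(data: Records, field: str, limit: float) -> Records:
    """Cap the field at an upper limit."""
    return [{**r, field: min(r[field], limit)} for r in data]

log = WorkflowLog()
data = [{"id": 1, "income": 1200}, {"id": 2, "income": None}, {"id": 3, "income": 90000}]
data = log.apply("drop_missing", drop_missing, data, field="income")
data = log.apply("top_code", top_code, data, field="income", limit=50000)
print(log.to_json())
```

Another researcher who reruns the same steps on the same input should obtain identical hashes, which gives a cheap check that a replication matches the original.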

13 E.g. (1a) Occupations [cf. Leiulfsrud et al. 2005]

14 E.g. (1b) Ethnicity / Migration

15 E.g. (2): Occupations

16 E.g. (3): Workflow documentation

17 Part 2: The contribution of e-Science
The contribution should concern:
- Navigating complex data
- Security
- Workflows
Compare with current issues for cross-national surveys:
- Quantity of data (and metadata)
- Debates on harmonisation, equivalence, data quality
- Access to data

18 (a) Quantity of data (& metadata): current trends
- Micro-data: moving beyond macro-data analysis* to exploiting large-scale micro-data
- Secure: interest in / access to secure micro-data
- Complex: exploitation of complex micro-data
  - Longitudinal data and the life-course [Mayer, 2005]
  - Micro-data and links with macro-data
  - Metadata about the quality of the micro-data
*Country-level analysis, e.g. Fuchs (2009)
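Linking micro-data with macro-data usually means attaching a country-level indicator to every individual record before a multilevel or pooled analysis. A minimal sketch follows, assuming each record carries an ISO country code; the indicator name and its values are placeholders, not published estimates, and `add_macro_indicator` is a hypothetical helper.

```python
from typing import Dict, List

# Placeholder country-level inequality values (not real estimates)
MACRO_GINI: Dict[str, float] = {"DE": 0.30, "GB": 0.35, "SE": 0.25}

def add_macro_indicator(records: List[dict], macro: Dict[str, float],
                        name: str) -> List[dict]:
    """Copy each record and add the macro value for its country (None if absent)."""
    return [{**r, name: macro.get(r["country"])} for r in records]

people = [
    {"id": 1, "country": "DE", "internet_use": 1},
    {"id": 2, "country": "GB", "internet_use": 0},
    {"id": 3, "country": "SE", "internet_use": 1},
    {"id": 4, "country": "FR", "internet_use": 1},  # no macro value supplied
]
linked = add_macro_indicator(people, MACRO_GINI, "country_gini")

# Check coverage of the macro indicator before any modelling
missing = sorted({r["country"] for r in linked if r["country_gini"] is None})
print(missing)  # ['FR']
```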

19 (a) … can be helped by…
- Interest in / access to secure micro-data
  - E-Science projects building portals for secure access to data (e.g. Sinnott 2008)
- Exploitation of complex micro-data
  - Services for organising complex data (e.g. GE*DE)
  - Metadata provision on data resources (e.g. PolicyGrid)
  - Comparative standardisations (e.g. GE*DE)
  - Tools for complex analysis (e.g. e-Stat)
  - Tools for simulation (e.g. NeISS)
  - Tools for visualisation of complex data (e.g. Maptube)
  - Tools for workflow records for the research lifecycle (cf. MyExperiment)

20 (b) Harmonisation, equivalence and data quality
Variable manipulations require standardization through measurement or meaning equivalence, and adequate documentation / justification for those manipulations.
E-Science resources support:
- Documenting / replicating ex post harmonisations, e.g. syntax databases at GE*DE
- Furnishing new scaling tools (meaning equivalence), e.g. scales of educational qualifications at GE*DE
- Facilitating manipulations and standardizations, e.g. user-friendly services on variables at GE*DE to enable a plurality of alternative measures
Pluralistic / open source vs quality control?
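Ex post harmonisation can be kept documentable by holding the recode rules as data, one mapping per country, rather than as scattered recode statements. The sketch below assumes hypothetical national education codes and a shared three-level scheme; neither the codes nor the mappings come from GE*DE.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical per-country rules: national code -> common category.
HARMONISATION_RULES: Dict[str, Dict[str, str]] = {
    "DE": {"1": "low", "2": "medium", "3": "medium", "4": "high"},
    "GB": {"A": "low", "B": "medium", "C": "high"},
}

def harmonise(country: str, national_code: str) -> Optional[str]:
    """Return the common category, or None when no rule covers this code."""
    return HARMONISATION_RULES.get(country, {}).get(national_code)

def harmonise_all(records: List[dict]) -> Tuple[List[dict], List[Tuple[str, str]]]:
    """Add a harmonised variable to every record and report codes lacking a rule."""
    out: List[dict] = []
    unmapped: Set[Tuple[str, str]] = set()
    for r in records:
        common = harmonise(r["country"], r["educ_national"])
        if common is None:
            unmapped.add((r["country"], r["educ_national"]))
        out.append({**r, "educ_common": common})
    return out, sorted(unmapped)

data = [
    {"country": "DE", "educ_national": "3"},
    {"country": "GB", "educ_national": "C"},
    {"country": "GB", "educ_national": "D"},  # no rule yet for this code
]
harmonised, gaps = harmonise_all(data)
print([r["educ_common"] for r in harmonised])  # ['medium', 'high', None]
print(gaps)  # [('GB', 'D')]
```

Because the rules are plain data, they can be published alongside results, and alternative rule sets can coexist, which is one way to support a plurality of measures without losing track of which was used.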

21 More on GE*DE and issues of data quality
GE*DE covers occupations; educational qualifications; ethnicity and migration.
These are key variables in social science research:
- Regularly measured
- Link to concepts of central interest
- Multivariate context (critical relations with gender, age cohort, etc.)

22 Key variables: concepts and measures
Variable | Concept | Measure (e.g.) | Something useful
Occupation | Class; stratification; unemployment | Occupation-based social classification | www.geode.stir.ac.uk
Education | Credentials; ability; merit | Qualification-based educational level | www.equalsoc.org/8 [Schneider, 2008]
Ethnic group | Ethnicity; religion; race; national origins | Minority ethnic group indicators | [Bosveld et al. 2006]
Age | Age; life course stage; cohort | Polynomial age function | [Abbott 2006]
Gender | Gender; household / family context | | www.genet.ac.uk
Income | Income; wealth; poverty | Monthly income; income groups; … | www.data-archive.ac.uk [SN 3909]
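The "polynomial age function" measure in the table can be operationalised by adding centred age, age squared and age cubed as covariates. The following sketch is generic and is not taken from the cited source; the centring value of 40 and the helper name `age_polynomial` are hypothetical choices. Centring can reduce the correlation between the linear and higher-order terms.

```python
from typing import Dict, List

def age_polynomial(ages: List[float], centre: float = 40.0,
                   degree: int = 3) -> List[Dict[str, float]]:
    """Return, for each age, the centred terms age_c1 .. age_c<degree>."""
    rows = []
    for age in ages:
        c = age - centre
        rows.append({f"age_c{p}": c ** p for p in range(1, degree + 1)})
    return rows

terms = age_polynomial([25, 40, 60])
print(terms[0])  # {'age_c1': -15.0, 'age_c2': 225.0, 'age_c3': -3375.0}
```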

23 (c) Access to data
Need for:
- Facilities for granting access to data, including new [potentially secure] data
- Distribution of suitably detailed metadata [cf. the highly selective approach of existing projects, and the related benefits of pre-harmonisation]
E-Social Science contributions:
- Security infrastructures (e.g. portal frameworks) offer much stronger models for secure access to data
- Services for organising / distributing metadata

24 The contribution of e-Science: reflections
The contribution should concern:
- Navigating complex data
- Security
- Workflows
But, generally, it isn't taken up (cf. existing networks, e.g. LIS, IPUMS, ESS)

25 Possible explanations
- E-Science tools and services are too heavyweight compared to ad hoc sharing solutions
  - Overheads in adopting e-Science tools (cf. existing working models)
  - E-Science tools are unduly generic (cf. ongoing focussed projects and related resources)
- Working habits: experts and software
  - Major cross-national projects pre-date e-Science initiatives
  - Key role of project-specific experts
  - Many projects are small-N and don't seem to require heavyweight inputs
  - Survey researchers collaborate through proprietary software (e.g. Stata, SPSS)

26 Conclusions – will things change?
- Overheads of e-Science engagement might decline
  - GE*DE aims: user-friendly services, service delivery emphasis, training workshops, mainstream software
- Existing ad hoc practices could become insufficient
  - Data of greater scale and complexity
  - Data with security limits
  - Need for integrated access and complex analysis
  - Need for plurality in analyses of multiple measures (even in small-N comparisons)
  - Need for documentation for replication

27 References cited
Abbott, A. (2006). Mobility: What? When? How? In S. L. Morgan, D. B. Grusky & G. S. Fields (Eds.), Mobility and Inequality. Stanford: Stanford University Press.
Atkinson, A. B. (1996). Seeking to explain the distribution of income. In J. Hills (Ed.), New Inequalities: The changing distribution of income and wealth in the United Kingdom. Cambridge: Cambridge University Press.
Bosveld, K., Connolly, H., & Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics.
Foster, I., Kesselman, C., & Tuecke, S. (2001). The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications, 15(3), 200-222.
Fuchs, C. (2009). The Role of Income Inequality in a Multivariate Cross-National Analysis of the Digital Divide. Social Science Computer Review, 27(1), 41-58.
Halfpenny, P., & Procter, R. (2009). Guest editorial: Special issue on e-Social Science. Social Science Computer Review, 27(4).
Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
Mayer, K. U. (2005). Life courses and life chances in a comparative perspective. In S. Svallfors (Ed.), Analyzing Inequality: Life Chances and Social Mobility in Comparative Perspective. Stanford: Stanford University Press.
Minnesota Population Center. (2009). Integrated Public Use Microdata Series - International: Version 5.0. Minneapolis: University of Minnesota.
Schneider, S. L. (2008). The International Standard Classification of Education (ISCED-97). An Evaluation of Content and Criterion Validity for 15 European Countries. Mannheim: MZES.
Sinnott, R. O. (2008). Grid Security. In L. Wang, W. Jie & J. Chen (Eds.), Grid Computing: Technology, Service and Applications. London: CRC Press.
Stewart, K., Sefton, T., & Hills, J. (2009). Introduction. In J. Hills, T. Sefton & K. Stewart (Eds.), Towards a more equal society? Poverty, inequality and policy since 1997. Bristol: The Policy Press.
Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.