Standardisation, Harmonisation and Measurement Paul Lambert, 24-25 August 2009 Talk to the Data Management for Social Survey Research training workshop,

Presentation transcript:

Standardisation, Harmonisation and Measurement Paul Lambert, August 2009 Talk to the Data Management for Social Survey Research training workshop, part of the Data Management through e-Social Science research Node of the National Centre for e-Social Science

2 Standardisation, Harmonisation and Measurement
1) The idea of measurement
2) Data management and categorical data
3) Standardizing categorical data
4) Supporting the standardization of categorical data
Adapted from: Lambert, P. S., Gayle, V., Bowes, A. M., Blum, J. M., Jones, S. B., Sinnott, R. O., et al. (2009). Standards setting when standardizing categorical data. Cologne, June 2009: Paper presented to the Fifth International Conference on Social Science Methodology, organised by GESIS and the National Centre for e-Social Science, and

3 Ideas about measurement
- Survey analysis involves scanning across cases for relations between variables
- Identification of variable effects relies on structured empirical differences between cases
- It doesn't follow that how a measure was defined corresponds to that empirical identification
  o Example: Age and educational qualifications
- It is desirable to keep an open mind over the interpretation of an empirical pattern (explore more options and test more variations)

4

5 Example from occupational research
- Broad concordance of schemes
- Measures mostly measure the same thing
- Generalised concepts are better
- Criterion validity is asymmetric [cf. Tahlin 2007]
Lambert, P. S., & Bihagen, E. (2007). Concepts and Measures: Empirical evidence on the interpretation of ESeC and other occupation-based social classifications. Paper presented at the International Sociological Association, Research Committee 28 on Social Stratification and Mobility, Montreal (14-17 August).

6 2) Data management and categorical data
- categorical data = values in a quantitative dataset where the numeric data represents membership of groups (categories) but has no direct arithmetic meaning
- A qualitative type of quantitative data [metric = data is arithmetic]
- Ordinal/nominal forms, & statistics [Stevens, 1946; Agresti, 2002]
Interest in this talk | High | Medium | Low | N.A. | (ordinal)
Country of residence | UK | Germany | Other | N.A. | (nominal)
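The distinction between nominal and ordinal forms can be sketched in a few lines of Python. This is only an illustration: the numeric codes, the missing-value code -9 and the sample responses are all invented for the example. The point is that the codes stand for group membership, so a mode is meaningful for both variables, a median only for the ordered one, and an arithmetic mean for neither.

```python
from collections import Counter
from statistics import median_low

# Hypothetical numeric codes, as they might be stored in a survey file.
INTEREST_LABELS = {1: "Low", 2: "Medium", 3: "High", -9: "N.A."}  # ordinal
COUNTRY_LABELS = {1: "UK", 2: "Germany", 3: "Other", -9: "N.A."}  # nominal

# Invented responses for seven cases.
interest = [3, 2, 2, 1, -9, 3, 2]
country = [1, 1, 2, 3, 2, -9, 1]


def valid(values, missing=(-9,)):
    """Drop missing-value codes before summarising."""
    return [v for v in values if v not in missing]


def mode_label(values, labels):
    """Most frequent category: meaningful for nominal and ordinal data."""
    code, _ = Counter(valid(values)).most_common(1)[0]
    return labels[code]


def median_label(values, labels):
    """Middle category: meaningful only when the codes are ordered."""
    return labels[median_low(valid(values))]


print(mode_label(country, COUNTRY_LABELS))
print(median_label(interest, INTEREST_LABELS))
```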

7 Categorical data is important..
- Principal social survey datum
  o Basis of most social research reports/analyses/comparisons
- It's rich and complex
  o We're often interested in very fine levels of detail / difference
  o We usually recode categories in some way for analysis
- …how categorical data is managed is of great consequence to the results of analysis…
- Choices about recoding, boundaries, contrasts made [e.g. RAE analysis: Lambert & Gayle 2009]

8 UK EFFNATIS survey (1999) [Heckmann et al 2001]

9 EFFNATIS sample (1999): Subjective ethnic identity

10

11 Family and Working Lives Survey (54 vars per educ record)

12

13 3) Standardizing categorical data
- Standardization refers to treating variables for the purposes of analysis, in order to aid comparison between variables
  o {In the terminology of survey research analysts}
1. Arithmetic standardization to re-scale metric values [$z_i = (x_i - \bar{x}) / sd$]
2. Ex-ante harmonisation (during data production) [ensuring measures of the same concept, collected from different contexts, are recorded in coordinated taxonomies]
3. Ex-post harmonisation [adapting measures of the same concept, collected from different contexts, using a coordinated re-coding procedure]
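A minimal sketch of the first and third forms of standardization, in Python. The z-score follows the formula on the slide; the choice of the population standard deviation is an assumption, since the slide only writes "sd". The country codes and the mapping to a three-category common scheme are hypothetical and stand in for whatever coordinated re-coding procedure a real project would publish.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Arithmetic standardisation: z_i = (x_i - mean) / sd.

    Uses the population standard deviation (an assumption)."""
    m, sd = mean(values), pstdev(values)
    return [(x - m) / sd for x in values]


# Hypothetical national education codes mapped to one common scheme.
TO_COMMON = {
    "UK": {1: "low", 2: "low", 3: "medium", 4: "high"},
    "DE": {1: "low", 2: "medium", 3: "medium", 4: "high", 5: "high"},
}


def harmonise(country, code):
    """Ex-post harmonisation: recode a national code into the common scheme.

    Returns None when the code is not covered by the mapping."""
    return TO_COMMON[country].get(code)


print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
print(harmonise("UK", 3), harmonise("DE", 3))
```

Arithmetic standardisation needs metric values, so only the harmonisation step applies directly to categorical codes.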

14 The big issue: standardization for comparisons
- Comparisons are the essence [Treiman, 2009: 382]
  o to make statements about differences [in measures] over contexts
- Categorical data is highly problematic..
  o Can't immediately conduct arithmetic standardization
  o Struggle to enforce harmonised data collection ..which may not in any case be suitable..
  o Struggle to achieve ex-post harmonisation
  o Non-linear relations between categories
  o Shifting underlying distributions

15 Two conventional ways to make comparisons [e.g. van Deth 2003]
- Measurement equivalence = ex-ante harmonisation (or ex-post harmonisation)
- Meaning equivalence = Arithmetic standardisation (or ex-ante or ex-post harmonisation)
- Much comparative research flounders on an insufficient recognition of strategies for equivalence
  o (One size doesn't fit all, so we can't go on)

16 Measurement equivalence (i) Measurement equivalence by assertion

17 (ii) Measurement equivalence example: Lissification Major research programme in ex-post harmonisation of Labour Force Surveys over time and between countries

18 (iii) Measurement equivalence and social class Show tabplot here

19 Meaning equivalence
- For categorical data, equivalence for comparisons is often best approached in terms of meaning equivalence (because of non-linear relations between categories and shifting underlying distributions) (even if measurement equivalence seems possible)
- Arithmetic standardisation offers a convenient form of meaning equivalence by indicating relative position within the structure defined by the current context
- For categorical data, this can be achieved by scaling categories in one or more dimensions of difference
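One simple way to give ordered categories a score that reflects relative position within the current context is to use the midpoint of each category in the cumulative distribution (sometimes called a ridit score). This is an illustrative stand-in chosen for the sketch, not necessarily the scaling used in the talk, and the two cohorts below are invented. The same category receives a different score in each cohort because the underlying distribution has shifted.

```python
from collections import Counter


def percentile_midpoint_scores(values, order):
    """Score each ordered category by its midpoint in the cumulative
    distribution of the current context (a value between 0 and 1)."""
    counts = Counter(values)
    n = sum(counts[c] for c in order)
    scores, below = {}, 0
    for c in order:
        scores[c] = (below + counts[c] / 2) / n
        below += counts[c]
    return scores


ORDER = ["none", "school", "degree"]

# Invented distributions of qualifications in two birth cohorts.
cohort_1950 = ["none"] * 60 + ["school"] * 30 + ["degree"] * 10
cohort_1980 = ["none"] * 20 + ["school"] * 40 + ["degree"] * 40

print(percentile_midpoint_scores(cohort_1950, ORDER))
print(percentile_midpoint_scores(cohort_1980, ORDER))
```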

20

21 Effect proportional scaling using parents' occupational advantage
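Effect proportional scaling is usually described as scoring each category by the mean of a criterion variable among the cases in that category. The sketch below assumes that reading; the qualification labels and the parental occupational advantage scores are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def effect_proportional_scores(categories, criterion):
    """Score each category by the mean of a criterion variable within it."""
    grouped = defaultdict(list)
    for cat, value in zip(categories, criterion):
        grouped[cat].append(value)
    return {cat: mean(vals) for cat, vals in grouped.items()}


# Invented data: respondent's qualification and a score for parents'
# occupational advantage (for example, a CAMSIS-type scale).
qualification = ["none", "none", "school", "school", "degree", "degree"]
parent_score = [35.0, 41.0, 48.0, 52.0, 60.0, 66.0]

scores = effect_proportional_scores(qualification, parent_score)

# Replace each category label by its score to obtain a metric variable.
scaled = [scores[q] for q in qualification]
print(scores)
```

Because the scores are estimated within a given dataset, they carry the relative position of each category in that context rather than a fixed label.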

22

23 A comment on offsets and meaning equivalence
- for comparisons between regressions, it is sometimes suitable to force the coefficients of some variables (e.g. controls) to have a certain fixed value
- Below example (predicting income) using cnsreg in Stata, e.g.:
    * Baseline model; e(b) holds the coefficient vector
    regress lninc fem age femage
    matrix define mod1m=e(b)
    * First element of e(b) is the coefficient of fem
    scalar fem_coef=mod1m[1,1]
    constraint def 1 fem=fem_coef
    * Refit with mcamsis added, holding the fem coefficient fixed
    cnsreg lninc fem age femage mcamsis, constraints(1)
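The idea behind the constraint can be sketched without Stata. When one coefficient is fixed at a known value, least squares on the remaining terms is equivalent to subtracting that term from the outcome (an offset) and fitting the rest. The Python below is a simplified illustration with one fixed and one free predictor; the variable names follow the slide, but the data and the fixed value of -0.2 are invented, and the helper functions are hypothetical.

```python
from statistics import mean


def simple_ols(x, y):
    """Intercept and slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_with_fixed_coef(y, fixed_x, fixed_coef, free_x):
    """Hold the coefficient of fixed_x at fixed_coef (an offset),
    then estimate intercept and slope for free_x by least squares."""
    adjusted = [yi - fixed_coef * fi for yi, fi in zip(y, fixed_x)]
    return simple_ols(free_x, adjusted)


# Invented data generated exactly from lninc = 1.0 - 0.2*fem + 0.03*age.
fem = [0, 1, 0, 1, 0, 1]
age = [25, 30, 35, 40, 45, 50]
lninc = [1.0 - 0.2 * f + 0.03 * a for f, a in zip(fem, age)]

intercept, age_coef = fit_with_fixed_coef(lninc, fem, -0.2, age)
print(intercept, age_coef)
```

Fixing the same coefficient value across several models keeps that variable's contribution identical, so differences in the remaining coefficients can be compared on a common footing.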

24 What we do and what we ought to do (when standardizing categories)
- Research applications tend to select a favoured categorisation of a concept and stick with it
  o Due to coordinated instructions [e.g. Blossfeld et al. 2006]
  o Due to perceived lack of available alternatives
  o Due to perceived convenience
- To make statistical analyses more robust we should…
  o Operationalise and deploy various scalings and arithmetic measures
  o Try out various categorisations and explore their distributional properties
  o … and keep a replicable trail of all these activities..
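Trying out several categorisations can be as simple as holding the candidate schemes as data and tabulating each one. The sketch below is a hypothetical illustration: the detailed codes, the two candidate groupings and their names are all invented.

```python
from collections import Counter

# Hypothetical detailed codes (1-5) and two candidate groupings.
CANDIDATES = {
    "three_way": {1: "low", 2: "low", 3: "mid", 4: "mid", 5: "high"},
    "two_way": {1: "low", 2: "low", 3: "low", 4: "high", 5: "high"},
}


def compare_categorisations(values, candidates):
    """Return, per candidate scheme, the share of cases in each category."""
    report = {}
    for name, mapping in candidates.items():
        counts = Counter(mapping[v] for v in values)
        n = sum(counts.values())
        report[name] = {cat: c / n for cat, c in counts.items()}
    return report


detailed = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5]
report = compare_categorisations(detailed, CANDIDATES)
for name, shares in report.items():
    print(name, shares)
```

Keeping the schemes in one named structure also gives a record of which alternatives were examined.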

25 4) Supporting the standardization of categorical data
- GE*DE projects are concerned with allowing social science researchers to navigate, and exploit, heterogeneous information resources
  o Occupational Information Resources
  o Educational Information Resources
  o Ethnic minority/Migration Information Resources
- We are finding that one of the most useful contributions is in helping with the standardization of categorical data

26 What makes this e-Social Science?
- Standards setting
- Metadata
- Portal framework
  o Liferay portal to various DAMES resources
  o iRODS system for GE*DE specialist data
- Controlled data access under security limits
- Use of workflows

27 E.g. of GEODE v1: Organising and distributing specialist data resources (on occupations)

28 (i) Basic access to data
- Services to..
  o search for and identify suitable information resources {Liferay portal and iRODS file connection}
  o allow merging these resources with own data {Non-trivial consideration: complex micro-data subject to security constraints}
- Constructing new standardized resources for UK and major cross-national surveys
  o E.g. Effect proportional scales for ethnic groups and educational qualifications across countries and over time
  o CAMSIS scales for educational homophily (cf.
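Merging an information resource with one's own data amounts to a keyed lookup. The Python sketch below is a hypothetical illustration only: the occupation codes, the scores, the field names and the `merge_scores` helper are invented, and it says nothing about how the DAMES services perform the merge. It does show one practical point, which is that codes missing from the resource should be reported rather than silently dropped.

```python
# Hypothetical lookup resource: occupation code -> scale score.
occupation_scores = {"2211": 85.3, "4115": 52.1, "9132": 31.7}

# Hypothetical micro-data records held by the researcher.
survey = [
    {"id": 1, "occ": "2211"},
    {"id": 2, "occ": "9132"},
    {"id": 3, "occ": "0000"},  # code absent from the lookup
]


def merge_scores(records, lookup, key="occ", new_field="occ_score"):
    """Left-join lookup values onto records; unmatched codes get None.

    Also returns the sorted list of codes that found no match."""
    merged = [{**r, new_field: lookup.get(r[key])} for r in records]
    unmatched = sorted({r[key] for r in records if r[key] not in lookup})
    return merged, unmatched


merged, unmatched = merge_scores(survey, occupation_scores)
print(merged)
print(unmatched)
```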

29 (ii) Depositing data
- Services to…
  o Allow researchers to deposit specialist information resources to be immediately visible to others
  o Collect basic metadata via proforma, option of adding extended metadata (DDI structure)
  o {Motivations are altruism; citations; reduced burdens}
  o {Quality control through site rankings, expert inputs}

30 (iii) Workflows for recodes and standardisations
- Documenting and distributing recodes / variable transformations / file matching operations
- Ready access to previously used standardizations (avoid re-inventing the wheel)
- Stata and SPSS focus (principal integrated data management / data analysis software for target users) {includes files as resources; & generate syntax log file}
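One way to document a recode so that it can be redistributed is to hold the rules as data and generate the software syntax from them. The Python sketch below writes a Stata `recode` command from a rule set and applies the same rules directly, so the two routes stay in step. The variable names `educ` and `educ3`, the rule set and both helper functions are hypothetical; this is not the DAMES workflow tooling.

```python
def stata_recode(var, new_var, rules):
    """Build a Stata `recode` command from {new_value: [old_values]}."""
    parts = [
        "(" + " ".join(str(o) for o in olds) + " = " + str(new) + ")"
        for new, olds in rules.items()
    ]
    return f"recode {var} {' '.join(parts)}, generate({new_var})"


def apply_recode(values, rules):
    """Apply the same rules in Python; unlisted values are left unchanged,
    which mirrors how Stata's recode treats values with no matching rule."""
    lookup = {old: new for new, olds in rules.items() for old in olds}
    return [lookup.get(v, v) for v in values]


# Hypothetical rule set: five detailed codes collapsed to three.
RULES = {1: [1, 2], 2: [3, 4], 3: [5]}

syntax = stata_recode("educ", "educ3", RULES)
print(syntax)
print(apply_recode([1, 2, 3, 4, 5, 9], RULES))
```

Storing the rule set alongside the generated syntax gives a replicable trail of the transformation.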

31 Conclusions and considerations
- DAMES services are work in progress
  o Technical issues
  o Service delivery / Quality control
- Scientific contributions
  o Progress in standardisations and ideas of equivalence
  o Suitable use of categorical data in social science data analysis!
  o Documentation for replication
  o Meta-analysis orientation

32 Data used
Department for Education and Employment. (1997). Family and Working Lives Survey, [computer file]. Colchester, Essex: UK Data Archive [distributor], SN:
Heckmann, F., Penn, R. D., & Schnapper, D. (Eds.). (2001). Effectiveness of National Integration Strategies Towards Second Generation Migrant Youth in a Comparative Perspective - EFFNATIS. Bamberg: European Forum for Migration Studies, University of Bamberg.
Inglehart, R. (2000). World Values Surveys and European Values Surveys , , [Computer file] (Vol. 2000). Ann Arbor, MI: Institute for Social Research [Producer]; Inter-university Consortium for Political and Social Research [Distributor].
Li, Y., & Heath, A. F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], SN:
University of Essex, & Institute for Social and Economic Research. (2009). British Household Panel Survey: Waves 1-17, [computer file], 5th Edition. Colchester, Essex: UK Data Archive [distributor], March 2009, SN 5151.

33 References
Agresti, A. (2002). Categorical Data Analysis, 2nd Edition. New York: Wiley.
Blossfeld, H. P., Mills, M., & Bernardi, F. (Eds.). (2006). Globalization, Uncertainty and Men's Careers: An International Comparison. Cheltenham: Edward Elgar.
Lambert, P. S., & Gayle, V. (2009). Data management and standardisation: A methodological comment on using results from the UK Research Assessment Exercise. Stirling: University of Stirling, Technical paper of the Data Management through e-Social Science research Node.
Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
Simpson, L., & Akinwale, B. (2006). Quantifying Stability and Change in Ethnic Group. Manchester: University of Manchester, CCSR Working Paper
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103,
Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.
van Deth, J. W. (2003). Using Published Survey Data. In J. A. Harkness, F. J. R. van de Vijver & P. P. Mohler (Eds.), Cross-Cultural Survey Methods (pp ). New York: Wiley.