CHARMCATS: Harmonisation demands for source metadata and output management. CESSDA Expert Seminar: Towards the CESSDA-ERIC Common Metadata Model and DDI3.

Presentation transcript:

Slide 1: CHARMCATS: Harmonisation demands for source metadata and output management. CESSDA Expert Seminar: Towards the CESSDA-ERIC Common Metadata Model and DDI3. Ljubljana, 9 November 2009. Alex Agache.

Slide 2: CHARMCATS: Cessda HARMonization of CATegories and Scales
- Team: Markus Quandt (team leader), Martin Friedrichs (R&D, programming), and the CESSDA PPP WP9 team (see the last slide)
- Current status: prototype/desktop application
- Future: online workbench

Slide 3: Aims of this presentation
1. Functional requirements: CHARMCATS
2. Demands for source metadata (Portal & QDB)
3. Scenarios for feeding back enhanced metadata on comparability

Slide 4: 1. Functional requirements: CHARMCATS. Elements of the metadata model.

Slide 5: Harmonisation: basic scenario
A researcher wants to create comparative statistics on employment across European countries for the year 2008, using a (hypothetical) classification of employment constructed ex post.
(Ex-post) harmonisation: making data from different sources comparable.

Slide 6: Targeted contributing users and research knowledge
Harmonisation: how to proceed?
- Output = conversion syntax (e.g., SPSS, SAS)
- Document which coding decisions were made, and why they were made (the 'Hydra')
Targeted experts:
- Experts in data issues
- Experts in comparative measurement (including question/questionnaire development)
- Experts in conceptual issues of measurement
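To make "output = conversion syntax" concrete, here is a minimal sketch of what a published recoding routine could look like, written in Python for illustration (a real routine would typically be SPSS or SAS syntax). The source codes and their labels are invented; only the four-category target classification is taken from slide 8.

```python
# Hypothetical source codes from a country-specific employment question,
# collapsed into the four-category target classification from slide 8.
SOURCE_TO_TARGET = {
    1: 1,   # "working full time"        -> 1 Employed full time
    2: 2,   # "working 15-29 hours"      -> 2 Employed half time
    3: 2,   # "working < 15 hours"       -> 2 Employed half time
    4: 3,   # "self-employed"            -> 3 Self employed
    5: 4,   # "unemployed, seeking work" -> 4 Unemployed
}

def harmonize_employment(codes):
    """Recode source values; unknown codes map to None (missing)."""
    return [SOURCE_TO_TARGET.get(c) for c in codes]

print(harmonize_employment([1, 5, 3, 9]))  # -> [1, 4, 2, None]
```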

Slide 7: Core of the functional requirements
Publishing (ex-post) harmonisation = metadata on three working steps => a Harmonisation Project.

Slide 8: Core elements of a Harmonisation Project
A. Conceptual step: what to measure/harmonise?
1. Concept: define employment
2. For what universes? (cross-country / cross-time universal)
3. Dimensions? Employment status, employment regulation
B. Operational step: how to measure/harmonise?
4. Define a (universal) typology of employment
5. Ideally: country-specific, functionally equivalent indicators/questions
C. Data coding step: how to find and recode data?
6. Reality: country- and dataset-specific variables/questions
7. Classification = harmonised variable: 1 Employed full time, 2 Employed half time, 3 Self employed, 4 Unemployed
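The three steps and their elements can be read as a data model. The following is a hypothetical sketch of that structure in Python; the class and field names are ours, not the actual CHARMCATS/DDI3 schema.

```python
# Hypothetical data model for a Harmonisation Project (slide 8's A/B/C steps).
from dataclasses import dataclass

@dataclass
class ConceptualStep:            # A: what to measure/harmonise
    concept: str                 # e.g. "Employment"
    dimensions: list[str]        # e.g. ["Employment status", "Employment regulation"]
    universe: str                # e.g. "Cross-country/time universal"

@dataclass
class OperationalStep:           # B: how to measure/harmonise
    typology: dict[int, str]     # target classification, code -> label
    source_questions: list[str]  # functionally equivalent country questions

@dataclass
class DataCodingStep:            # C: how to find and recode data
    source_variables: list[str]
    recode_syntax: str           # published conversion syntax (SPSS/SAS/...)

@dataclass
class HarmonisationProject:
    conceptual: ConceptualStep
    operational: OperationalStep
    coding: DataCodingStep

# Usage with the employment example from slide 8 (variable names invented):
project = HarmonisationProject(
    conceptual=ConceptualStep("Employment",
                              ["Employment status", "Employment regulation"],
                              "Cross-country/time universal"),
    operational=OperationalStep({1: "Employed full time", 2: "Employed half time",
                                 3: "Self employed", 4: "Unemployed"},
                                ["country-specific employment questions"]),
    coding=DataCodingStep(["v_empl_de", "v_empl_no"], "RECODE ... (SPSS syntax)"),
)
print(project.operational.typology[4])  # -> "Unemployed"
```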

Slide 9: CHARMCATS: (3) Data Coding Step.

Slide 10: Summary: Metadata in CHARMCATS (1)
Type of retrieved metadata:
- HP components: harmonised classifications, scales, indexes
- Study components: variables, questions, universes, etc.
- Type of HP: depending on completeness
Functionality:
- Ex-ante output and ex-post harmonisation
- Support the creation of harmonisation routines
- Support data users in understanding datasets

Slide 11: Summary: Metadata in CHARMCATS (2)
Standard format:
- Sources (expected): DDI2/3
- CHARMCATS: DDI3
Source locations:
- CESSDA Portal
- Question Database (QDB)
- Users' studies in DDI2/3 XML
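Since sources are expected in DDI2/3 XML, a harvesting step might look like the sketch below. Element names (var, labl, qstnLit) follow the DDI Codebook (DDI2) convention; the namespace URI and the file name are assumptions to adapt to the actual holdings.

```python
# Minimal sketch: pulling variable names, labels, and question texts out of a
# DDI2 (DDI Codebook) file so they can feed a harmonisation project.
# Namespace and file name are assumed, not taken from a real CESSDA holding.
import xml.etree.ElementTree as ET

NS = {"ddi": "ddi:codebook:2_5"}  # assumed DDI Codebook 2.5 namespace

def extract_variables(path):
    root = ET.parse(path).getroot()
    for var in root.iter("{ddi:codebook:2_5}var"):
        label = var.find("ddi:labl", NS)
        question = var.find(".//ddi:qstnLit", NS)
        yield (var.get("name"),
               label.text if label is not None else None,
               question.text if question is not None else None)

for name, label, question in extract_variables("study_ddi2.xml"):
    print(name, "|", label, "|", question)
```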

Slide 12: 2. Demands for source metadata
CESSDA Portal (studies incl. variables/questions):
- Studies: 3,383
- Variables: 160,644 (incl. duplicates)
- Variables with question text: ca. 85%
- Variables with labels or frequencies: ca. 95%

Slide 13: Required input elements for CHARMCATS
- Questions and variables connected to concepts
- Metadata from studies that are comparative by design
- Identification of variables and questions measured ex ante as part of harmonised measurement instruments within a study
- Context information attached to variables
- Elements linked/tagged via a thesaurus (ELSST)
- Contextual databases (aggregate level)
- Bias: conceptual, methodological (and data)
- Validity of specific source variables/questions (e.g., psychometric information; cognitive interviews)
(Some of these elements are strictly required; others are desirable but not necessary.)

Slide 14: Summary: Required QDB metadata
- Literal question text + answer categories
- English translation
- Multiple questions (question batteries)
- Position in, and link to, the original questionnaire
- Study context
Nice to have:
- Concept tagging
- Methodological information
- 'Proven standard' scales/questions (e.g., life satisfaction, post-materialism)

Slide 15: Vision: CHARMCATS and QDB services
- Search / online access
- Questions used in both applications -> question(naire) development -> ex-ante harmonisation
- CHARMCATS: users offer information on the comparability of questions
- QDB: supports comparability analysis
- QDB: similarity matching -> commonality weights (see slide 37)

Slide 16: 3. Scenarios for feeding back enhanced metadata on comparability (starting points for discussion)

Slide 17: Feeding material back into the CESSDA infrastructure
- First phase: CHARMCATS will 'read' metadata from CESSDA holdings but not write back.
- Subsequent stages: write material back to other servers, or expose it for searching through standardized interfaces.
What material?

Slide 18: Core metadata on comparability
- Groups of harmonised variables
- Harmonised variables in the form of partial datasets
- Coding routines
- Functionally equivalent questions/variables (universes / concepts / dimensions)
- International standard classifications and scales
- Degrees of comparability (CHARMCATS), plus perhaps commonality weights (QDB)

Slide 19: Thought experiments (proposals for group discussions on Tuesday)
1. Additional metadata created in CHARMCATS on the quality of harmonised measures:
- Use case ISCED-97: working steps
- Re-use/impact of information on data coding (measurement error) in data analysis
- Additional metadata as prior information in Bayesian analysis
- A DB on the quality of measurements
2. Interim solution for using DDI2/3 via a linking shell (CHARMCATS/QDB)

Slide 20: Additional information
Web: www.cessda.org (and the PPP pages)
Documents:
- Bourmpos, Michael; Linardis, Tolis (with Alexandru Agache, Martin Friedrichs, and Markus Quandt) (2009, September): D9.2 Functional and Technical Specifications of 3CDB.
- Hoogerwerf, M. (2009): Evaluation of the WP9 QDB Tender Report.
- Krejci, Jindrich; Orten, Hilde; Quandt, Markus (2008): Strategy for collecting conversion keys for the infrastructure for data harmonisation. http://www.cessda.org/ppp/wp09/wp09_T93report.pdf
- Quandt, M., Agache, A., & Friedrichs, M. (2009, June): How to make the unpublishable public: The approach of the CESSDA survey data harmonisation platform. Paper presented at the NCeSS 5th International Conference on e-Social Science, 24-26 June 2009, Cologne. http://www.ncess.ac.uk/resources/content/papers/Quandt.pdf
Forthcoming:
- Friedrichs, M., Quandt, M., Agache, A.: The case of CHARMCATS: Use of DDI3 for publishing harmonisation routines. 1st Annual European DDI Users Group Meeting: DDI - The Basis of Managing the Data Life Cycle, 4 December 2009.

Slide 21: WP9, CESSDA-PPP team
- Nanna Floor Clausen (DDA)
- Maarten Hoogerwerf (DANS)
- Annick Kieffer (Réseau Quetelet)
- Jindrich Krejci (SDA)
- Laurent Lesnard (CDSP)
- Tolis Linardis (EKKE)
- Hilde Orten (NSD)
http://www.cessda.org/project/index.html
http://www.cessda.org/project/doc/wp09_descr2.pdf

Slide 22: Thought experiments (section divider; repeats slide 19).

Slide 23: Thought experiment 1: Harmonisation Platform, (additional) metadata on the quality of harmonised measures
Group discussion: harmonisation / comparable data
Use case ISCED-97, working steps:
a. Additional metadata = measurement error; re-use/impact of information on data coding (measurement error) in data analysis
b. Additional metadata = priors in Bayesian analysis
c. A DB on the quality of measurements

Slide 24: Example of harmonisation on education: ISCED-97 with ESS Round 3 data
Scenario:
- Data: ESS Round 3 (2006), 10 European country samples
- Same source variables: country-specific education degrees
- Two variants of reclassification into ISCED-97:
  A. The ESS team's harmonised variable: EDULVL
  B. The WP9.2 harmonised variable
- Another coding of the same data into ISCED (not considered here): Schneider, 2008

Slide 25: Classification of education: ISCED-97. 1. Conceptual step
- Concept of education: broad definition
- Dimensions: level of education, orientation of the educational programme (general vs. vocational), position in the national degree structure
- Universe: initially developed for OECD countries; new variant of ISCED: 1997
- Typology resulting in 7 classes of education:
  0. Pre-primary education
  1. Primary education
  2. Lower secondary
  3. Upper secondary
  4. Intermediate level
  5. Tertiary education
  6. Advanced training
Source: OECD (2004)

Slide 26: 2. Operational step
- Guidelines on measurement in survey research? (proposals from about 2000 onwards)
- Problems in coding: codings for respondents whose educational certificates were obtained before 1997 or before data collection; little information on coding procedures
- The Hydra, not visible here: how does a specific educational certificate relate to the multiple and interrelated dimensions of ISCED-97?
ISCED-97 mapping (ISCED manual):
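As a flavour of what such a mapping records, here is a hypothetical fragment in Python: a handful of German degree labels mapped onto ISCED-97 levels. The labels and level assignments are illustrative; the authoritative mapping is the one in the ISCED manual / OECD (2004), not this sketch.

```python
# Illustrative (not authoritative) fragment of an ISCED-97 country mapping:
# country-specific degrees -> ISCED-97 level codes from slide 25's typology.
ISCED97_MAP_DE = {
    "Hauptschulabschluss": 2,      # lower secondary
    "Abitur": 3,                   # upper secondary
    "Berufsausbildung": 3,         # upper secondary (vocational)
    "Fachhochschulabschluss": 5,   # tertiary
    "Universitaetsdiplom": 5,      # tertiary
    "Promotion": 6,                # advanced level
}
print(ISCED97_MAP_DE["Abitur"])  # -> 3
```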

Slide 27: 3. Data coding: result of the mapping/coding.

Slide 28: Storing the two harmonised variables in a database
- Two different target variables, same classification
- Both target variables use the same source variables and the same conceptual and operational steps
- Calculated 'agreement' between the two coding outputs: kappa = 0.67
- Other measures for the quality of coding / reliability? (e.g., ICCs)
- Ignore or consider this when using one of the harmonised variables?
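For reference, Cohen's kappa compares observed agreement with the agreement expected by chance from the two codings' marginal distributions. A minimal pure-Python sketch; the toy codes below are invented, not the actual ESS variables:

```python
# Cohen's kappa for agreement between two codings of the same cases into the
# same classification (as with the two ISCED-97 target variables).
from collections import Counter

def cohens_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)  # chance agreement
    return (observed - expected) / (1 - expected)

coder_a = [0, 1, 2, 3, 3, 5, 5, 6, 2, 1]   # e.g. ESS EDULVL codes (toy data)
coder_b = [0, 1, 2, 3, 4, 5, 6, 6, 2, 1]   # e.g. WP9.2 recoding (toy data)
print(round(cohens_kappa(coder_a, coder_b), 2))
```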

Slide 29: Next slides: impact on data analysis: the 'quality' of coding (reliability)
- Two basic status-attainment models: one without measurement error, one with measurement error
- Quality of coding: how does it relate to validity?

Slide 30: Harmonisation metadata: active reuse in data analysis
ESS data 2006, respondents aged 25-64. Model specification: ISCED error = 0 vs. error = .33 (1 - reliability).
[Figure: two SEM path diagrams (Norway and Germany) regressing household income on R's ISCED, father's ISCED, age, and gender; covariances and residuals not shown; unstandardized estimates. The coefficients grow once error is modelled, e.g. .354 (.03) without error vs. .581 (.04) with error, .33 (.03) vs. .44 (.04), 206.304 (29.08) vs. 288.22 (39.42), and 276.091 (28.387) vs. 414.83 (41.93); the transcript does not preserve which estimate belongs to which country and path.]
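Why does fixing a nonzero error variance change the estimates? Under classical measurement error, the observed regression slope is attenuated by the reliability, so modelling the error disattenuates it. A simulation sketch with invented numbers (not the ESS estimates):

```python
# Attenuation under classical measurement error: the observed slope shrinks
# by the reliability lambda = var(true) / (var(true) + var(error)); fixing
# the error variance (here 1 - 0.67 = 0.33 of total, as on the slide)
# recovers the true slope. Simulated data only.
import random

random.seed(1)
n, beta_true, reliability = 100_000, 0.5, 0.67

x_true = [random.gauss(0, 1) for _ in range(n)]
err_sd = ((1 - reliability) / reliability) ** 0.5   # error share = 0.33
x_obs = [x + random.gauss(0, err_sd) for x in x_true]
y = [beta_true * x + random.gauss(0, 1) for x in x_true]

def slope(xs, ys):
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

b_obs = slope(x_obs, y)
print(round(b_obs, 3))                # attenuated, ~ beta_true * reliability
print(round(b_obs / reliability, 3))  # disattenuated, ~ beta_true
```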

Slide 31: Example: Bayesian SEM analysis with ISCED
ESS data 2006, Norway, respondents aged 24-65. R's education -> income: mean = 286.966; posterior p = .50; MCMC samples = 82,501.
The Bayesian approach:
- Tests of hypotheses (the probability of a hypothesis being true, given the data)
- Use of previously published / expert knowledge in the field to specify informative priors on specific parameters of a model
- Few, but rising, applications with cross-national data
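The "previous knowledge as informative prior" idea can be illustrated with the simplest case: a conjugate normal-normal update on a single coefficient. All numbers below are hypothetical (the 287 merely echoes the slide's posterior mean):

```python
# Conjugate normal-normal update: combine an informative prior on a
# regression coefficient with a normal likelihood (known variance).
def posterior_normal(prior_mean, prior_var, est, est_var):
    """Precision-weighted combination of prior and data estimate."""
    w = (1 / prior_var) / (1 / prior_var + 1 / est_var)
    mean = w * prior_mean + (1 - w) * est
    var = 1 / (1 / prior_var + 1 / est_var)
    return mean, var

# Informative prior from earlier published studies (hypothetical numbers),
# combined with the estimate from the current data set:
m, v = posterior_normal(prior_mean=250.0, prior_var=40.0**2,
                        est=287.0, est_var=42.0**2)
print(round(m, 1), round(v ** 0.5, 1))   # posterior mean and sd
```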

Slide 32: A new DB on the quality of comparative measurements
[Diagram: a 'DB: Quality of measurements' node fed by three sources:
- Harmonisation DB: reliability, aggregated across similar harmonisations / different country data sets
- Validity of harmonised / latent variables
- Expert-knowledge DB: guidelines on comparability / measurement equivalence
and feeding back into analysis: analysis results, model specification, priors on specific parameters.]

Slide 33: A DB on the quality of measurements: user acceptance
Currently:
- Low incentives for researchers to publish new findings on the validity of measurements in an 'open access' database (before or after publication in journals)
- Mostly likelihood-based statistical methods are employed
- Ioannidis (2005), 'Why most published research findings are false': "The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true."
- New regulations for registering data (e.g., the International Standard Randomised Controlled Trial Number in the UK: www.isrctn.com)
Future potential:
- Self-contributions from research groups (e.g., ESS, EVS, ISSP)
- Meta-analysts (avoiding publication bias)
- Contributions from Bayesians

Slide 34: Questions?
Thanks to: CESSDA, the WP9 team.
ISCED-97 coding: Annick Kieffer; Vanita Matta.

Slide 35: Thought experiments: proposal for group discussions (Tuesday)
Interim solution for using DDI2/3 via a linking shell (CHARMCATS/QDB)

Slide 36: Thought experiment 2: a DDI2/3 linking shell
[Diagram: repositories A, B, and C in the CESSDA holdings hold variables V1...V999, some in DDI2 only, some in DDI3. A 'T-Shell' (DDI2 & DDI3) plus a registry sits between the holdings and the applications (CHARMCATS, QDB, other apps, CESSDA Portal). Example: a request for V1 + V4 is resolved via the registry, and V1 is used for harmonisation purposes.]
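A hypothetical sketch of the linking-shell idea: a registry maps variable IDs to the repository that holds them and the DDI version it speaks, and the shell answers requests in a single normalized form. All names (REGISTRY, t_shell_request, the repository IDs) are our own, not a CESSDA API:

```python
# Sketch of the T-Shell: resolve variables via a registry, fetch from the
# holding repository, and normalize DDI2 records to DDI3 for the apps.
DDI2, DDI3 = "DDI2", "DDI3"

REGISTRY = {                      # variable ID -> (repository, DDI version)
    "V1": ("repository_A", DDI2),
    "V4": ("repository_B", DDI3),
}

def fetch(repo, var_id, version):
    # Stand-in for a real repository call returning raw DDI metadata.
    return {"id": var_id, "repo": repo, "version": version}

def t_shell_request(var_ids):
    """Resolve each variable via the registry and expose it in one format."""
    out = []
    for vid in var_ids:
        repo, version = REGISTRY[vid]
        record = fetch(repo, vid, version)
        record["normalized_as"] = DDI3   # shell converts DDI2 records to DDI3
        out.append(record)
    return out

print(t_shell_request(["V1", "V4"]))   # the slide's "request for V1 + V4"
```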

Slide 37: Commonality weights (c.w.)
Scenario:
- Example: c.w. = 0-100 (a weight for similarity, or the probability of belonging to an ad hoc comparability group x)
- Search by different criteria (XXX)
- A similarity-matching algorithm provides the c.w. (example: the Levenshtein algorithm)
- A learning algorithm!
- Bayesian prediction
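A minimal sketch of how a 0-100 commonality weight could be derived from Levenshtein edit distance between two question texts. This assumes text similarity is one of several matching criteria; the actual QDB matcher is not specified here, and the example questions are invented:

```python
# Levenshtein distance (dynamic programming) scaled to a 0-100 weight.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def commonality_weight(q1, q2):
    """Similarity scaled to 0-100 (100 = identical texts)."""
    dist = levenshtein(q1.lower(), q2.lower())
    return round(100 * (1 - dist / max(len(q1), len(q2), 1)))

print(commonality_weight("What is your current employment status?",
                         "What is your present employment situation?"))
```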

Slide 38: Conclusions
Contributors and source data are required for the initial implementation -> QDB
Any

