DDI 3 Comparison Test-Case at ICPSR


DDI 3 Comparison Test-Case at ICPSR
Sanda Ionescu, Documentation Specialist, ICPSR

DDI 3 Comparison: "Research Questions"
- How can we use DDI 3 to document comparability and support data harmonization projects?
- Explore the use of the Comparison module (information coverage, functionality).
- Compare the use of the Comparison module with the use of inheritance through grouping: are both methods equally effective in capturing the necessary information?
- Can we build a tool to assist in documenting comparability and data harmonization in DDI 3? What would such a tool look like?

DDI 3 Comparison test-case: Background
DDI 3 markup was applied to the "Adult Demographics" variables of three nationally representative surveys on mental health, integrated in the Collaborative Psychiatric Epidemiology Surveys (CPES):
- The National Comorbidity Survey Replication (NCS-R)
- The National Latino and Asian American Study (NLAAS)
- The National Survey of American Life (NSAL)
http://www.icpsr.umich.edu/CPES/

DDI 3 Comparison test-case: Background
CPES studies:
- Conducted individually, but with comparison in mind; may be analyzed independently.
- NOT a longitudinal design (all data collected 2001-2003).
- Comparison is intended across populations, or subpopulations, of the USA:
  - NCS-R: US national probability sample
  - NLAAS: target populations Latino and Asian-American
  - NSAL: target populations African-American and Afro-Caribbean (with white control groups for NSAL and NLAAS)
- Comparability could be documented using either group and inheritance, or the Comparison module.

DDI 3 Comparison test-case: Background
Choosing between Group/Inheritance and the Comparison module:
- Comparison by design vs. post-hoc comparison: the distinction is sometimes not clear-cut, suggesting that either method might be used.
- It is important to know the practical implications of using either method (advantages, disadvantages, issues related to applying markup and/or processing): test by documenting the same example in both ways.

DDI 3 Comparison test-case: Background
A typical harmonization process workflow was outlined based on an ongoing ICPSR project seeking to produce a harmonized dataset from ten U.S. family and fertility surveys, belonging to three different, but related, series of longitudinal data:
- Growth of American Families, 1955 and 1960
- National Fertility Survey, 1965 and 1970
- National Survey of Family Growth, Cycles I-VI (1973, 1976, 1982, 1988, 1995, and 2002)
(Integrated Fertility Survey Series – IFSS: http://www.icpsr.umich.edu/IFSS/)

DDI 3 Comparison test-case: Harmonization procedure
- Datasets are searched (by keyword or concept, if available).
- Potentially comparable variables are selected.
- Complete variable descriptions are extracted from existing documentation:
  - Variable name (and label)
  - Question text / textual description of the variable
  - Physical representation (values, value labels, etc.)
  - Universe
  - Question context (preceding questions)
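The extracted variable description can be held in a simple record; a minimal Python sketch (the class and field names are illustrative, not DDI element names):

```python
from dataclasses import dataclass, field

@dataclass
class VariableDescription:
    """One record per candidate variable, assembled from existing documentation."""
    name: str                     # variable name
    label: str                    # variable label
    question_text: str            # question text or textual description
    representation: dict          # physical representation: value -> value label
    universe: str
    preceding_questions: list = field(default_factory=list)  # question context

# Hypothetical example record for one candidate variable.
v = VariableDescription(
    name="V102", label="Marital status",
    question_text="Are you currently married, widowed, or divorced?",
    representation={1: "Married", 2: "Widowed", 3: "Divorced"},
    universe="Adults 18+",
)
print(v.name, len(v.representation))
```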

DDI 3 Comparison test-case: Harmonization procedure (continued)
- Similarities and differences in the listed elements are examined.
- A harmonized variable is projected based on the findings of the step above (there are no fixed rules; this is done on a case-by-case basis).
- A decision is made regarding the action on the component variables (recode, or simply add).
- Statistical software commands are generated and applied to the data to create the new harmonized dataset.
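The final step, generating statistical software commands, can be sketched as follows; this hypothetical helper emits Stata-style recode syntax from a source-to-harmonized value mapping (the variable names are invented):

```python
def recode_command(source_var, target_var, value_map):
    """Build a Stata-style recode command from a {source_value: harmonized_value}
    map. The syntax is illustrative; a real project would target its own package."""
    rules = " ".join(f"({src}={dst})" for src, dst in sorted(value_map.items()))
    return f"recode {source_var} {rules}, generate({target_var})"

# Collapse a detailed marital-status coding into a harmonized binary variable.
cmd = recode_command("V102", "H_MARRIED", {1: 1, 2: 0, 3: 0})
print(cmd)  # recode V102 (1=1) (2=0) (3=0), generate(H_MARRIED)
```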

DDI 3 Comparison test-case: Harmonization procedure (continued)
The harmonized dataset is documented. Each new variable's description includes:
- Information about the source variables.
- Information about the aggregation procedure (recodes, etc.).
- Information about similarities and differences between the source variables and the harmonized one (usually in the form of a note).

DDI 3 Comparison test-case: How does DDI 3 fit into the harmonization procedure?
When a harmonized dataset is being produced, documenting pairwise comparisons between source variables in DDI as an intermediary (pre-harmonization) step appears to be superfluous:
- It does not assist in the decision-making process, which takes a more holistic approach, assessing candidate variables as a group.
- It would involve an expense of time and effort not justified by its limited, transitory utility (the harmonized variable captures the comparability among its sources anyway).

DDI 3 Comparison test-case: How does DDI 3 fit into the harmonization procedure?
When a harmonized dataset is being produced, there is greater benefit in using the Comparison module to document similarities and differences between the harmonized variable and each of its sources (post-harmonization):
- This kind of documentation is required by harmonization best practices anyway.
- Information about the comparability among the source variables can also be recreated by parsing their pairwise comparisons with the harmonized one.

DDI 3 Comparison test-case: How does DDI 3 fit into the harmonization procedure?
Post-harmonization DDI 3 documentation workflow:
Individual studies -> Search -> Display -> Examine -> Harmonize data -> Document harmonized dataset and source comparison in DDI 3 -> Discover -> Analyze -> Display -> Disseminate

DDI 3 Comparison test-case: How does DDI 3 fit into the harmonization procedure?
If a harmonized dataset is NOT being produced, it is useful to document the comparability of the "original" variables to assist data users in analysis.
Non-harmonization DDI 3 documentation workflow:
Individual studies -> Search -> Display -> Examine -> Document comparability in DDI 3 -> Discover -> Analyze -> Display -> Disseminate

DDI 3 Comparison test-case: How can a tool assist in documenting comparability in DDI 3?
(Projected) Tool:
- Searches the DDI documentation of individual studies, with full variable descriptions.
- Allows narrowing down results to a customized selection.
- Provides a same-page display of the selected variables' descriptions (ideally complete with concept and universe statements).
- Search results are saved and may be retrieved, to facilitate evaluating variables, deciding whether to harmonize them, and ultimately developing a translation table. (Alternatively, the tool itself could enable developing a translation table.)
- The steps above are available in the ICPSR SSVD internal search.

DDI 3 Comparison test-case: (Projected) Tool
- Example of a customized selection
- Example translation tables generated for harmonization

DDI 3 Comparison test-case: Potential/Projected Tool
- On the selected search results list, allows further pairwise selection and display of variables with full descriptions.
- An interactive feature allows the user to flag elements of the variable descriptions as similar or different.
- Based on the information entered in the step above, the DDI 3 Comparison module is created: elements flagged as similar or different are listed in the <Correspondence><Commonality> or <Correspondence><Difference> fields.
- The <CommonalityTypeCoded> element may be filled in automatically based on the information entered above (all elements common = "identical"; some different = "some"; is "none" ever used?).
- (A spreadsheet could be created to mock up this tool.)
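The automated fill-in logic described above can be sketched as follows; Commonality, Difference, and CommonalityTypeCoded are the module's element names, while the `correspondence` function and the flag dictionary are hypothetical:

```python
def correspondence(flags):
    """flags: {description_element: True if similar, False if different}.
    Returns Commonality/Difference lists and a CommonalityTypeCoded value
    derived as the slide suggests: all common = 'identical',
    a mix = 'some', nothing common = 'none'."""
    commonality = [el for el, similar in flags.items() if similar]
    difference = [el for el, similar in flags.items() if not similar]
    if not difference:
        type_coded = "identical"
    elif commonality:
        type_coded = "some"
    else:
        type_coded = "none"
    return {"Commonality": commonality, "Difference": difference,
            "CommonalityTypeCoded": type_coded}

# Two elements flagged similar, one different -> "some".
result = correspondence({"QuestionText": True, "Universe": True,
                         "Representation": False})
print(result["CommonalityTypeCoded"])  # some
```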

DDI 3 Comparison test-case: Use of the Comparison Module
The Comparison Module structure. Maps exist for: Concepts, Variables, Questions, Categories, Codes, Universes.
Map:
- SourceSchemeReference (M)
- TargetSchemeReference (M)
- Correspondence (M)
ItemMap:
- SourceItem (M)
- TargetItem (M)
Correspondence:
- Commonality (M)
- Difference (M)
- CommonalityTypeCoded (O, NR)
- CommonalityWeight (O, NR)
- UserDefinedCorrespProperty (O, R)
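A rough XML sketch of a variable map assembled from the elements listed above; the identifiers and the exact nesting and attributes are illustrative, not taken from the published schema:

```xml
<VariableMap>
  <!-- References to the two variable schemes being compared (placeholders). -->
  <SourceSchemeReference>VariableScheme_NCSR</SourceSchemeReference>
  <TargetSchemeReference>VariableScheme_NLAAS</TargetSchemeReference>
  <ItemMap>
    <SourceItem>V102</SourceItem>
    <TargetItem>V2051</TargetItem>
    <Correspondence>
      <Commonality>Question text; universe</Commonality>
      <Difference>Response categories</Difference>
      <CommonalityTypeCoded>some</CommonalityTypeCoded>
    </Correspondence>
  </ItemMap>
</VariableMap>
```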

DDI 3 Comparison test-case: Use of the Comparison Module
Used by ICPSR in the CPES markup example:
- Commonality and Difference are mandatory. If the list of elements is structured and used consistently, it may become machine-actionable, eliminating the need for the User Defined Correspondence property. (Should we enable an optional CV to allow interoperability? Such a list would only apply to one type of map: variables, in our case.)
- CommonalityTypeCoded with the proposed CV: "identical", "some", "none".

DDI 3 Comparison test-case
[Slide shows an HTML view of a Variable Map in the DDI 3 Comparison Module.]

DDI 3 Comparison test-case: Using XSLT to (re)create the variables crosswalk from the pairwise comparisons
If we compare sources with a harmonized variable, the latter will always be the "target":
A -> H
B -> H
C -> H
In this case the crosswalk will be relatively easy to create.
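With the harmonized variable as the fixed target, the crosswalk reduces to grouping sources by target; a minimal sketch (variable names are invented):

```python
from collections import defaultdict

# Pairwise comparisons where the harmonized variable is always the target.
pairs = [("A", "H1"), ("B", "H1"), ("C", "H1"), ("D", "H2")]

# Group the source variables under each harmonized target.
crosswalk = defaultdict(list)
for source, target in pairs:
    crosswalk[target].append(source)

print(dict(crosswalk))  # {'H1': ['A', 'B', 'C'], 'H2': ['D']}
```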

DDI 3 Comparison test-case: Using XSLT to (re)create the variables crosswalk from the pairwise comparisons
If we compare individual variables for analysis purposes, creating a crosswalk can become very difficult and labor-intensive:
A -> B, B -> C, A -> C, A -> D, B -> D, C -> D
- There is nothing in the discrete pairs to indicate their relationship.
- Parsing by multiple iterations produces duplications that need to be cleaned up.
- The "source" and "target" denotations become irrelevant, but give the relationship a directionality that makes it more difficult to process.
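By contrast, discrete pairs with no hub variable only define an undirected relation, and the linked variables must be recovered as connected components, for example with a small union-find; a sketch (the variable names follow the slide's example):

```python
def connected_groups(pairs):
    """Group variables linked by pairwise comparisons, ignoring the
    source/target direction (which carries no meaning here)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two components

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

# The slide's pairs all describe one group containing A, B, C, and D.
groups = connected_groups([("A", "B"), ("B", "C"), ("A", "C"),
                           ("A", "D"), ("B", "D"), ("C", "D")])
print(groups)
```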

DDI 3 Comparison test-case: Recreating the variables crosswalk from the pairwise comparisons
- The same structure is used for handling two different types of comparison (pre-harmonization and post-harmonization).
- Do we need a different model/structure for comparing "original" (individual) variables? Or some additional element that would provide a key for the pairs needing to be linked? (Explore the possible use of ItemMap@alias.)
- Use a solution other than XSLT to create the crosswalk? (More sophisticated programming may be needed to capture complex relationships.)

DDI 3 Comparison test-case: Use of Comparison Module: Questions/Comments
- We normally include items (i.e., variables, in our case) that have some degree of comparability, so "none" would not be routinely used.
- Use of CommonalityWeight is optional: a scale of weights would have to be defined.
- UserDefinedCorrespondenceProperty may replace CommonalityTypeCoded in user-specific cases.
- The map structure is identical (except for codes), but the items compared are organically different: not all elements are relevant in all maps. (For variables we find it necessary to list the similar and different components of their descriptions, but for universes, questions, etc., comparison would occur at a more conceptual level.)

DDI 3 Comparison test-case: Use of Comparison Module: Questions/Comments
Comparing non-harmonized variables:
- Is there a rationale for documenting comparability between their components as well (in addition to flagging them as similar or different)?
- The Comparison module does not provide links between items included in different maps, and the same item (question, universe, code scheme) may be used by multiple variables that are part of different mappings.
- The complete variable descriptions may be pulled from the Logical Product.

DDI 3 Comparison test-case: Use of Comparison Module: Questions/Comments
Comparing harmonized variables with their sources:
- The GenerationInstruction sequence in the Code Map allows referencing the source variable(s) and may document the recodes performed to harmonize them.
- This sequence mirrors the Coding:GenerationInstruction section in the Data Collection module.
- Coding is Identifiable (it may be referenced by the resulting variable); GenerationInstruction is not Identifiable (it cannot be referenced).

DDI 3 Comparison test-case: Use of Comparison Module: Questions/Comments
- Documentation of comparability is "dissociated" from the individual variable descriptions.
- Could group+inheritance be a more effective way to capture both the variable descriptions and their comparability, while at the same time allowing a complete description of individual datasets, including variables that have no comparable counterparts?
- Test by documenting the same data in both ways, once DDI 3.1 is published, to allow identification of the variable Name (in some instances, the only element that changes).