Presentation on theme: "DDI 3 Comparison Test-Case at ICPSR"— Presentation transcript:
1 DDI 3 Comparison Test-Case at ICPSR. Sanda Ionescu, Documentation Specialist, ICPSR
2 DDI 3 - Comparison “Research Questions”: How can we use DDI 3 to document comparability and support data harmonization projects? Explore use of the Comparative module (information coverage, functionality). Compare use of the Comparative module with use of inheritance through grouping: are both methods equally effective in capturing the necessary information? Can we build a tool to assist in documenting comparability and data harmonization in DDI 3? What would such a tool look like?
3 DDI 3 Comparison test-case. Background: DDI 3 markup was applied to the “Adult Demographics” variables of three nationally representative surveys on mental health, integrated in the Collaborative Psychiatric Epidemiology Surveys (CPES): the National Comorbidity Survey Replication (NCS-R), the National Latino and Asian American Study (NLAAS), and the National Survey of American Life (NSAL).
4 DDI 3 Comparison test-case. Background: CPES studies: Conducted individually but with comparison in mind. May be analyzed independently. NOT a longitudinal design (all collected ). Comparison intended across populations, or subpopulations, of the USA: NCS-R: US national probability sample; NLAAS: target populations Latino and Asian-American; NSAL: target populations African-American and Afro-Caribbean. White control groups for NSAL and NLAAS. Comparability could be documented using either group and inheritance, or the comparative module.
5 DDI 3 Comparison test-case. Background: Choosing between use of Group/Inheritance or the Comparison module. Comparison by design vs. post-hoc comparison: sometimes not a clear-cut distinction, suggesting the possibility of using either method (?). It is important to know the practical implications of using either method (advantages, disadvantages, issues related to applying markup and/or processing): test by documenting the same example in both ways.
6 DDI 3 Comparison test-case. Background: A typical harmonization process workflow was outlined based on an ongoing ICPSR project seeking to produce a harmonized dataset of ten U.S. family and fertility surveys, belonging to three different, but related, series of longitudinal data (Integrated Fertility Survey Series, IFSS): Growth of American Families, 1955 and 1960; National Fertility Survey, 1965 and 1970; National Survey of Family Growth, Cycles I-VI (1973, 1976, 1982, 1988, 1995, and 2002).
7 DDI 3 Comparison test-case. Harmonization procedure: Datasets are searched (by keyword or concept, if available). Potentially comparable variables are selected. Complete variable descriptions are extracted from existing documentation: variable name (and label); question text / textual description of variable; physical representation (values, value labels, etc.); universe; question context (preceding questions).
8 DDI 3 Comparison test-case. Harmonization procedure (continued): Similarities/differences in the listed elements are examined. A harmonized variable is projected based on the findings in the step above (there are no fixed rules; this is done on a case-by-case basis). A decision is made regarding the action on the component variables (recode, or simply add). Statistical software commands are generated and applied to the data to create the new harmonized dataset.
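The "recode, or simply add" decision in the procedure above can be sketched as a lookup table per source variable. This is only an illustration in Python rather than the statistical software commands the project actually generated; the variable names (`SURVEY_A.MARSTAT`, `SURVEY_B.MARITAL`), the codes, and the `harmonize` helper are all hypothetical.

```python
# Hypothetical recode tables: source code -> harmonized code.
# None of these variable names or codes come from the actual surveys.
RECODES = {
    # Two source categories (2 and 3) collapse into one harmonized category.
    "SURVEY_A.MARSTAT": {1: 1, 2: 2, 3: 2},
    # Codes already align with the harmonized variable: "simply add".
    "SURVEY_B.MARITAL": {1: 1, 2: 2},
}


def harmonize(source_variable, values, recodes=RECODES, missing=None):
    """Map each source value to its harmonized code.

    Values with no entry in the recode table become `missing`.
    """
    table = recodes[source_variable]
    return [table.get(value, missing) for value in values]


print(harmonize("SURVEY_A.MARSTAT", [1, 3, 2, 9]))  # [1, 2, 2, None]
```

Keeping the recode table as data, separate from the code that applies it, means the same table can later be written into the documentation of the harmonized variable.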
9 DDI 3 Comparison test-case. Harmonized dataset is documented. New variable descriptions include: information about source variables; information about the aggregation procedure (recodes, etc.); information about similarities and differences in source variables compared with the harmonized one (usually in the form of a note).
10 DDI 3 Comparison test-case. How does DDI 3 fit in the harmonization procedure? When a harmonized dataset is being produced, documenting pairwise comparisons between source variables in DDI as an intermediary step (pre-harmonization) appears to be superfluous: It does not assist in the decision-making process, which takes a more holistic approach, assessing candidate variables as a group. It would involve an expense of time and effort that would not be justified by its limited/transitory utility (since the harmonized variable would capture the comparability among sources anyway).
11 DDI 3 Comparison test-case. How does DDI 3 fit in the harmonization procedure? When a harmonized dataset is being produced, there is greater benefit in using the comparison module to document similarities and differences between the harmonized variable and each of its sources (post-harmonization): This kind of documentation is required by harmonization best practices anyway. Information about the comparability among source variables may also be recreated by parsing their pairwise comparisons with the harmonized one.
12 DDI 3 Comparison test-case. How does DDI 3 fit in the harmonization procedure? Post-harmonization: DDI 3 documentation of individual studies -> Search / Display / Examine -> Harmonize data -> Document harmonized dataset and source comparison in DDI 3 -> Discover / Analyze / Display / Disseminate
13 DDI 3 Comparison test-case. How does DDI 3 fit in the harmonization procedure? If a harmonized dataset is NOT being produced, then it is useful to document the comparability of “original” variables to assist data users in analysis. NO harmonization: DDI 3 documentation of individual studies -> Search / Display / Examine -> Document comparability in DDI 3 -> Discover / Analyze / Display / Disseminate
14 DDI 3 Comparison test-case. How can a tool assist in documenting comparability in DDI 3? (Projected) Tool: Searches DDI documentation of individual studies with full variable descriptions. Allows narrowing down results to a customized selection. Provides same-page display of the selected variables’ descriptions (ideally complete with concept and universe statements). Search results are saved, and may be retrieved, to facilitate evaluating variables, deciding whether to harmonize them, and ultimately developing a translation table. (The steps above are available in ICPSR SSVD, Internal Search.) Or the tool itself could enable developing a translation table.
15 DDI 3 Comparison test-case. (Projected) Tool: Example customized selection. Example translation tables generated for harmonization (saved on USB).
16 DDI 3 Comparison test-case. Potential/Projected Tool: On the selected search results list, allows further pairwise selection and display of variables with full descriptions. An interactive feature allows the user to flag the elements in the variable descriptions as similar or different. Based on the information entered in the step above, the DDI 3 Comparison module is created. Elements flagged as similar or different are listed in the <Correspondence><Commonality> or <Correspondence><Difference> fields. The <CommonalityTypeCoded> element may be filled in automatically based on the information entered above (all common = “identical”; some different = “some”; use of “none”?). Create a spreadsheet to mock up this tool.
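The automated fill of CommonalityTypeCoded described above could look roughly like the following Python sketch. The function name `summarize_flags` and the input shape (a dict from element name to "similar" or "different") are assumptions made for illustration; mapping "nothing similar" to "None" is also an assumption, since the slide leaves the use of "none" as an open question.

```python
def summarize_flags(flags):
    """Turn per-element flags into Correspondence content.

    `flags` maps an element of the variable description (e.g. "label",
    "universe") to either "similar" or "different". This input shape is
    a hypothetical stand-in for what the projected tool would collect.
    """
    if not flags:
        raise ValueError("at least one flagged element is required")
    unknown = {v for v in flags.values() if v not in ("similar", "different")}
    if unknown:
        raise ValueError(f"unexpected flag values: {sorted(unknown)}")

    similar = sorted(k for k, v in flags.items() if v == "similar")
    different = sorted(k for k, v in flags.items() if v == "different")

    if not different:
        coded = "Identical"   # all elements common
    elif similar:
        coded = "Some"        # some elements differ
    else:
        coded = "None"        # assumption: nothing flagged as similar

    return {
        "Commonality": similar,
        "Difference": different,
        "CommonalityTypeCoded": coded,
    }


result = summarize_flags({
    "question text": "similar",
    "universe": "different",
    "value labels": "similar",
})
print(result["CommonalityTypeCoded"])  # Some
```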
17 DDI 3 Comparison test-case. Use of the Comparison Module. The Comparison Module: Structure (M = mandatory, O = optional, R = repeatable, NR = non-repeatable). Maps: Concepts, Variables, Questions, Categories, Codes, Universes. Map: SourceSchemeReference (M), TargetSchemeReference (M), Correspondence (M). ItemMap: SourceItem (M), TargetItem (M). Correspondence: Commonality (M), Difference (M), CommonalityTypeCoded (O, NR), CommonalityWeight (O, NR), UserDefinedCorrespProperty (O, R).
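To make the outline above concrete, here is a rough Python sketch that assembles one map with the element names listed on the slide. It is not schema-valid DDI 3: namespaces, identifiers, and real reference structures are omitted, the container name `VariableMap` and the nesting are assumptions based only on the outline, and all scheme and item names are made up.

```python
import xml.etree.ElementTree as ET


def build_variable_map(source_scheme, target_scheme, source_item, target_item,
                       commonality, difference, commonality_type=None):
    """Build a simplified map element (illustrative, not schema-valid DDI 3)."""
    vmap = ET.Element("VariableMap")  # assumed container name
    # References are plain text here; real DDI 3 references are structured.
    ET.SubElement(vmap, "SourceSchemeReference").text = source_scheme
    ET.SubElement(vmap, "TargetSchemeReference").text = target_scheme

    corr = ET.SubElement(vmap, "Correspondence")
    ET.SubElement(corr, "Commonality").text = commonality   # mandatory
    ET.SubElement(corr, "Difference").text = difference     # mandatory
    if commonality_type is not None:                         # optional
        ET.SubElement(corr, "CommonalityTypeCoded").text = commonality_type

    item = ET.SubElement(vmap, "ItemMap")
    ET.SubElement(item, "SourceItem").text = source_item
    ET.SubElement(item, "TargetItem").text = target_item
    return vmap


# All names below are hypothetical placeholders.
vmap = build_variable_map(
    source_scheme="StudyA_Variables",
    target_scheme="Harmonized_Variables",
    source_item="A_AGE",
    target_item="H_AGE",
    commonality="question text; universe",
    difference="value labels",
    commonality_type="Some",
)
print(ET.tostring(vmap, encoding="unicode"))
```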
18 DDI 3 Comparison test-case. Used by ICPSR in the CPES markup example: Commonality and Difference, which are mandatory. If the list of elements is structured and used consistently, it may become machine-actionable, eliminating the need for the User Defined Correspondence. (Should we enable an optional CV to allow interoperability? Such a list would only apply to one type of map: variables, in our case.) CommonalityTypeCoded with the proposed CV: Identical, Some, None.
19 DDI 3 Comparison test-case HTML view of Variable Map in DDI 3 Comparison Module
20 DDI 3 Comparison test-case. Using XSLT to (re)create the variables cross-walk from the pairwise comparisons: If we compare sources with a harmonized variable, the latter will always be the “target”: A -> H, B -> H, C -> H. In this case the crosswalk will be relatively easy to create.
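The "relatively easy" case amounts to grouping the pairs by their shared target. The sketch below shows that grouping in Python rather than the XSLT mentioned on the slide; the function name `crosswalk_by_target` and the representation of each comparison as a (source, target) tuple are assumptions for illustration.

```python
from collections import defaultdict


def crosswalk_by_target(pairs):
    """Group source variables under the harmonized variable they map to.

    `pairs` is an iterable of (source, target) tuples, one per pairwise
    comparison, where the target is always the harmonized variable.
    """
    table = defaultdict(list)
    for source, target in pairs:
        table[target].append(source)
    return dict(table)


print(crosswalk_by_target([("A", "H"), ("B", "H"), ("C", "H")]))
# {'H': ['A', 'B', 'C']}
```

Because every pair points at the harmonized variable, a single pass is enough and no duplicates arise.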
21 DDI 3 Comparison test-case. Using XSLT to (re)create the variables cross-walk from the pairwise comparisons: If we compare individual variables for analysis purposes, creating a cross-walk can become very difficult/labor intensive: A->B, B->C, A->C, A->D, B->D, C->D. There is nothing in the discrete pairs to indicate their relationship; parsing done by multiple iterations results in duplications that need to be cleaned up; “source” and “target” denotations become irrelevant, but give the relationship a directionality which makes it more difficult to process.
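One possible way around the duplication and directionality problems is to ignore direction and treat the pairs as edges of a graph, then collect connected groups with a union-find structure. This Python sketch is not the XSLT approach from the slide, and it assumes comparability can be treated as transitive (if A is comparable to B and B to C, all three land in one cross-walk row), which may not hold for every set of variables. The function name `crosswalk_groups` is hypothetical.

```python
def crosswalk_groups(pairs):
    """Collect variables linked by pairwise comparisons into groups.

    Direction of each (source, target) pair is ignored. Assumes that
    comparability is transitive, which is a simplification.
    """
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up to the
        # root, shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(group) for group in groups.values())


pairs = [("A", "B"), ("B", "C"), ("A", "C"),
         ("A", "D"), ("B", "D"), ("C", "D")]
print(crosswalk_groups(pairs))  # [['A', 'B', 'C', 'D']]
```

Each variable appears exactly once in the result, so no clean-up of duplicates is needed afterwards.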
22 DDI 3 Comparison test-case. Recreating the variables cross-walk from the pairwise comparisons: The same structure is used for handling two different types of comparison (pre-harmonized and post-harmonized). Do we need a different model / structure for comparing “original” (individual) variables? Or some additional element that would provide a key for the pairs needing to be linked? Explore possibility to use of Use a different solution than XSLT to create the cross-walk? (More sophisticated programming may be needed to capture complex relationships.)
23 DDI 3 Comparison test-case. Use of Comparison Module: Questions/Comments. We normally include items (i.e., variables in our case) that have some degree of comparability; “None” would not be routinely used. Use of CommonalityWeight is optional: a scale of weights would have to be defined. UserDefinedCorrespondenceProperty may replace CommonalityTypeCoded in user-specific cases. Map structure is identical (except for codes), but the items compared are organically different: not all elements are relevant in all maps. (For variables we find it necessary to list similar and different components of their description, but for universes, or questions, etc., comparison would be at a more conceptual level.)
24 DDI 3 Comparison test-case. Use of Comparison Module: Questions/Comments. Comparing non-harmonized variables: Is there a rationale for documenting comparability between their components as well (in addition to flagging them as similar or different)? The Comparison module does not provide links between items included in different maps, and the same item (question, universe, code scheme) may be used by multiple variables that are part of different mappings. The complete variable descriptions may be pulled from the Logical Product.
25 DDI 3 Comparison test-case. Use of Comparison Module: Questions/Comments. Comparing harmonized variables with their sources: The GenerationInstruction sequence in Code Map allows referencing source variable(s) and may document the recodes performed to harmonize them. This sequence mirrors the Coding:GenerationInstruction section in the Data Collection module. Coding is Identifiable (may be referenced by the resulting variable); GenerationInstruction is not Identifiable (cannot be referenced).
26 DDI 3 Comparison test-case. Use of Comparison Module: Questions/Comments. Documentation of comparability is “dissociated” from individual variable descriptions. Could group + inheritance be a more effective way to capture both variable descriptions and their comparability, while at the same time allowing a complete description of individual datasets, including variables that have no comparable counterparts? Test by documenting the same data in both ways, once V3.1 is published, to allow identification of variable Name (in some instances, the only element that changes).