1 Putting DDI 3.0 to Work for You!
Sanda Ionescu, Documentation Specialist, ICPSR
Mary Vardigan, DDI Alliance Director
IASSIST Conference – Stanford University
May 27, 2008

2 Today’s Schedule
9:00 – 9:15 Brief DDI History and Intro
9:15 – 9:30 Life Cycle – Early Stages
9:30 – 10:45 Life Cycle Exercise
10:45 – 11:00 Break
11:00 – 11:50 Life Cycle – Archive & Beyond
11:50 – 12:00 Questions and Answers

3 First Half of Morning
We will be moving through the data life cycle of a real study and will document it as we go.
We will use a tool to produce “markup” for seven life cycle stages.
Sanda will guide us through the exercise and Mary will go step by step onscreen.
The end result is DDI documentation deposited into an archive.

4 Second Half
Once our sample data and documentation are deposited, we review the changes made by the archive.
Then we discuss DDI 3.0 in the archival context and why it makes sense to use it.
Finally, assuming we have convinced you, we discuss how to move to DDI 3.0!

5 DDI History
Effort began in 1995 when ICPSR convened a small international group at IASSIST in Quebec City.
Standard began as SGML, then converted to Web-friendly XML.
2000 – DDI Version 1.0 published as a DTD, mainly document- and codebook-centric.

6 DDI History
2003 – DDI Version 2.0 published with extended scope including aggregate data coverage and geography.
Versions 1.0 through 2.1 (latest published) are backwards compatible and based on the same structure.

7 DDI History
February 2003 – Formation of the DDI Alliance, a self-sustaining membership organization whose members have a voice in the development of the DDI specification.

8 DDI History
Version 3.0:
2004-2006: Planning and Development
November 2006: Internal Review
February 2007: Public Review
July 2007: Candidate Draft Release
April 2008: Proof of Concept and Vote
April 28, 2008: Official Publication of DDI 3.0

9 DDI 3.0 Features
Full implementation of XML Schemas
Emphasis on metadata reuse: modular structure, use of schemes

10 DDI 3.0 Features
Modular structure
Allows increased flexibility in using the specification.
Main modules: Instance, Study Unit, Resource Package, Group, Conceptual Components, Data Collection, Logical Product, Physical Data Product, Physical Instance, Archive, Comparative

11 DDI 3.0 Features
Use of Schemes
Facilitates reuse of information: Categories, Codes, NCubes, Physical Structures, Record Layouts, Organizations, Concepts, Universes, Geographic Locations, Geographic Structures, Questions, Interviewer Instructions, Variables
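
The reuse idea behind schemes can be sketched roughly as follows. This is a minimal illustration that assumes a simplified in-memory model, not the actual DDI 3.0 XML: a category scheme is defined once and variables point to it by identifier. The class names, field names, and identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CategoryScheme:
    """A reusable set of code -> label pairs, defined once (hypothetical model)."""
    scheme_id: str
    categories: dict


@dataclass
class Variable:
    """A variable that refers to a scheme by ID instead of copying its labels."""
    name: str
    scheme_ref: str


@dataclass
class Registry:
    schemes: dict = field(default_factory=dict)

    def add(self, scheme: CategoryScheme) -> None:
        self.schemes[scheme.scheme_id] = scheme

    def label(self, variable: Variable, code: int) -> str:
        # Resolve the reference at lookup time, so one edit to the scheme
        # is seen by every variable that points to it.
        return self.schemes[variable.scheme_ref].categories[code]


# Illustrative data only: two variables share one yes/no scheme.
registry = Registry()
registry.add(CategoryScheme("yes_no", {1: "Yes", 2: "No"}))
owns_home = Variable("OWNS_HOME", "yes_no")
employed = Variable("EMPLOYED", "yes_no")
```

Because both variables hold only a reference, correcting a label in the scheme updates the documentation of every variable that uses it.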

12 DDI 3.0 Features
Machine-actionable
Grouping and comparison features
Registries now possible
Versioning clarified
Multi-lingual support

13 DDI 3.0 Features
Compatibility with other metadata standards:
MARC, DC, but also…
SDMX (Statistical Data and Metadata Exchange)
ISO (Metadata Registries)
FGDC (Digital Geospatial Metadata)
ISO (Geographic Information Metadata)
PREMIS, METS – forthcoming…
Life cycle orientation

14 Life Cycle Orientation
DDI 3.0 documents all stages in the life cycle of a data collection:
pre-production
production
post-production
secondary use
new research effort

15 DDI 3.0 Use Cases
Documenting an on-going, original research project.
Documenting secondary use of data.
Creating concept/question/variable banks.
Generating multiple delivery formats for data dissemination/discovery.
Metadata mining for comparison, etc.

16 DDI 3.0 to Document an On-going Research Project
DDI 3.0 can be used to document a research project in “real time”, from its inception (study proposal, design) through data collection, processing, and initial data production.

17 [Diagram: documenting an on-going research project with DDI 3.0]
People: Principal Investigator, Collaborators, Research Staff
Milestones: Submitted Proposal, funding ($ € £), Publication, Data Archive/Repository
<DDI 3.0> content added along the way: Purpose, Concepts, Universe, Geography, People/Orgs; Funding, Revisions; Questions, Instrument; Data Collection, Data Processing; Variables, Physical Stores

18 DDI 3.0 to Document an On-going Research Project
Advantages:
Richer, contextual information made available and preserved.
Increased accuracy, as life cycle stages are documented “at the source”.
No loss of information as study progresses through its life cycle.
Changes in documentation preserved through versioning.
Ultimately gives data analysts more information to understand and assess data quality.

19 DDI 3.0 to Document an On-going Research Project
Use case exercise:
Academic environment.
Faculty member/researcher initiates an original, independent research project.
Small-scale effort.
No use of computer-assisted interviewing software.
Resulting data and documentation to be deposited to a data center/archive.
Archive provides incentives and support for documenting all activities in DDI as they happen.

20 DDI 3.0 to Document an On-going Research Project
Incentives for entering documentation “at the source”:
Information easy to enter: use of data entry tool “hides” complexities of XML code.
Underlying DDI structure provides prompts and pre-organizes information.
DDI may also serve as a management/diagnostic tool to assist in data processing and cleaning operations, or in revising the documentation.
Real-time entries and standardized content ensure high-quality documentation that facilitates primary data analysis and report preparation.

21 DDI 3.0 to Document an On-going Research Project
Use case exercise:
Based on a real study in the ICPSR archive (ICPSR study No. 9413, “Survey of Three Generations of Mexican Americans”).
Study documentation is laid out sequentially according to the life cycle.
Data entry tool provides a user-friendly interface and is projected to produce DDI 3.0 output; it follows the life cycle, but may also be used retrospectively.

22 Life Cycle Stages Study Proposal
WHO? (Principal Investigator)
WHEN? (November 1st, 1979)
WHO? (Co-authors)
Research Question(s)
Hypotheses
Population
Geographic Area
Provisional Title

23 Life Cycle Stages Study Proposal: Input

24 Life Cycle Stages Study Proposal: DDI 3.0 Output
Inputs: WHO? (Principal Investigator), WHEN?, WHO? (Co-authors), (Provisional Title), Research Question(s), Hypotheses, Population, Geographic Area
Archive: Individual
Life Cycle Event: Responsibility, Date
Study Unit: Creator(s), Title, Purpose, Universe Ref., Spatial Coverage
Conceptual Component: Universe, Geographic Structure
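
The mapping from proposal inputs to output sections might look roughly like the sketch below. It assumes a heavily simplified structure: the element names echo the slide labels and are not the element names defined by the DDI 3.0 schemas, the function is hypothetical, and every value except the proposal date is a placeholder.

```python
import xml.etree.ElementTree as ET


def build_proposal_record(pi, coauthors, title, purpose, event_date):
    """Build a simplified, DDI-like XML fragment for the proposal stage.

    Element names are illustrative only and are NOT taken from the
    DDI 3.0 schemas.
    """
    root = ET.Element("Instance")

    # Archive section: who was responsible, and when the event happened.
    archive = ET.SubElement(root, "Archive")
    ET.SubElement(archive, "Individual", id="pi").text = pi
    event = ET.SubElement(archive, "LifeCycleEvent", type="StudyProposal")
    ET.SubElement(event, "Responsibility", ref="pi")
    ET.SubElement(event, "Date").text = event_date

    # Study unit section: descriptive information about the study itself.
    study = ET.SubElement(root, "StudyUnit")
    for name in [pi, *coauthors]:
        ET.SubElement(study, "Creator").text = name
    ET.SubElement(study, "Title").text = title
    ET.SubElement(study, "Purpose").text = purpose
    return root


record = build_proposal_record(
    pi="Principal Investigator (placeholder)",
    coauthors=["Co-author A (placeholder)"],
    title="Provisional title (placeholder)",
    purpose="Research questions and hypotheses (placeholder)",
    event_date="1979-11-01",  # proposal date shown on the Study Proposal slide
)
xml_text = ET.tostring(record, encoding="unicode")
```

In the workshop a data entry tool produces the markup; the sketch only shows how one set of inputs fans out into separate sections.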

25 Life Cycle Stages Study Funding
WHO? Funding Agency
WHEN? (June 1st, 1980)
Proposal
Grant 5-R01-AG-01573

26 Life Cycle Stages Study Funding: Input

27 Life Cycle Stages Study Funding: DDI 3.0 Output
Inputs: WHO? Funding Agency, Proposal
Archive: Organization
Study Unit: Funding Agency, Grant Number
Life Cycle Event: Responsibility, Date

28 Life Cycle Stages Defining Concepts
WHO?
WHEN? (July 1st, 1980)
Research Questions (+) Question/Concept Bank = Study Concepts

29 Life Cycle Stages Defining Concepts: Input

30 Life Cycle Stages Defining Concepts: DDI 3.0 Output
Inputs: Research Questions (+) Question/Concept Bank = Study Concepts
Life Cycle Event: Responsibility, Date…
DDI Concept Scheme (Ref.)

31 Life Cycle Stages Questionnaire Design
WHO?
WHEN? (July 25, 1980)
Study Concepts (+) Question/Concept Bank = Questions, Responses

32 Life Cycle Stages Questionnaire Design: Input

33 Life Cycle Stages Questionnaire Design: DDI 3.0 Output
Inputs: Study Concepts (+) Question/Concept Bank = Questions, Responses
Life Cycle Event: Responsibility, Date…
DDI Question Scheme (Ref.)
Logical Product: Category Scheme(s), Code Schemes

34 Life Cycle Stages Questionnaire Translation
WHO?
WHEN? (September 1st, 1980)
Original Language Questions, Responses
Translated Questions, Responses

35 Life Cycle Stages Questionnaire Translation: Input

36 Life Cycle Stages Questionnaire Translation: DDI 3.0 Output
Inputs: Original Language Questions, Responses; Translated Questions, Responses
Life Cycle Event: Responsibility, Date…
DDI Question Scheme (bilingual version)
Logical Product: Category Scheme(s) (bilingual version)
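
A bilingual scheme can be pictured as one item carrying several language versions of its text. The sketch below assumes a simplified model with hypothetical class and function names, and the question wordings are invented placeholders, not text from the study.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """One question carrying its text in several languages (hypothetical model)."""
    question_id: str
    text: dict  # language code -> wording

    def in_language(self, lang: str, fallback: str = "en") -> str:
        # Prefer the requested language; fall back to the original wording.
        return self.text.get(lang, self.text[fallback])


def side_by_side(questions, langs):
    """Return rows of (id, wording per language) for comparison."""
    return [
        (q.question_id, *(q.text.get(lang, "") for lang in langs))
        for q in questions
    ]


# Placeholder wordings for illustration only.
scheme = [
    Question("Q1", {"en": "In what year were you born?",
                    "es": "¿En qué año nació usted?"}),
    Question("Q2", {"en": "How many people live in this household?"}),
]
```

Keeping the versions on one item is what makes both side-by-side comparison and retrieval of a single language straightforward.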

37 Life Cycle Stages Data Collection
WHO?
WHO?
SAMPLE
REPORT
(October 15, 1980 – April 1st, 1981)

38 Life Cycle Stages Data Collection: Input

39 Life Cycle Stages Data Collection: DDI 3.0 Output
Life Cycle Events: Responsibility, Dates…
Data Collection: Responsibility, Date, Sampling, Mode of Collection, Note

40 Life Cycle Stages Data Production
WHO?
WHEN? (1983)
Q&A
DATA

41 Life Cycle Stages Data Production: Input

42 Life Cycle Stages Data Production: DDI 3.0 Output
Inputs: Q&A, DATA
Life Cycle Event: Responsibility, Date…
Data Collection: (Processing Operations)
Logical Product: Variable Scheme, Additional Code/Category Schemes [Missing Data]
Physical Data Product: Record Structure*, Variables’ Locations
Physical Instance: (Processing Checks), Number of Cases, Number of Records

43 BREAK …

44 Life Cycle Stages Data Cleaning and Processing: DDI as diagnostic/management tool
The presence of standardized documentation facilitates data processing.
DDI documentation can be used as a project “dashboard” to identify problems and keep track of operations.
Queries can address:
Data errors: missing values, out-of-range values (incorrect computation or recode logic), inconsistent or undocumented codes
Missing documentation: question text, description
Editing errors: missing labels, misspelled variable names
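
The kinds of queries listed above can be sketched as a small audit that compares data against its documentation. The metadata layout, the function name, and the sample values are assumptions made for illustration; they are not part of DDI or of any particular tool.

```python
def audit(variables, rows):
    """Compare data rows against variable metadata and list the problems found.

    `variables` maps a variable name to {"label": str, "codes": {code: label}};
    `rows` is a list of dicts, one per case. Both layouts are assumptions.
    """
    problems = []
    for name, meta in variables.items():
        if not meta.get("label"):
            problems.append((name, "missing variable label"))
        documented = set(meta.get("codes", {}))
        for i, row in enumerate(rows):
            value = row.get(name)
            if value is None:
                problems.append((name, f"missing value in case {i}"))
            elif documented and value not in documented:
                problems.append((name, f"undocumented code {value!r} in case {i}"))
    return problems


# Illustrative metadata and data, not taken from the study.
variables = {
    "SEX": {"label": "Sex of respondent", "codes": {1: "Male", 2: "Female"}},
    "AGEGRP": {"label": "", "codes": {1: "18-34", 2: "35+"}},
}
rows = [{"SEX": 1, "AGEGRP": 2}, {"SEX": 9, "AGEGRP": None}]
report = audit(variables, rows)
```

Here `report` would flag the undocumented code in `SEX`, the empty label on `AGEGRP`, and the missing `AGEGRP` value in the second case.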

45 Life Cycle Stages Deposit to Archive
At the time of deposit, both the research process and the data are already documented in DDI…
Advantages:
The presence of standardized information facilitates archival processing, enabling procedure streamlining and automation.
Richer, more accurate information made available for preservation, archival processing and dissemination: enhances data discovery and secondary analysis.

46 Life Cycle Stages Deposit to Archive
Richer, more accurate information. Examples:
Original / working title preserved (may be found in early reports, published prior to any title changes).
Author’s affiliation and position at the time of research.
Responsible agencies and dates made available for all life cycle events.
Parallel / associated research efforts and publications accurately documented.

47 Life Cycle Stages Deposit to Archive
Richer, more accurate information. Examples:
Presence of concepts represents an important added value for data discovery, appraisal, and further analysis.
Documented source of concepts and questions (original or re-used) is relevant for secondary, and particularly comparative, analysis efforts.
For bi- or multilingual studies, multiple language versions of descriptive elements are made available side-by-side, facilitating comparison, analysis, and/or retrieval filtered by language.

48 Life Cycle Stages Deposit to Archive
Use of DDI throughout the study life cycle prevents loss of information.
Preservation of successive versions allows early-bound information retrieval.
To meet specific goals and needs, the archive may create its own version(s) of the documentation, but will also preserve the originally deposited version.
The DDI format enables easy, automated navigation among all existing versions.
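
Preserving successive versions can be pictured as an append-only history. The sketch below is a hypothetical in-memory model, not the DDI 3.0 versioning mechanism; the class, its methods, and the sample titles are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedRecord:
    """Keep every version of a metadata record (hypothetical model)."""
    versions: list = field(default_factory=list)

    def commit(self, content: dict, agency: str, note: str) -> int:
        # Store a copy so later edits to `content` cannot alter history.
        self.versions.append(
            {"content": dict(content), "agency": agency, "note": note}
        )
        return len(self.versions)  # 1-based version number

    def get(self, number: int) -> dict:
        return self.versions[number - 1]["content"]

    def latest(self) -> dict:
        return self.versions[-1]["content"]


# Illustrative history: the depositor's version, then the archive's version.
history = VersionedRecord()
deposited = {"title": "Working title (placeholder)"}
history.commit(deposited, agency="depositor", note="as deposited")
deposited["title"] = "Archive title (placeholder)"
history.commit(deposited, agency="archive", note="title edited to archive standard")
```

The originally deposited version stays retrievable by number even after the archive commits its own edits.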

49 Life Cycle Stages Archival Processing: Data and Documentation
The archive becomes the maintaining agency and creates its own instance:
The archive is described as organization, as owner/maintainer of collection, and specified as (new) publisher and/or distributor, with appropriate date(s).
Original archive (depositor to present archive) referenced in the archive module.
Reference may also be included to originally deposited DDI that is preserved and also made accessible.

50 Life Cycle Stages Archival Processing: Data and Documentation
The archive edits or adds information and populates new DDI fields to support archival operations:
Edits title to conform to archive’s standards (ICPSR adds study date).
Updates author’s affiliation according to current position, and adds/updates contact information (telephone, current address, etc.).
Adds subject headings and keywords to assist data discovery (searches at study level).

51 Life Cycle Stages Archival Processing: Data and Documentation
The archive edits or adds information:
Adds study abstract, integrating purpose with description of data collection and the final data product.
Adds structured methodological information, enabling more granular, targeted searches (e.g., temporal coverage, analysis unit(s) covered, kind of data, data source).

52 Life Cycle Stages Archival Processing: Data and Documentation
The archive documents any in-house, “post-production” processing as well as resulting changes in the data:
New data file identification, to reflect archive location.
Description of processing checks performed by archive.
Description of added variables (archive-specific, indexes, recodes, etc.) if appropriate.
Variable- and category-level statistics may be calculated and added to the DDI documentation to enhance variables description.
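
Category-level statistics of the kind mentioned above could be computed along these lines. This is a sketch under assumed inputs: the function name, the output layout, and the sample values are illustrative and not part of any DDI tool.

```python
from collections import Counter


def category_statistics(values, categories):
    """Return per-category frequency and percent for one variable.

    `categories` maps code -> label; codes absent from the data get a count
    of zero so the output lines up with the documented category list.
    """
    counts = Counter(values)
    total = len(values)
    stats = []
    for code, label in categories.items():
        n = counts.get(code, 0)
        percent = round(100 * n / total, 1) if total else 0.0
        stats.append({"code": code, "label": label, "n": n, "percent": percent})
    return stats


# Illustrative values for a single variable.
stats = category_statistics(
    values=[1, 1, 2, 1],
    categories={1: "Yes", 2: "No", 8: "Don't know"},
)
```

Driving the loop from the documented categories rather than from the data keeps unused codes visible in the resulting description.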

53 Life Cycle Stages Archival Processing: Data and Documentation
The archive adds an itemized description of the entire distribution package associated with a study, including archival-specific information like availability, access conditions/restrictions, and collection completeness, as well as item-level identification, URI, format, medium, etc.

54 Integrating DDI 3 into Archives
What is in it for us?
Standardized study descriptions provide for integration and consistency between collection catalog and documentation products.
Standardized documentation supports automated generation of multiple delivery formats, including PDF and HTML.
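
Generating several delivery formats from one standardized description can be sketched as below. The flat dictionary stands in for a study description and is an assumption, as are the function names and sample values; real DDI documentation is far richer, and PDF output is left out of this sketch.

```python
from html import escape


def to_text(study):
    """Render a study description as plain 'field: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in study.items())


def to_html(study):
    """Render the same description as an HTML table, escaping markup."""
    rows = "".join(
        f"<tr><th>{escape(key)}</th><td>{escape(value)}</td></tr>"
        for key, value in study.items()
    )
    return f"<table>{rows}</table>"


# Illustrative description; both outputs come from this single source.
study = {"Title": "Example <draft>", "Universe": "Adults"}
text_version = to_text(study)
html_version = to_html(study)
```

Since both renderers read the same source, the catalog entry and the documentation products cannot drift apart.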

55 Integrating DDI 3 into Archives
What is in it for us?
DDI 3 enables the creation of an expanded scientific record covering the full life cycle, including instrument documentation.
DDI 3 supports streamlining and increased automation of archival operations.
DDI 3 instances can carry data inline.
DDI 3 has improved functionality for complex/hierarchical files.

56 Integrating DDI 3 into Archives
Improved functionality for complex/hierarchical files. Example:

57 Integrating DDI 3 into Archives
What is in it for us?
DDI 3 facilitates grouping and comparison from the highest level to the lowest:
Mechanism to organize series information, showing only what changes over time.
Variable harmonization and comparison.
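
"Showing only what changes over time" can be illustrated with series-level metadata plus per-study overrides. This is a minimal sketch with hypothetical function names and invented sample values; it does not reproduce the DDI 3.0 group module.

```python
def resolve(group_metadata, study_overrides):
    """Merge series-level metadata with the fields one study changes."""
    merged = dict(group_metadata)
    merged.update(study_overrides)
    return merged


def changed_fields(group_metadata, full_study_metadata):
    """Return only the fields in which a study differs from its series."""
    return {
        key: value
        for key, value in full_study_metadata.items()
        if group_metadata.get(key) != value
    }


# Illustrative series: shared fields are stated once at group level.
series = {"universe": "Adults 18+", "mode": "Face-to-face", "wave": None}
wave_2 = resolve(series, {"wave": 2, "mode": "Telephone"})
delta = changed_fields(series, wave_2)
```

Only the delta needs to be stored for each wave; the full description is recovered by merging it with the series-level record.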

58 Integrating DDI 3 into Archives
What is in it for us?
Modular structure and use of schemes allow creation of meta-resources, offering additional functionality:
Question/concept/variable banks
Geography databases
Organizations/Individuals registries

59 Integrating DDI 3 into Archives
What is in it for us?
Concept/question/variable banks:
Metadata reuse
Cross-study variable/question/concept searches and analyses
Cross-study comparisons
Track questions/variables over time
Register an organization’s official measures
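
A cross-study search over a variable bank can be sketched as an index keyed by concept. The input layout, function names, study identifiers, and variable names below are all hypothetical.

```python
from collections import defaultdict


def build_concept_index(studies):
    """Index variables by concept across studies.

    `studies` maps a study ID to a list of (variable name, concept) pairs.
    """
    index = defaultdict(list)
    for study_id, variables in studies.items():
        for var_name, concept in variables:
            index[concept.lower()].append((study_id, var_name))
    return index


def find_comparable(index, concept):
    """Return every (study, variable) documented under the given concept."""
    return sorted(index.get(concept.lower(), []))


# Illustrative bank contents.
bank = build_concept_index({
    "study_A": [("EDUC", "Educational attainment"), ("INC", "Household income")],
    "study_B": [("SCHOOL", "educational attainment")],
})
matches = find_comparable(bank, "Educational Attainment")
```

Variables with different names in different studies are found together because they are documented under the same concept.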

60 Integrating DDI 3 into Archives Concept/question/variable banks
….

61 Integrating DDI 3 into Archives Concept/question/variable banks
….

62 Integrating DDI 3 into Archives Concept/question/variable banks
….

63 Integrating DDI 3 into Archives
Geography databases/registries:
Automatically match locations with appropriate geographic level
Keep track of historical changes
Information always accurate and up-to-date
Facilitate data entry

64 Integrating DDI 3 into Archives
Organizations/Individuals registries:
Keep track of historical changes (names, affiliations, contact information, etc.)
Information always accurate and up-to-date
Facilitate data entry

65 Integrating DDI 3 into Archives
What is in it for us?
Preservation:
Life cycle orientation of documentation means that a “chain of custody” is provided to meet preservation requirements.
Archives can use the life cycle events to track data processing activities (data transformation).
The structure of DDI 3.0 integrates well with FEDORA (Flexible Extensible Digital Object Repository Architecture) – a digital repository management system used by many archives.
Separate instances can be created to follow the OAIS model: SIP, AIP, DIP.

66 Integrating DDI 3 into Archives
Information sharing:
Use of DDI 3 facilitates information sharing and collaborative projects among archives.
Example: SRO-ICPSR “Data Documentation and Dissemination” project implements a common, DDI 3.0 compliant, database model to allow a smooth data transfer between the two organizations.

67 Integrating DDI 3 into Archives SRO-ICPSR collaboration project
[Diagram labels] SAS/SPSS/Stata files; DDI 3.0; Blaise output; DDI 2.x; Other…
Common relational database model for data documentation (compliant with DDI 3.0)
Client Applications…; Web Applications…; ICPSR: Variable-level Search
ICPSR projects will be able to use documentation generated by SRO projects…

68 Archives: Moving the collection to DDI 3.0
Catalog records:
Archive standard -> map to DDI 3.0
Dublin Core -> map to DDI 3.0
DDI 2.x -> map to DDI 3.0
Conversion by simple programming script or XSLT.
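
A "simple programming script" for one of these mappings might look like the sketch below, which converts a flat Dublin Core record. The Dublin Core namespace and element names are real; the target field names and the crosswalk itself are placeholders chosen for illustration and are not element names from the DDI 3.0 schemas.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative crosswalk: Dublin Core element -> simplified DDI-like field.
# The target names are placeholders, not DDI 3.0 schema element names.
CROSSWALK = {
    "title": "Title",
    "creator": "Creator",
    "description": "Abstract",
    "subject": "Keyword",
    "coverage": "SpatialCoverage",
}


def convert(dc_xml):
    """Convert a flat Dublin Core record into a simplified StudyUnit element."""
    source = ET.fromstring(dc_xml)
    study = ET.Element("StudyUnit")
    unmapped = []
    for child in source:
        # ElementTree tags look like "{namespace}localname".
        namespace, _, local = child.tag.rpartition("}")
        target = CROSSWALK.get(local) if namespace == "{" + DC_NS else None
        if target is None:
            unmapped.append(child.tag)  # report rather than silently drop
            continue
        ET.SubElement(study, target).text = (child.text or "").strip()
    return study, unmapped


# Illustrative input record.
dc_record = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Example study</dc:title>"
    "<dc:creator>A. Author</dc:creator>"
    "<dc:rights>Open</dc:rights>"
    "</record>"
)
study_unit, unmapped = convert(dc_record)
```

Returning the unmapped elements makes gaps in the crosswalk visible, which matters when converting a whole catalog in batch.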

69 Archives: Moving the collection to DDI 3.0
Catalog record conversions. Examples:
ICPSR -> DDI 2.1 -> DDI 3.0
Dublin Core -> DDI 2.1 -> DDI 3.0
ICPSR Stylesheet: DDI 2.1 -> DDI 3.0

70 Archives: Moving the collection to DDI 3.0
Legacy studies:
Tools: “Stats” to DDI 3.0, DDI 3.0 editor, XML editor
DDI 2.x “codebooks”: DDI 2.x to DDI 3.0 converter (may be a stylesheet or simple script, based on DDI 2.x to 3.0 mapping)

71 Resources
DDI 3.0 Proof of Concept - Use Cases and Implementations:
DDI Tools:
Workshop materials:

72 Contact Information
Sanda Ionescu:
Mary Vardigan:
Matthew Richardson:
DDI users’ list:

73 Questions?

74 The End.

