An Early Prototype of the Comprehensive Extensible Data Documentation and Access Repository (CED²AR)

An Early Prototype of the Comprehensive Extensible Data Documentation and Access Repository (CED²AR)
William C. Block and Jeremy Williams (1), John Abowd and Lars Vilhuber (2), and Carl Lagoze (3)
(1) Cornell Institute for Social and Economic Research, Cornell University
(2) Labor Dynamics Institute, Cornell University
(3) School of Information, University of Michigan
Presentation at the 4th Annual European DDI User Conference (EDDI12), Norwegian Social Science Data Services, Bergen, Norway, 3 December 2012

Outline
o The Problem: Curation of Data Locked within a Secure Environment is Difficult
o NCRN Solution: CED²AR Prototype
o CED²AR Search API
o DDI bridging the boundary between confidential and public metadata
o Questions and Discussion

Curating Data Locked within a Secure Environment is Difficult
o By definition: Access is Restricted
o Lack of Curation throws up a barrier to Future Discovery and Access
o Replication of Results becomes increasingly difficult
Important! The Scientific Method depends on the ability to replicate the results of research

Research Opportunities at the Cornell Census Research Data Center

The RDC Network

We see this problem at Cornell: Research with Restricted Data increasing at CISER

Source: Raj Chetty, Increasing Use of Restricted Data in Research

Source: Raj Chetty, Use of Public Use Data Declining

Proposed Solution: Cornell’s NCRN Node
o Improved documentation and discoverability of both public and restricted data from the federal statistical system
o CED²AR
o DDI Solution to Confidential Metadata

CED²AR Overview and Goals
o Collect and standardize disparate metadata into a single DDI repository
o Provide a web interface for researchers to access
o Build an API for developers to use
o Use open standards
o Provide thorough documentation

Acknowledging CS 5150 Contributions
o Jeremy Williams*
o Benjamin Perry
o Justin Burden
o Chantelle Farmer
o Shudan Zheng
o Jessica Kane
*CISER and NCRN staff member, EDDI co-author, and coordinator of the CS 5150 team

CED²AR Search API
The API supports all of these query functions:
Return
  o a chosen set of fields within the DDI schema
Where
  o a chosen set of supported DDI search fields
  o and, or, and not
  o contains, starts-with, ends-with
Sort
  o descending, ascending
Limit
  o give me results from each codebook
The API makes interacting with the repository easier because it abstracts away the underlying XQuery needed to perform the query.
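The four query functions listed above (Return, Where, Sort, Limit) can be illustrated with a small local model. This is only a sketch of the semantics, not the CED²AR implementation, which runs XQuery against the repository. The function names `matches` and `search`, the case-insensitive matching, the single flat list of records (the slide describes a per-codebook limit), and the sample records are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def matches(value, pattern):
    """Wildcard match where '*' stands for any run of characters.

    Case-insensitive matching is an assumption of this sketch.
    """
    return fnmatchcase(value.lower(), pattern.lower())


def search(records, where, sort_key, descending=False, limit=None, fields=None):
    """Apply Where, Sort, Limit, and Return to a list of dict records."""
    hits = [r for r in records if where(r)]                    # Where
    hits.sort(key=lambda r: r[sort_key], reverse=descending)   # Sort
    if limit is not None:
        hits = hits[:limit]                                    # Limit
    if fields is not None:
        hits = [{f: r[f] for f in fields} for r in hits]       # Return
    return hits


# Made-up sample records, for illustration only (not from any real codebook).
SAMPLE = [
    {"name": "a_rooms", "label": "Number of rooms in dwelling"},
    {"name": "b_type", "label": "Type of dwelling"},
    {"name": "c_age", "label": "Age of dwelling in years"},
    {"name": "d_inc", "label": "Household income"},
]

# "label contains 'dwelling' and does not start with 'number'",
# sorted descending by name, limited to one result, returning only the name.
result = search(
    SAMPLE,
    where=lambda r: matches(r["label"], "*dwelling*")
    and not matches(r["label"], "number*"),
    sort_key="name",
    descending=True,
    limit=1,
    fields=["name"],
)
```

In this model, contains, starts-with, and ends-with correspond to the patterns `*word*`, `word*`, and `*word`, and the and/or/not operators are ordinary Boolean combinations inside the `where` callable. Note that `fnmatchcase` also treats `?` and `[...]` as special characters, which the API may not.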

CED²AR Search API
Some example DDI things (resources):
Codebooks:
  o books
Codebook named SSB:
  o books/SSB
Variables of codebook named SSB:
  o books/SSB/variables
A particular variable in the SSB codebook named totfam_kids:
  o books/SSB/variables/totfam_kids
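The resource paths above follow a regular pattern, which a client could assemble with a small helper. The sketch below is a hypothetical illustration: the function name `resource_path` and its parameters are not part of the CED²AR API, and the base URL of the service is deliberately left out because the slide does not give it.

```python
from urllib.parse import quote


def resource_path(codebook=None, variable=None, list_variables=False):
    """Build a relative resource path such as 'books/SSB/variables/totfam_kids'.

    Hypothetical helper: only the path layout comes from the slide.
    """
    if variable is not None and codebook is None:
        raise ValueError("a variable path needs a codebook name")
    parts = ["books"]
    if codebook is not None:
        # Percent-encode names so unusual characters cannot break the path.
        parts.append(quote(codebook, safe=""))
        if variable is not None:
            parts += ["variables", quote(variable, safe="")]
        elif list_variables:
            parts.append("variables")
    return "/".join(parts)
```

For example, `resource_path("SSB", "totfam_kids")` yields the last path shown on the slide, and `resource_path()` yields the list of all codebooks.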

CED²AR Search API
Ability to create complex queries across codebooks
Give me all variables across all codebooks where the variable text contains the word 'house' and the variable label contains the word 'dwelling' but does not start with the word 'number' (and sort it in descending order by variable name):
  o =variabletext=*house*,variablelabel=*dwelling*,variablelabel!=number*&sort=-variablename
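The filter expression in this example can be assembled programmatically. The sketch below is a minimal illustration whose helper names (`contains`, `starts_with`, `ends_with`, `negate`, `build_filter`, `sort_param`) are hypothetical. It assumes, based only on the example above, that commas join conditions with "and", that `!=` negates a condition, and that a leading `-` requests descending order. How "or" is written is not shown on the slide, so it is not modeled. The name of the query parameter that carries the filter is cut off in the transcript and is therefore left out.

```python
def contains(field, word):
    """Condition: field contains word."""
    return f"{field}=*{word}*"


def starts_with(field, word):
    """Condition: field starts with word."""
    return f"{field}={word}*"


def ends_with(field, word):
    """Condition: field ends with word."""
    return f"{field}=*{word}"


def negate(condition):
    """Turn 'field=pattern' into 'field!=pattern'."""
    field, pattern = condition.split("=", 1)
    return f"{field}!={pattern}"


def build_filter(*conditions):
    """Join conditions with commas (assumed to mean 'and')."""
    return ",".join(conditions)


def sort_param(field, descending=False):
    """Build the sort parameter; a leading '-' means descending."""
    return "sort=" + ("-" if descending else "") + field


filter_expr = build_filter(
    contains("variabletext", "house"),
    contains("variablelabel", "dwelling"),
    negate(starts_with("variablelabel", "number")),
)
query_tail = filter_expr + "&" + sort_param("variablename", descending=True)
```

Here `query_tail` reproduces the portion of the example query that survives in the transcript, from `variabletext=` onward.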

NCRN DDI Solution at the Variable Level

Variable Level Solution (continued)

No DDI Solution at the level of a Value Label
A small tweak to the DDI Codebook Schema would fix this.

Thank you! Questions?
ncrn.cornell.edu