Improving User Access to Metadata for Public and Restricted Use US Federal Statistical Files William C. Block Jeremy Williams Lars Vilhuber Carl Lagoze.


Improving User Access to Metadata for Public and Restricted Use US Federal Statistical Files William C. Block Jeremy Williams Lars Vilhuber Carl Lagoze Warren Brown John Abowd

Outline
- Overview of Cornell NSF-Census (NCRN) Node
- Comprehensive Extensible Data Documentation and Access Repository (CED²AR)
- Extract, Transform and Load (ETL)
- Application Programming Interface (API)
- User Interface (UI)
- Coming Features of CED²AR
- The Problem of Provenance
- Future NCRN Work

Summary of NCRN Problem
- Inadequate curation of secure datasets
- Inconsistent or non-existent identification
- Need for selective hiding of data and metadata
- Scientific method demands a solution
"Those inside can't see out, and those outside can't see in!"

NCRN DDI Solution at the Variable Level:

Variable Level Solution (continued)

No DDI Solution at the Level of a Value Label
A small tweak to the DDI Codebook Schema would fix this.

Schematic of NCRN System

The challenge of ingesting disparate data
- Format disparity
- Schema disparity
- Sparseness of metadata

ETL Modular Approach – Building to Reuse

Component View of System

Application Programming Interface (API): The benefits of REST
For our purposes, REST is a set of software architecture principles that leverage the intrinsic architecture of the Web.
Five key principles:
- Give everything (every resource) an ID
- Link things together
- Use standard methods
- Represent things (resources) in multiple ways
- Communicate so that the client and server are independent of one another

Application Programming Interface (API)
Motivation – Simplicity for greater utility
From this:
{baseurl}/rest?query=let%20$ced2ar%20:=%20collection('CED2AR')%20for%20$var%20in%20$ced2ar/codeBook/dataDscr/var%20where%20%20starts-with(lower-
To this:
{baseurl}/search?return=variables&where=variablename=a*

Application Programming Interface (API)
The API makes interacting with the repository easier because it abstracts away the underlying XQuery needed to perform the query. The API currently supports the following query parts:
- Return: a chosen set of fields within the DDI schema
- Where: a chosen set of supported DDI fields to filter your query by
  - And, or, 'and not'
  - Contains, starts-with, ends-with
- Sort: descending, ascending
- Limit (i.e., give me results from each codebook)
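The query parts above can be illustrated with a minimal Python sketch that assembles a search URL. Only the `return` and `where` parameters and the `{baseurl}/search` path appear on the slides; the `sort` and `limit` parameter names, their value formats, and the helper `build_search_url` are assumptions made for illustration, not the documented API.

```python
from urllib.parse import urlencode


def build_search_url(base_url, return_fields, where=None, sort=None, limit=None):
    """Assemble a search URL from the query parts listed on the slide.

    'return' and 'where' are taken from the slide's example URL.
    'sort' and 'limit' are ASSUMED parameter names (hypothetical).
    """
    params = [("return", return_fields)]
    if where is not None:
        params.append(("where", where))
    if sort is not None:
        params.append(("sort", sort))        # assumed name
    if limit is not None:
        params.append(("limit", str(limit)))  # assumed name
    # Leave '=', '*' and ',' unescaped so the result stays as readable
    # as the example URL shown on the slide.
    return f"{base_url}/search?{urlencode(params, safe='=*,')}"


# Reproduces the slide's example: variables whose name starts with "a".
print(build_search_url("{baseurl}", "variables", where="variablename=a*"))
```

The point of the sketch is the same as the slide's: a caller composes a handful of named parts instead of hand-writing and percent-encoding an XQuery expression.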

User Interface – Simple search

User Interface – Advanced Search

User Interface – Dataset-specific view

User Interface – Search results

Future Features
- Backend
  - Additional data sources
  - Generic metadata ingest mechanism from a finite set of statistical applications
- API
  - OAI-PMH compliance
  - Support for more of the DDI schema
- Web Application
  - More robust filtering (e.g., by variable group, date, etc.)
  - Variable groups added to search results
  - Export search results
- System Wide
  - Features to support the storage, exploration and discovery of the provenance of variables, datasets, etc.

The Problem of Provenance

Controlled Provenance Vocabulary; RDF Triples
(Diagram nodes: Dataset, Variable, Coding Scheme)
A graph structure lends itself well to managing and querying provenance. One possible approach is to develop a controlled vocabulary to describe these relationships and then store the data as RDF triples.
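As a rough illustration of the triple idea, the sketch below stores provenance statements as (subject, predicate, object) tuples in plain Python and walks the graph to find what a dataset was derived from. `prov:wasDerivedFrom` is a term from the W3C PROV vocabulary; the `ex:` predicates, the dataset and variable identifiers, and the `ancestors` helper are all hypothetical, and a real deployment would more likely use an RDF store than a Python list.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Small controlled vocabulary. Only prov:wasDerivedFrom is a real PROV
# term; the ex: predicates are made up for this example.
DERIVED_FROM = "prov:wasDerivedFrom"
HAS_VARIABLE = "ex:hasVariable"
USES_CODING_SCHEME = "ex:usesCodingScheme"

# Hypothetical provenance statements.
triples = [
    Triple("dataset:B", DERIVED_FROM, "dataset:A"),
    Triple("dataset:C", DERIVED_FROM, "dataset:B"),
    Triple("dataset:A", HAS_VARIABLE, "var:age"),
    Triple("var:age", USES_CODING_SCHEME, "scheme:age-groups"),
]


def ancestors(node, triples):
    """Return every node reachable from `node` via derived-from edges."""
    found = []
    seen = {node}
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for t in triples:
            if (t.subject == current and t.predicate == DERIVED_FROM
                    and t.obj not in seen):
                seen.add(t.obj)
                found.append(t.obj)
                frontier.append(t.obj)
    return found


print(ancestors("dataset:C", triples))
```

Because every relationship is an edge with a named predicate, questions such as "which datasets does this one ultimately derive from?" become simple graph traversals.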

Future NCRN Work
- Deploy to a Census Research Data Center
- Automate where possible
- Provide process by which metadata can be enhanced
  - Existing tools
  - Home grown programs
- Provenance solution
  - Embed known relationships within metadata
  - Algorithmically determine similarity (dataset level; variable level)
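The slides do not say how similarity would be determined. Purely as an illustration of what a variable-level measure could look like, the sketch below computes Jaccard similarity over the words in two variable labels. The choice of Jaccard, the tokenization, and the example labels are assumptions, not the project's method.

```python
import re


def tokens(text):
    """Lowercase a label and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(label_a, label_b):
    """Jaccard similarity of two labels: shared words / all words."""
    ta, tb = tokens(label_a), tokens(label_b)
    if not ta and not tb:
        return 0.0  # two empty labels carry no evidence of similarity
    return len(ta & tb) / len(ta | tb)


# Hypothetical variable labels from two different codebooks.
print(jaccard("Age of respondent", "Respondent age"))
```

A score like this could flag candidate matches for human review; it would not by itself establish that two variables share provenance.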