An Overview of Data-PASS Shared Catalog


1 An Overview of Data-PASS Shared Catalog
Micah Altman, Harvard University

2 Contents
- Collaboration Components
- What the Shared Catalog Does
- How It Works
- Syndicated Storage Integration

3 Collaboration Components
Partnership Agreement
- Agreement to establish good practice
- Preservation copies of data collected
- Transfer protocol: support in case of archival failure
Operations
- Central database of leads for acquisition
- Development of shared procedures
- Review of acquisitions
- Documentation of procedure
- Identification & selection
- Metadata
- Security
- Confidentiality
Catalog
- Discovery
- Layered services

4 Finding Data
- Search across the partners' entire catalogs
- Find studies collected for Data-PASS
- Simple and fielded search
- Browse by subject, date, source

5 What Does It Look Like?

6 Viewing Study Information
Study Information
- Author, title, abstract, citation, …
- Permanent citation (optional)
Provenance
- Data author, producer, distributor
- Chain of responsibility for metadata
Files
- Link to that study at partner site
- List of files (optional)
Extended cataloging information
- Full catalog record
- DDI record
- Variable-level information, descriptive stats (optional)

7 Delivering Data
Through Partners' Sites
- Shared catalog results always give a link to the data at the partner's site
- If no file information is supplied to the catalog, this is the only option
Through Shared Catalog
- Catalog server may cache a copy of data for performance
- Catalog can bundle requests for multiple files
Through Analysis Services
- If the partner site runs VDC (or a data access proxy), analysis and extraction are available
- Download data in multiple formats
- Extract subsets, in multiple formats, with citations and UNFs
- Run descriptive stats, crosstabs
- Advanced analysis: dozens of statistical models

8 Current Participation
Columns: Studies Listed | Files Listed | Files Available | Analysis Available
HMDC/Murray Archives: All | From Catalog | Yes
ICPSR: From Archive | If hosting Dataverse software locally
NARA: Planned | Planned: selected files made available through The DataWeb
ROPER
ODUM

9 Advanced Analysis
- Dataverse Network and Zelig make it easy to provide access to any statistical model available in the R statistical language
- Specification-driven architecture: describe the model to run the model
- Currently provides configuration for >25 models: regression, limited dependent variables, factor analysis, event counts, duration models
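As a rough illustration of "describe the model to run the model", the sketch below keeps a small registry of model descriptions and turns a user request into a Zelig call string. It is a minimal sketch, not the Dataverse Network implementation: `MODEL_SPECS`, its fields, and `build_zelig_call` are hypothetical names invented here, and only the general `zelig(formula, model = ..., data = ...)` call shape and the model names `ls`, `logit`, and `poisson` come from Zelig itself.

```python
from typing import Dict, List

# Hypothetical specification registry: each entry describes one model.
# The field names are assumptions made for this sketch.
MODEL_SPECS: Dict[str, dict] = {
    "ls": {"label": "Least squares regression", "min_explanatory": 1},
    "logit": {"label": "Logistic regression", "min_explanatory": 1},
    "poisson": {"label": "Poisson event-count regression", "min_explanatory": 1},
}


def build_zelig_call(model: str, outcome: str, explanatory: List[str],
                     data: str = "data") -> str:
    """Check a request against the model's specification and return R code."""
    spec = MODEL_SPECS.get(model)
    if spec is None:
        raise KeyError(f"no specification registered for model {model!r}")
    if len(explanatory) < spec["min_explanatory"]:
        raise ValueError(f"{spec['label']} needs at least "
                         f"{spec['min_explanatory']} explanatory variable(s)")
    right_hand_side = " + ".join(explanatory)
    return f'zelig({outcome} ~ {right_hand_side}, model = "{model}", data = {data})'
```

Under this design, adding a model means adding one registry entry rather than writing new interface code.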

10 Supporting Technologies
- Metadata: OAI + DDI + XSL
- Citation, validation: Handle + UNF
- Workflow, repository, analysis: Dataverse Network (VDC)

11 Metadata Harvesting
Each partner catalog is exposed via
- Dataverse Network (VDC) via OAI
- Other OAI server, running on-site
- Proxy OAI server, running at HMDC
Harvested
- Ad hoc XSL metadata crosswalk applied
- Made available through OAI
DDI-lite schema subset used for exchange
- Data Documentation Initiative (DDI): international effort to establish a specification schema for the content, presentation, transport, and preservation of documentation for datasets in the social and behavioral sciences
- Provenance and structural metadata, including: document description (meta-metadata), study description, file description, variable description
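The harvesting step on this slide can be sketched with the standard OAI-PMH `ListRecords` verb. This is a minimal sketch using only the Python standard library, not the Data-PASS harvester: the function names are hypothetical, the base URL is whatever a partner publishes, and the default `oai_dc` prefix is used only because every OAI-PMH repository must support it. The prefix a given partner uses for DDI records is repository-specific and is an assumption to check.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for one page of a harvest."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is exclusive: no other arguments
        # besides the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on this page and the next token, if any."""
    root = ET.fromstring(xml_text)
    identifiers = [
        element.text.strip()
        for element in root.findall(
            ".//oai:record/oai:header/oai:identifier", OAI_NS)
        if element.text
    ]
    token_element = root.find(".//oai:resumptionToken", OAI_NS)
    token = None
    if token_element is not None and token_element.text:
        token = token_element.text.strip()
    return identifiers, token
```

A harvester would fetch `list_records_url(...)`, parse the response, and repeat with the returned token until none is given. The XSL crosswalk would then be applied to each record's metadata payload.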

12 UNF – Universal Numeric Fingerprints
- Same UNF regardless of hardware, operating system, statistical software, database, or spreadsheet software
- UNFs combine:
  - generalized rounding (desiccation)
  - normalization (canonicalization)
  - fingerprinting (cryptographic hash, e.g. SHA-256)
- Available as: C++, R statistical language, Stata, SAS, S-Plus
- See:
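The three ingredients named on this slide (rounding, canonicalization, hashing) can be illustrated with a toy fingerprint. This is deliberately not the UNF algorithm: the canonical text form, the separator, and the names `canonical` and `toy_fingerprint` are assumptions made for illustration, and the output will not match real UNF values.

```python
import base64
import hashlib
from typing import Iterable


def canonical(value: float, digits: int = 7) -> str:
    """Round to `digits` significant digits and render in one fixed text form."""
    return format(value, f".{digits - 1}e")


def toy_fingerprint(values: Iterable[float], digits: int = 7) -> str:
    """Hash the canonical text of a numeric vector (illustrative, not a UNF)."""
    text = "\n".join(canonical(v, digits) for v in values) + "\n"
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")
```

Because the hash is taken over the rounded, canonical text rather than over the stored bytes, `toy_fingerprint([1, 2.5])` and `toy_fingerprint([1.0, 2.50000000001])` agree, while a substantive change in the data produces a different value.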

13 Technologies
Dataverse Network (<http://thedata.org>)
- Includes integrated developments in web application software, networking, data citation standards, and statistical methods designed to put some of the universe of data and data sharing practices on firmer ground.
- Facilitates the public preservation and distribution of persistent, authorized, and verifiable research data, with powerful but easy-to-use technology, whether or not the data are in the public domain.
- Increases scholarly recognition (including formal scholarly citations to articles and to data sets) and distributed control for authors, journals, and others who make data available; facilitates data access and analysis by the scholarly community; and still enables professional archives to seamlessly provide extensive preservation and other services.
Shared Catalog Support
- Provides ingest and dissemination framework and web-based GUI
- Provides OAI server
- Provides Data Services Broker, which supports identification and format conversion

15 Architectural Overview
[Architecture diagram. Components: Data Mirror, Metadata Catalog, Harvester, Online Catalog, Online Analysis, XSL Crosswalk, OAI proxy. User paths: Search Shared Catalog; View Information on Data Through Catalog; Link to Data at Partner Site; Access Data With Extraction and Analysis, Through Catalog; Direct to Partner Sites.]

16 Metadata Standards
- Study level: title, author, abstract, ID, usage info, … [Required]
- Files:
  - What is it? Description, URI [Required for backups]
  - Is it valid? UNF (universal numeric fingerprint), MD5, … [Improves backup reliability]
- Variables: description, ID, location, … [Enables online analysis]
Metadata described in detail on Data-PASS site
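The layering on this slide, where each additional level of metadata unlocks another service, can be sketched as a small check over a record. The class and field names below are hypothetical and are not the DDI element names or the Data-PASS schema. The capability rules simply restate the bracketed notes on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileEntry:
    description: str
    uri: str
    unf: Optional[str] = None   # universal numeric fingerprint, if supplied
    md5: Optional[str] = None   # MD5 checksum, if supplied


@dataclass
class StudyRecord:
    study_id: str
    title: str
    author: str
    abstract: str
    usage_info: str
    files: List[FileEntry] = field(default_factory=list)
    variables: List[dict] = field(default_factory=list)


def capabilities(record: StudyRecord) -> List[str]:
    """List the services a record's metadata is complete enough to support."""
    caps: List[str] = []
    study_fields = [record.study_id, record.title, record.author,
                    record.abstract, record.usage_info]
    if not all(value.strip() for value in study_fields):
        return caps  # study-level metadata is required for everything else
    caps.append("catalog")
    if record.files and all(f.description and f.uri for f in record.files):
        caps.append("backup")
        if all(f.unf or f.md5 for f in record.files):
            caps.append("verified backup")
    if record.variables:
        caps.append("online analysis")
    return caps
```

A record with only study-level fields yields just `"catalog"`. Adding file URIs, checksums, and variable descriptions adds the remaining capabilities in turn.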

17 Distributed Preservation - Prototype
Study Level
- Include usage metadata
- Include identifying tags
- Include scanned usage agreement as file
Files
- Provide URIs in metadata
- Allow access by catalog harvester
- For more reliability, include MD5 or UNFs
Copies
- Current version of tagged studies will be mirrored at HMDC
- Resources may also be cached for speed
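The reliability gain from including an MD5 in the metadata comes from being able to check a mirrored copy against it. A minimal sketch of that check follows. The function names are hypothetical, and the mirror at HMDC may verify copies differently.

```python
import hashlib
import io
from typing import BinaryIO


def md5_hex(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Compute the MD5 of a binary stream without loading it all into memory."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def copy_matches_catalog(stream: BinaryIO, expected_md5: str) -> bool:
    """Compare a mirrored copy against the checksum listed in the metadata."""
    return md5_hex(stream) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Stand-in for a mirrored file; a real check would open the file in "rb" mode.
    payload = b"example file contents"
    listed_checksum = hashlib.md5(payload).hexdigest()
    print(copy_matches_catalog(io.BytesIO(payload), listed_checksum))
```

MD5 detects accidental corruption of a byte stream. A UNF plays the complementary role of confirming that the data values are unchanged even after a format conversion.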

18 Distributed Backup Research – Potential Research
- Schemas to express inter-archival preservation commitments
- Asymmetric mirroring to match distribution of holdings across partners
- Preservation of versioned resources
Syndicated Storage Technology Integration
- LOCKSS
- SRB/iRODS
- Distributed Data Manager

19 More Information
- Shared Catalog:
- Dataverse Network Software:
- Data Citations, UNFs:
- Metadata and Other Partnership Documentation

