A SCRIPT FOR ARCHIVING DIGITAL RESEARCH DATA: IMPROVING ACCURACY AND EFFICIENCY IN THE DATAVERSE NETWORK


Rachel Carriere, Thu-Mai Christian, Erin Crane, & Cheryl Thompson | School of Information & Library Science | University of North Carolina at Chapel Hill

ABSTRACT SUMMARY

The H. W. Odum Institute for Research in Social Science at the University of North Carolina at Chapel Hill collects and preserves digital social science research data and makes it publicly available online for discovery and secondary analysis via the Dataverse Network (DVN). The current data ingest workflow requires a multitude of tasks within several software programs to correct data variable label truncation, which results from the 255-character limit in statistical software packages. Because of this limitation, ingesting data into the archive is often a complex process with a single point of failure that can result in data corruption. An examination of the data ingest workflow presents an opportunity to eliminate this single point of failure by introducing a newly developed script that automates the correction of truncated data variable labels, thus preserving the complete archival record.

METHOD

This poster reports findings from an analysis of the DVN data ingest workflow and presents one solution for improving the efficiency and accuracy of data ingest. Several observations and interview sessions were conducted to study the tasks and tools involved in the current workflow. Models illustrating the workflow and tools were developed to help identify points of failure and opportunities for improvement. The model below highlights deficiencies in the current data ingest workflow.

[Figure: model of the current data ingest workflow. Labels recovered from the figure:]
Trigger: Receive data submission package
Goals
Make the archival package available
Store data submission package
Transform submission package into archival package
Edit the data file for completeness and accuracy
Publicly release archival package in DVN
Decide whether data file requires editing
Create text file for data edits
Convert submission files to archival preservation formats (e.g., .pdfa, .por)
Create catalog record and upload files into Dataverse Network (DVN)
Create SQL code for data edits
Apply edits to data file in DVN
Verify edits were performed
Annotation: Possible single point of failure if edits are applied to the wrong data in DVN
Annotation: Replaced by script

RESULTS

To avoid the risks and single point of failure in the data ingest workflow, a Python runtime script was developed to eliminate direct user interaction with the DVN database. Instead, the script performs background processes that locate the appropriate record, read the TXT file containing the complete data variable labels, and communicate with the DVN database to correct any truncated labels. The burden on the archivist is reduced, and records in the DVN are accurate and complete.

Scripting offers the power to customize archival platforms and technologies to meet the needs of today's digital archival collections, archivists, and the research professionals who depend on them.

The increasing use of and dependency on digital research data have prompted funding agencies to issue mandates requiring researchers to develop a data management plan that includes details about data access, distribution, and archiving. Like other research universities, the University of North Carolina Provost has assembled a task force to develop recommendations on the stewardship of digital research data. As a result, much interest in the digital data archive has been generated. The Dataverse Network platform offers a solution to social science data management and preservation needs; however, a script that addresses an inherent challenge confronting archivists is necessary to increase the functionality of the DVN and the usefulness of its records.

THE SCRIPT

1. The archivist uploads the data files to the Dataverse Network (DVN) and notes the automatically generated Universal Numerical Fingerprint (UNF).
2. The archivist starts the Python runtime script, which prompts the archivist to enter the UNF and the file path to the TXT data variable label file.
3. The Python script communicates with the DVN PostgreSQL database engine to identify the appropriate record and overwrite truncated data variable label strings with the correct strings.
4. The script displays the modified data variables to the archivist for quality control and documentation.
5. The data variable labels are complete in the DVN, which enables discovery and proper analysis of the data.

NEXT STEPS

Convert the Python script to a Java GUI application to improve usability
Integrate the Java (JSP) application into the DVN web interface for data submissions
Test the script with researchers and data producers to understand how the data ingest process could be integrated into the research life cycle

Acknowledgements | Thanks to Jonathan Crabtree, Assistant Director of Archive and Information Technology, Odum Institute; Dr. Stephanie Haas, Systems Analysis Professor; & Freeman Lo, Applications Analyst
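
The five steps listed under THE SCRIPT can be illustrated with a minimal sketch. This is not the authors' script, which the poster does not reproduce. It assumes a hypothetical table `datavariable(id, unf, name, label)` and a hypothetical label file format of one `name<TAB>full label` pair per line. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for the DVN PostgreSQL database. The UNF values are placeholders.

```python
import sqlite3


def read_labels(lines):
    """Parse 'name<TAB>full label' lines into a dict (hypothetical format)."""
    labels = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        name, _, label = line.partition("\t")
        labels[name.strip()] = label.strip()
    return labels


def correct_labels(conn, unf, labels):
    """Overwrite stored labels for the file identified by `unf`.

    Returns a list of (name, old_label, new_label) for every variable
    that was modified, so the archivist can review the changes.
    """
    changed = []
    with conn:  # commit on success, roll back if anything fails
        rows = conn.execute(
            "SELECT id, name, label FROM datavariable WHERE unf = ?", (unf,)
        ).fetchall()
        if not rows:
            # Refuse to continue rather than risk editing the wrong record.
            raise LookupError(f"no variables found for UNF {unf!r}")
        for var_id, name, old_label in rows:
            new_label = labels.get(name)
            if new_label is not None and new_label != old_label:
                conn.execute(
                    "UPDATE datavariable SET label = ? WHERE id = ?",
                    (new_label, var_id),
                )
                changed.append((name, old_label, new_label))
    return changed


def make_demo_db():
    """Build an in-memory database with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datavariable ("
        "id INTEGER PRIMARY KEY, unf TEXT, name TEXT, label TEXT)"
    )
    conn.executemany(
        "INSERT INTO datavariable (unf, name, label) VALUES (?, ?, ?)",
        [
            ("UNF:3:demo", "q1", "How satisfied are you"),  # truncated
            ("UNF:3:demo", "q2", "Age in years"),           # already complete
            ("UNF:3:other", "q1", "Unrelated file"),        # must stay untouched
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo_conn = make_demo_db()
    demo_labels = read_labels(
        ["q1\tHow satisfied are you with your current job overall?\n"]
    )
    for name, old, new in correct_labels(demo_conn, "UNF:3:demo", demo_labels):
        print(f"{name}: {old!r} -> {new!r}")
```

Selecting rows by UNF before updating, and raising an error when nothing matches, is one way to guard against applying edits to the wrong data, which is the failure the workflow model flags. Against a real PostgreSQL database the same logic would go through a PostgreSQL driver, with table and column names taken from the actual DVN schema.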