Open Dialogue on Digital Data Management

Similar presentations
Moving Forward With Digital Preservation at the Library of Congress Laura Campbell Associate Librarian for Strategic Initiatives Library of Congress.

Pulling it all together… with thanks to Sheila Anderson.
Big Data Forum April 18, 2013 Beth Oehlerts Digital Management Librarian Nancy Hunter Coordinator of Acquisitions and Metadata Services.
Joint CASC/CCI Workshop Report Strategic and Tactical Recommendations EDUCAUSE Campus Cyberinfrastructure Working Group Coalition for Academic Scientific.
The RUresearch Data Portal: Providing Customized Access for Specific Types of Data and Primary Users Mary Beth Weber Head, Central Technical Services Rutgers.
Background Chronopolis Goals Data Grid supporting a Long-term Preservation Service Data Migration Data Migration to next generation technologies Trust.
Research CU Boulder Cyberinfrastructure & Data management Thomas Hauser Director Research Computing CU-Boulder
IT Task Force Report Item 2.a Significantly expand the Libraries’ emphasis on digital collections, including realigning staffing to emphasize digital areas,
PubMed Central ANCHASL Spring Meeting April 1, 2005 Robert James Associate Director of Public Services Duke University.
When can I begin submitting works to the Digital Repository ?
SUNYConnect Strategic Directions The next five years Carey Hatch SUNY Office of Library and Information Services.
1 Focus on the User User Centered Design for Finding Articles David Lindahl Director of Digital Library Initiatives University of Rochester Libraries
The KnowledgeBank: Powered by DSpace Laura Tull Systems Librarian Ohio State University Libraries WiLSWorld July 27, 2004.
NHPRC ELECTRONIC RECORDS RESEARCH FELLOWSHIP SYMPOSIUM Nov. 19, 2004 Rebecca Schulte University of Kansas Project Title: Testing Boundaries—An Exploration.
Research Data Service at the IT Pro Forum HEIDI IMKER, DIRECTOR.
Addressing Information Security at Heller October 16, 2013 secureHeller.
Institutional Perspective on Credit Systems for Research Data MacKenzie Smith Research Director, MIT Libraries.
Digital Curation in Architecture Curricula and vocational training for Architects The DEDICATE Framework in Architectural CAD Courses Design by Dr Ian.
Field Project Planning, Operations and Data Services Jim Moore, EOL Field Project Services (FPS) Mike Daniels, EOL Computing, Data and Software (CDS) Facility.
Griffith University Malcolm Wolski Director, eResearch and Scholarly Application Development Division of Information Services
EPSRC expectations on research data: What researchers need to know 12/03/2015 Masud Khokhar and Hardy Schwamm.
Ingest and Dissemination with DAITSS Presented by Randy Fischer, Programmer, Florida Center for Library Automation, University of Florida DigCCurr2007.
Preserving Electronic Mailing Lists: The H-Net Archive; H-Net Mapped to the OAIS Model; Preservation Assessment; Preservation Improvements; Overview; How H-Net.
Libra: Thesis and Dissertation Submission. What is Libra? UVA’s institutional repository, providing online archiving and access for the scholarly output.
CI Days: Planning Your Campus Cyberinfrastructure Strategy Russ Hobby, Internet2 Internet2 Member Meeting 9 October 2007.
LIS 506 (Fall 2006) LIS 506 Information Technology Week 11: Digital Libraries & Institutional Repositories.
Electronic Thesis and Dissertation Initiative at Indiana State University(ISU) where to start and where to go Valentine Muyumba (Chair of Cataloging and.
Texas Digital Library CENTRAL TEXAS AND SAN ANTONIO-AREA REGIONAL MEETING SEPTEMBER 5, 2013.
Research Data Management Victoria University Context Lyle Winton Adrian Gallagher Julie Gardner.
ISTeC Data Management Forum #3 Pat Burns Dean of CSU Libraries & VP for IT Friday, May 2, 2014.
1 Data services and computing. 2 We tend to be dealt the computing environment in which we must operate. Few of us have enough influence to steer the.
What is Cyberinfrastructure? Russ Hobby, Internet2 Clemson University CI Days 20 May 2008.
An Environmental Scan for Data Services Trends that are shaping today’s environment for data services.
A survey based analysis on training opportunities Dr. Jūratė Kuprienė Framing the digital curation curriculum International Conference Florence, Italy.
Looking Ahead: A New PSU Research Cloud Architecture Chuck Gilbert - Systems Architect and Systems Team Lead Research CI Coordinating Committee Meeting.
How OAIS and OA IR you? Developing workflows in publishing, promoting, and preserving faculty grey literature within a university Plato L. Smith II; Digital.
University Libraries/ITS Content Stewardship Program Mairéad Martin, Sr. Director, ITS Digital Library Technologies Presentation to FACAC March 1, 2011.
Services for Object Storage and Preservation March 2008 All content in these slides is considered work in progress. In no way does it represent an absolute.
Research Data Services from the ASU Libraries Mary Whelan GIS Data Manager.
Institute Repositories and Digital Preservation : Assessing Current Practices at Research Library Rathachai Chawuthai Information.
Cyberinfrastructure What is it? Russ Hobby Internet2 Joint Techs, 18 July 2007.
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
DRAFT EDMC Procedural Directives NOAA Environmental Data Management Committee 12/3/2015 1
Exploring ‘Workspaces’ Tom Visser, SARA compute and networking services, Amsterdam Garching Workshop 21st September 2010.
Children’s Health Exposure Analysis Resource (CHEAR) CHEAR Center for Data Science Susan Teitelbaum, PhD November 4, 2015.
Millman—Nov 04—1 An Update on Digital Libraries David Millman Director of Research & Development Academic Information Systems Columbia University
Cyberinfrastructure Overview Russ Hobby, Internet2 ECSU CI Days 4 January 2008.
Cyberinfrastructure: Many Things to Many People Russ Hobby Program Manager Internet2.
Digital Repositories: Concepts and Issues By Devendra. S. Gobbur (Sr) Assistant Librarian, Gulbarga University, Gulbarga. 10 Nov. 2009.
Institutional Repositories July 2007 Intellectual property management : the DISA experience Dr D Peters DISA: Digital Innovation South Africa.
15.05 – From Strategy to Solutions: discovering and accessing monographs. Neil Grindley is responsible for areas of work at JISC that address how.
Managing Access at the University of Oregon : a Case Study of Scholars’ Bank by Carol Hixson Head, Metadata and Digital Library Services
Digital Library Storage Strategies Robert Cartolano, Director Library Information Technology Office November 14, 2008.
Open Access & Institutional Repositories, Accra June 2007 Metadata and e-preservation Dr D Peters DISA: Digital Innovation South Africa.
Institutional Repositories July 2007 DIGITAL CURATION creating, managing and preserving digital objects Dr D Peters DISA Digital Innovation South.
Infrastructure Breakout What capacities should we build now to manage data and migrate it over the future generations of technologies, standards, formats,
Research Data Lifecycle Management Workshop Report Curt Hillegas 9/8/2011.
Research Data Management 26th April 2016 Federica Fina, Data Scientist, University of St Andrews Library.
Katherine Skinner, Martin Halbert & Matt Schultz Educopia Institute and MetaArchive Cooperative NDSA Infrastructure Committee
GEO Data Management Principles Implementation : World Data System–Data Seal of Approval (WDS-DSA) Core Certification of Digital Repositories Dr Mustapha.
Data Stewardship Lifecycle A framework for data service professionals Protectors of data.
Training Course on Data Management for Information Professionals and In-Depth Digitization Practicum September 2011, Oostende, Belgium Concepts.
Writing a successful data management plan Kathleen Fear October 17, 2013.
CMU Libraries’ Digital Assets Preservation Strategy Presenter Gabrielle V. Michalek Principal Archivist and Head, Archives/Digital Library Initiatives.
The New Now: Institutional Repositories and Academia Institutional Repository USM April 17, 2015 Marilyn Billings Scholarly Communication Librarian.
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
Todd Quinn – Business & Economics Librarian
Legacy and future of the World Data System (WDS) certification of data services and networks Dr Mustapha Mokrane, Executive Director, WDS International.
Summit 2017 Breakout Group 2: Data Management (DM)
Successful Data Curation for Large Data Archives
Presentation transcript:

Open Dialogue on Digital Data Management
Pat Burns, Dean
Dawn Paschal, Assistant Dean
CSU Libraries
October 13, 2010 Open Dialogue, Data Mgmt.

Background
- NSF requires proposals submitted on or after Jan. 18, 2011 to include a plan for data management: http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_2.jsp#IIC2j
- NIH and USDA also have similar requirements.
- Other agency requirements are looming, e.g., the proposed ‘Federal Research Public Access Act’.
- The goal: maximizing the value of data by sharing, through discoverability, access, preservation, and management.

Science ‘Then’ (5-10 years ago): Theory, Computation, Experiment

Science ‘Now’ [diagram: Theory, Computation, and Experiment surrounded by Data]

Science ‘Emerging’: Theory, Experiment, Data, Computation

Science 2.0 ‘Now’?: Data, Theory, Experiment, Computation

Large Digital Data Sets
- Satellite imagery can generate > 1 petabyte (\(10^{15}\) bytes) of data per day!
- Supercomputers also generate massive data sets.
- Can we transport them? E.g., at 10 Gbits per second (note bits, not bytes: 1 byte = 8 bits):
  \( t = \frac{8\times10^{15}\ \text{bits}}{10^{10}\ \text{bits/s}} = 8\times10^{5}\ \text{s} \approx 222\ \text{hours} \) = 1 week, 2 days, 6 hours, 13 mins, far longer than the one day it takes to generate the data.
- Can we store them? 1 PB requires 500 2-TByte disks @ $250 each = $125,000; over a 5-year disk lifetime, that is $25,000/yr.
- It also requires 1 full rack in a data center: space, power, cooling, ...
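The slide's back-of-envelope arithmetic can be checked with a short Python sketch. The inputs (1 PB, a 10 Gbit/s link, 2 TB disks at $250, a 5-year disk life) are the slide's own figures; the function names are illustrative, and the disk count assumes raw capacity with no redundancy or filesystem overhead.

```python
"""Back-of-envelope arithmetic for the 'Large Digital Data Sets' slide.

Inputs are the slide's figures; the function names are illustrative only.
Disk counts use raw capacity (no RAID, replicas, or filesystem overhead).
"""
import math

PETABYTE = 10**15  # bytes (decimal SI units, as on the slide)
TERABYTE = 10**12  # bytes


def transfer_seconds(size_bytes: int, link_bits_per_sec: float) -> float:
    """Time to move size_bytes over a link rated in bits per second."""
    return size_bytes * 8 / link_bits_per_sec  # 1 byte = 8 bits


def split_duration(seconds: float) -> tuple[int, int, int, int]:
    """Break a duration into (weeks, days, hours, minutes), truncating leftover seconds."""
    minutes = int(seconds // 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    weeks, days = divmod(days, 7)
    return weeks, days, hours, minutes


def disk_cost(size_bytes: int, disk_bytes: int, price_per_disk: float, life_years: float):
    """Return (disks needed, up-front cost, cost per year of disk life)."""
    disks = math.ceil(size_bytes / disk_bytes)
    total = disks * price_per_disk
    return disks, total, total / life_years


if __name__ == "__main__":
    secs = transfer_seconds(PETABYTE, 10e9)
    print(f"{secs:.0f} s = {secs / 3600:.0f} h = {split_duration(secs)} (weeks, days, hours, minutes)")
    print(disk_cost(PETABYTE, 2 * TERABYTE, 250, 5))
```

Run as written, it reports 800,000 s (about 222 hours, i.e. 1 week, 2 days, 6 hours, 13 minutes) and (500, 125000, 25000.0): 500 disks at $250 each cost 500 × $250 = $125,000 up front, or $25,000 per year over five years.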

Incoming!
- An individual researcher can generate many data sets.
- We have many researchers who generate large data sets.
- Number: many × many = very many!
- Size: very many × very big = enormous!

Projected Needs (2009 CSU Survey): CSU-DR = 3 TBytes!!!

How Can We Help?
- IT (System Stewards; the ‘back end’): storage capacity, transport capacity, back-up, sysadmin, IT security/privacy, transcoding
- Libraries (Data/Info Stewards; the ‘front end’, interactions with researchers): data organization & structure, IP issues, metadata, discoverability, preservation
- Joint: operations
“There comes a time in one’s life where one must grab the bull by the tail and face the situation.”

How Can We Help (cont’d)?
- Agreement upon a framework
- Draft of a framework, to present to faculty
- Language for our faculty to include in their proposals
- Strategy, policy, procedures
- Definition of workflow(s)
- Architectures for operations & preservation
- Back-up vs. preservation; LOCKSS?

Policies: The ‘Front End’
- DRM and IP/ownership issues: data sets are generally not ‘copyrightable’ (they are not creative works), but local, institutional IP policies may override this.
- Note that IP ≠ copyright.
- Creative Commons or Science Commons licensing may apply.
- An embargo period is required.
- What are the preservation periods?

NSB Data Type Definitions*
- Research collections: small; useful to individuals/teams for the life of a project; limited curation; standards typically lacking.
- Resource collections: medium; useful to a community; follow the group’s standards; mid- to long-term utility.
- Reference collections: large; serve many segments of science/engineering; conform to robust standards; indefinite support.
*National Science Board

Workflow
- Faculty provide data/information: enter metadata and user’s manuals, select an embargo period, select licensing options, enter publications, point to or supply data sets, ...
- Librarians manage data/information: review metadata, ingest and make accessible, review periodically, deaccession periodically (annually?), manage data, interact with faculty.
- IT staff implement and operate systems: operate systems, backups, security, upgrading storage, transport, moving to LOCKSS, etc.
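The division of labor above can be pictured as a small state machine in which faculty create submissions and librarians move items through review, ingest, and deaccession. This is a hypothetical sketch: the state names, roles, and transition table are assumptions for illustration, not an existing CSU system.

```python
"""Hypothetical sketch of the item lifecycle implied by the 'Workflow' slide.

State names, roles, and the transition table are illustrative assumptions,
not an existing CSU system.
"""
from enum import Enum


class Role(Enum):
    FACULTY = "faculty"    # creates items in the SUBMITTED state
    LIBRARIAN = "librarian"
    IT_STAFF = "IT staff"  # operates systems (backups, security, storage); moves no items here


class State(Enum):
    SUBMITTED = "submitted"          # faculty supplied metadata, embargo, license, pubs, data/pointers
    UNDER_REVIEW = "under review"    # librarians review the metadata
    ACCESSIBLE = "accessible"        # ingested and made accessible (subject to any embargo)
    DEACCESSIONED = "deaccessioned"  # removed after a periodic review


# (from_state, to_state) -> the role allowed to make that move
TRANSITIONS = {
    (State.SUBMITTED, State.UNDER_REVIEW): Role.LIBRARIAN,
    (State.UNDER_REVIEW, State.SUBMITTED): Role.LIBRARIAN,      # returned to faculty for fixes
    (State.UNDER_REVIEW, State.ACCESSIBLE): Role.LIBRARIAN,     # ingest and make accessible
    (State.ACCESSIBLE, State.UNDER_REVIEW): Role.LIBRARIAN,     # periodic review
    (State.UNDER_REVIEW, State.DEACCESSIONED): Role.LIBRARIAN,  # deaccession after review
}


def advance(current: State, target: State, actor: Role) -> State:
    """Return target if the move exists and actor is the role allowed to make it."""
    required = TRANSITIONS.get((current, target))
    if required is None:
        raise ValueError(f"no transition {current.value} -> {target.value}")
    if actor is not required:
        raise PermissionError(f"{actor.value} may not move {current.value} -> {target.value}")
    return target
```

Keeping the rules in one table makes it easy to see, and later change, which role owns each step; for example, an annual review cycle would just be a scheduled ACCESSIBLE to UNDER_REVIEW move.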

Digital Assets - the 4 Pieces
1. The metadata, ideally on the CSU-DR. Typical: what we collect today, e.g., lightweight metadata (probably not copyrightable)
2. Contextual information, e.g., user’s manuals (yes, copyrightable)
3. Scholarly publications associated with the data, ideally on the CSU-DR
4. The data itself, which should be in the most appropriate place (pointers?)

Digital Assets Management [diagram: 1. Metadata, 2. User’s Manuals, 3. Pubs, and 4. Data Sets (small, medium, large) mapped to storage locations: Libraries-DR, local storage, “the cloud”, and disciplinary repositories, SC centers, etc., with “pointers” to data held elsewhere]
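One way to picture the four pieces is as a single record in which the metadata, manuals, and publications live on the DR while each data set is either stored there or referenced by a pointer. The class and field names below are hypothetical; the slides only name the pieces and ask whether pointers should be used.

```python
"""Hypothetical record for the four pieces of a digital asset.

Class and field names are illustrative assumptions; the slides name the
pieces and suggest that large data may be referenced by pointer.
"""
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataSetRef:
    size_bytes: int
    location: str                  # e.g. "Libraries-DR", "local storage", "disciplinary repository"
    pointer: Optional[str] = None  # persistent URL when the data lives outside the DR


@dataclass
class DigitalAsset:
    metadata: dict[str, str]                                    # 1. lightweight metadata, on the DR
    user_manuals: list[str] = field(default_factory=list)       # 2. contextual documentation
    publications: list[str] = field(default_factory=list)       # 3. associated scholarly pubs
    data_sets: list[DataSetRef] = field(default_factory=list)   # 4. the data, wherever it fits best

    def external_pointers(self) -> list[str]:
        """Pointers to data sets that are held somewhere other than the DR."""
        return [d.pointer for d in self.data_sets if d.pointer is not None]
```

Separating the pointer from the storage location lets the DR hold complete metadata even when the bytes themselves sit in a disciplinary repository or supercomputer center.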

Architecture [diagram: Primary System (the Digital Repository) linked by high-speed networks to a Preservation System (LOCKSS)]
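A preservation system earns its keep by detecting damaged copies, which is one thing that distinguishes preservation from simple back-up. The sketch below shows generic fixity checking: hash every copy with SHA-256 and flag copies that disagree with the majority. It assumes each copy is reachable as a local file path and is NOT the LOCKSS polling protocol; the function names are illustrative.

```python
"""Minimal fixity check across a primary copy and preservation replicas.

A generic sketch assuming each copy is a local file path. It is NOT the
LOCKSS protocol, only the underlying idea of comparing content digests.
"""
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large data sets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def damaged_copies(copies: list[Path]) -> list[Path]:
    """Return the copies whose digest disagrees with the most common digest."""
    digests = {p: sha256_of(p) for p in copies}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return [p for p, d in digests.items() if d != majority]
```

With only two copies a mismatch cannot be attributed to either one, so majority comparison needs at least three copies to identify which one to repair.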

CSU Storage Project (8/19/2009): 45 TBytes (raw) for ~$8k

Strategy for Storage of Data Sets
- Small (< 100 GB): we would agree to store on the DR, but not forever.
- Medium: we would agree to store on the DR for a limited time at a cost, or on a local server somewhere that we point to.
- Large: stored on a disciplinary DR somewhere, at a supercomputer center, or at a large instrument center; we point to it (persistent URL?).
- How do we deal with exceptions?
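The size-based strategy reduces to a simple classification. The sketch below assumes the thresholds used on these slides (small < 100 GB, medium 0.1-10 TB, large > 10 TB, decimal units); the function name and the plan wording are hypothetical, and exceptions are deliberately left to a human decision.

```python
"""Sketch of the size-based storage strategy for data sets.

Thresholds follow the slides (small < 100 GB, medium 0.1-10 TB, large > 10 TB);
the function name and plan wording are hypothetical. Exceptions are not
modeled and would still need a human decision.
"""
GB = 10**9   # decimal units
TB = 10**12

PLANS = {
    "small": "store on the DR, but not forever",
    "medium": "store on the DR for a limited time at a cost, or on a local server we point to",
    "large": "disciplinary repository, supercomputer or instrument center; we keep a pointer",
}


def size_tier(size_bytes: int) -> str:
    """Classify a data set as 'small', 'medium', or 'large' by its size in bytes."""
    if size_bytes < 100 * GB:
        return "small"
    if size_bytes <= 10 * TB:
        return "medium"
    return "large"


if __name__ == "__main__":
    for size in (50 * GB, 2 * TB, 40 * TB):
        tier = size_tier(size)
        print(f"{size / TB:g} TB -> {tier}: {PLANS[tier]}")
```

Treating exactly 10 TB as medium is a choice made here, since the slides do not say which tier owns the boundary.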

What CSUL Will Store & at What Cost
(Rows: preservation period, where + means beyond the end of the grant period. Columns: data set size.)

Preservation period   SMALL (< 0.1 TB)   MEDIUM (0.1-10 TB)   LARGE (> 10 TB)
Short (1 yr. +)       Free               Maybe +
Medium (2 yrs. +)     $500/TB            Maybe -
Long (> 5 yrs. +)     $1,000/TB          No

Forever is a long time...
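Only the small-data column of the cost table carries concrete prices, so a charge estimate is only possible there. The sketch below assumes those prices apply per TB to small data sets (< 0.1 TB, i.e. < 100 GB) and returns None otherwise, since "Maybe" and "No" are policy decisions rather than prices. The names are hypothetical.

```python
"""Charge estimate for small data sets from the CSUL cost table.

Assumes the table's per-TB prices apply only to small data sets (< 100 GB);
medium and large sets return None because the table gives no price for them.
Names are hypothetical.
"""
from typing import Optional

# Dollars per TB for small data sets, keyed by preservation period.
SMALL_PRICE_PER_TB = {"short": 0, "medium": 500, "long": 1000}


def small_dataset_charge(size_gb: int, period: str) -> Optional[float]:
    """Return the charge in dollars, or None if the data set is not 'small'."""
    if size_gb >= 100:
        return None  # not a small data set: no fixed price in the table
    # Multiply before dividing so whole-GB inputs give exact dollar amounts.
    return size_gb * SMALL_PRICE_PER_TB[period] / 1000
```

For example, a 50 GB data set kept for the medium period would be charged 50 × $500 / 1000 = $25.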

Needs
- Libraries-IT partnership: define policies for usage, define practice for usage, define workflows
- Operations: develop needed tools (build an online, self-service submission tool, plus requirements for review of user-created metadata), establish systems, develop preservation infrastructure

Issues
- Will the DR become a ‘Trusted Digital Repository’?
- Will this enhance our proposals?
- What will be stored where?
- Will disciplinary digital repositories emerge, e.g., at NCAR and elsewhere? Flexibility is key.
- How best to engage the VPR (probably already accomplished), the faculty, and Library staff, both faculty and operational (DM Librarians, as at UNM?)

Discussion is most welcome.