OceanTeacher Global Academy Pilot Course: Digital Asset Management
30 September to 4 October 2013
Kenya Marine and Fisheries Research Institute (KMFRI), OceanTeacher Regional Training Centre, Mombasa, Kenya
Data and Data Citation
Linda Pikula, NOAA
Is Data a digital asset? Class?
Statement on digital assets: “Unlike traditional analog objects such as books or photographs where the user has unmediated access to the content, a digital object always needs a software environment to render it”
Does that statement apply to data? Does data always need a software environment to render it?
“The data deluge is a reality in many fields. Scientific instruments are generating data at greater speed, densities and detail than before possible.” “Digital technologies are reshaping the practice of science” “Increases in computational capacity and capability drive more powerful modeling, simulation and analysis” There is a place for Data in the scholarly life cycle and a role for Librarians in this cycle. How will we define our role?
Digital Assets In the case of born-digital content (e.g., institutional archives, Web sites, electronic audio and video content, born-digital photography and art, research data sets, observational data), the enormous and growing quantity of content presents significant scaling issues to digital preservation efforts.
Data Management Data Access Data Preservation Data Rights Management
2.1 Main concepts related to data sharing, data publication, data citation and data metrics
In this report the following concepts are used:
“Data sharing” has been defined as the “voluntary provision of information from one individual or institution to another for purposes of legitimate research” (Fienberg et al., 1985) or simply “the release of research data for use by others” (Borgman, 2012). This general concept is grounded in the assumption that data are a valuable long-term resource and that sharing them and making them publicly available is essential if their potential value is to be realized (Swan & Brown, 2008). Data sharing requires the systematic collection, curation and dissemination of data.
“Data citations” have been defined as formal citations, included in the reference list of published articles, to data resources that led to a given research result (Mayernik, 2012). In this sense, the concept of data citation is tied to the idea that datasets should be published just as other kinds of scholarly products are, and should also be considered first-class research outputs, from both social and funding policy perspectives (Lawrence, Jones, & Matthews, 2011).
“Data publication”: The idea of publication of datasets mirrors the scientific publication model, although some criticisms have also been raised (Mayernik, 2012), as this model does not fully fit all the idiosyncrasies related to the sharing and publication of datasets.
“Data metrics”: Data metrics are mainly related to data publication and data citation (but not exclusively; for example, we could also potentially include ‘altmetrics’ on datasets here). Both data publication and data citation can be considered signals of use of data. Use of data can generate new data, which may feed back into the collection phase (see Figure 1). Thus, for data metrics to build up, data sharing is a necessary prerequisite. Whether it will work the other way round (metrics leading to sharing) remains to be seen.
In the rest of this report data sharing (i.e. collection, curation, dissemination) and data metrics (metrics on production and use) will be dealt with separately.
Scholarly Information Cycle
The Scientific Communication Life-Cycle
*Björk, B-C (2007): “A model of scientific communication as a global distributed information system”, Information Research, 12(2) paper 307
Scholarly Communication Cycle -Open Access - L. Lyon
Why Link Data? Class?
Scholarly Info Cycle for Data, Value Chain
Method / Library Role
Legitimization of Data: Trust in Data, Peer Review
Registration of Data: Metadata, preservation, curation
Certification of Data
Dissemination of Data: Access, preservation
Four categories of data Observational Computational Experimental Records
Incentives to share data
Scholars' concerns: agreements among research partners, collaboration, recognition, reciprocity, coercion, open science
Publishers' concerns: economic; preservation, access, documentation
Librarians' concerns: documentation, provenance, access, preservation; performance evaluations, bibliometrics, “peer citing”
Incentives NOT to share data
Scientists are rewarded for publication, not data management
Documenting data for another's use, after one's own use, is difficult and time consuming
Competition for grant funding and recognition among scientists
Keeping control of intellectual property (publishers, scientists)
Examples of efforts to include data along with scholarly journal articles
SCOR/IODE/WHOI/MBL project
AMS (American Meteorological Society) publications
DataCite and CrossRef registries
NOAA catalog
Elsevier - PANGAEA
Examples continued
American Geophysical Union (AGU) and European Geosciences Union (EGU) publications
Online repositories: WHOAS (Woods Hole Open Access Server)
Key issues for Data and e-science: Issues for Librarians
Discovery and Identification: What data exist? Where are the data and how can they be accessed?
Access: Who has access? How will the privacy of both users and research subjects be protected? What kinds of rights management structures need to be established, if any?
Interoperability: In what formats will data be stored and presented? What kinds of metadata will be applied? How will variables be described? What data models apply?
Retention Criteria: Are the data likely to be reused? Will another researcher be able to reasonably replicate or build upon the original results using these data? What is the cost of metadata creation, and how does that compare to the expected value of the data to other researchers?
Definition of “Linked Data”
In computing, linked data (often capitalized as Linked Data) describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried. [1]
Tim Berners-Lee, director of the World Wide Web Consortium, coined the term in a design note discussing issues around the Semantic Web project. [2]
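To make the definition concrete, here is a minimal sketch in Python using only the standard library. It assumes two sources (a library catalog and a data centre) that each publish subject/predicate/object statements, and shows how a shared URI lets them be merged and queried together. All `example.org` URIs, titles, and function names are hypothetical. The three Dublin Core predicates are real terms, but their use here is illustrative only.

```python
# Minimal linked-data sketch: statements are (subject, predicate, object) triples.
# Every example.org URI and every title below is a hypothetical placeholder.

DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace

ARTICLE = "http://example.org/article/123"
DATASET = "http://example.org/dataset/456"
CRUISE = "http://example.org/cruise/789"

# Statements published by a (hypothetical) library catalog.
library_catalog = [
    (ARTICLE, DCT + "title", "Example cruise report"),
    (ARTICLE, DCT + "references", DATASET),
]

# Statements published by a (hypothetical) data centre.
data_centre = [
    (DATASET, DCT + "title", "Example CTD profiles"),
    (DATASET, DCT + "source", CRUISE),
]


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def to_ntriples(graph):
    """Serialize triples as simple N-Triples lines (no literal escaping)."""
    lines = []
    for s, p, o in graph:
        obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Because both sources use the same URI for the dataset, merging is just
# concatenation, and a query can follow the link from article to dataset.
merged = library_catalog + data_centre
linked_datasets = objects(merged, ARTICLE, DCT + "references")
dataset_titles = [objects(merged, d, DCT + "title")[0] for d in linked_datasets]

print(to_ntriples(merged))
print(dataset_titles)
```

In practice a library such as rdflib and a SPARQL endpoint would replace the hand-rolled helpers. The sketch only shows why shared identifiers make data from different sources connectable.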
Persistent Identifiers
How to Link Data
Librarian’s Role in Data Management
Roles:
1. Data management, including collection, organization, description, curation, archiving, and dissemination; creating a plan
2. Creation of new data- and scholarship-based electronic resources for university and/or public use
3. Development of new models, standards, and architectures for various aspects of data management, description, etc.
4. Building accessible linkages between all the components and stages of research, from data to researchers to publications
5. Bridging institutional hierarchies and departmental divisions in service of interdisciplinary initiatives
Data Citation
Citation and Peer Review of Data Citation Metrics? Class Discussion Thomson Reuters Web of Science and Data Citation Databases
Digital Object Identifier (DOI)
A digital object identifier (DOI) is a character string used to uniquely identify an object. Metadata about the object is stored in association with the DOI name.
Libraries have been using DOIs for years; they are now the de facto standard for data.
DOI registry
DOI registry agent – CrossRef
publisher – MBLWHOI
5105 – “item” number
E-Repositories and Data How many of you have an E-Repository? OceanDocs? Other? Software used? For Your Information: DSpace Repository Accepts both text documents and datasets Accepts data related to articles as well as data not associated with a paper
Current Status of Linked Data How many of you currently access/link to data through your repositories? …through your online library catalog? …through your Data Divisions web pages?
Use the following slide on Lat/Long To discuss Bibliographies that have data links to geospatial data The slide shows metadata needed
Lat / Long
List of Most Common Metadata Fields
How to ‘mint’ a DOI
Mint?
Registries for DOIs: CrossRef, DataCite, ESIP
Parts of the DOI Explained
10 – DOI registry
DOI registry agent – CrossRef
publisher – MBLWHOI
5105 – “item” number
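The parts of a DOI name can also be pulled apart programmatically. The sketch below assumes the general DOI syntax: a prefix that starts with the directory indicator `10`, a dot, a registrant code, then a slash and a suffix chosen by the publisher. The DOI used is a hypothetical example, not the one from the slide, and `split_doi` and `resolver_url` are illustrative names.

```python
from urllib.parse import quote


def split_doi(doi):
    """Split a DOI name into directory indicator, registrant code, and suffix."""
    prefix, _slash, suffix = doi.strip().partition("/")
    directory, _dot, registrant = prefix.partition(".")
    if directory != "10" or not registrant or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return {"directory": directory, "registrant": registrant, "suffix": suffix}


def resolver_url(doi):
    """Build the https://doi.org/ link that resolves to the landing page."""
    return "https://doi.org/" + quote(doi.strip(), safe="/")


# Hypothetical DOI name, used for illustration only.
example = "10.5555/example/5105"
parts = split_doi(example)
print(parts)
print(resolver_url(example))
```

The suffix is opaque to the DOI system. Any internal structure, such as a repository number followed by an item number, is a convention of the publisher that assigned it.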
NOAA Examples
Cruise Videos Here?
Historic International Climate Data (NOAA, IODE)
NOAA Pilot Project DOIs
Landing pages: tion/iso/xml/Hazard_Images_Database.xml&view=iso2html
NOAA DOI Pilot Project Wiki-How to Assign a DOI
TYPES OF DATA Oceanography Fisheries Atmospheric Sciences
Oceanographic Data
Carbon Cycles
Ocean Temperature, Color, Depth, Salinity
Time Series
Mixed layer surface currents
Meridional heat transport
Global Heat Storage
Global Surface Currents
Essential Climate Variables data (salinity, chlorophyll, altimetry, surface wind and current)
Wave Data
Coastal Climatologies
Data for use in Marine Spatial Planning and Decision support applications for climate, ecosystems and coastal planning
Fisheries Data
Fisheries catch, abundance, sex, size
Commercial Fisheries landings/exploitation
Recreational fisheries
Stock assessments/abundance, species, habitat assessments, surveys at sea, recruitment
Environmental: habitat, water quality, climate cycles
International Organizations that collect or maintain Fisheries Statistics: anizations.html
Air pressure and winds Near surface winds (ocean surface) Hurricane and storm data Other Atmospheric -Air/Sea Interaction
Other Examples
Historic Oceanographic Cruises
Marine Photo Libraries (Images)
Geospatial Data: what is it?
Bibliographies online with Geospatial Data
Exercise 1 Create a DOI and Metadata for the following Publication: 2005 Carbon dioxide, hydrographic, and chemical data obtained during the R/V Maurice Ewing cruise in the Atlantic Ocean : (WOCE section A17, 4 January-21 March 1994) Online version in PDF format
Exercise 2 Create a Metadata record and DOI for the following digital asset 2013 Seafloor video footage and still-frame grabs from U.S. Geological Survey cruises in Hawaiian nearshore waters Online document in PDF and MOV (PURL) Find a NOAA Video cataloged and have students do the above exercise
Exercise 3 On the web, find a publication in a marine science journal which shows links to data Copy the information on a slide, for class discussion
Data Citation
Google search: esip data citation for more examples
The core required elements of a citation are:
Author(s): the people or organizations responsible for the intellectual work to develop the data set. The data creators.
Release Date: when the particular version of the data set was first made available for use (and potential citation) by others.
Title: the formal title of the data set.
Version: the precise version of the data used. Careful version tracking is critical to accurate citation.
Archive and/or Distributor: the organization distributing or caring for the data, ideally over the long term.
Locator/Identifier: this could be a URL but ideally it should be a persistent service, such as a DOI, Handle or ARK, that resolves to the current location of the data in question.
Access Date and Time: because data can be dynamic and changeable in ways that are not always reflected in release dates and versions, it is important to indicate when on-line data were accessed.
Additional fields can be added as necessary to credit other people and institutions, etc. Additionally, it is important to provide a scheme for users to indicate the precise subset of data that were used. This could be the temporal and spatial range of the data, the types of files used, a specific query id, or other ways of describing how the data were subsetted.
An example citation: Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated CLPX-Ground: ISA snow depth transects and related measurements ver Edited by M. Parsons and M. J. Brodzik. National Snow and Ice Data Center. Data set accessed at
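The core elements listed above can be held in a small record and assembled into a citation string. This is a minimal sketch: the `DataCitation` class, its field names, the punctuation layout, and every value in the example are assumptions made for illustration, not an official ESIP format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Holds the core citation elements; field names are illustrative."""
    authors: List[str]
    release_date: str
    title: str
    version: str
    archive: str
    locator: str
    accessed: str
    subset: Optional[str] = None  # optional description of the subset used

    def format(self) -> str:
        """Join the elements in the order they are listed above."""
        parts = [
            ", ".join(self.authors) + ".",
            self.release_date + ".",
            f"{self.title}, version {self.version}.",
            self.archive + ".",
            self.locator + ".",
            f"Accessed {self.accessed}.",
        ]
        if self.subset:
            parts.append(f"Subset used: {self.subset}.")
        return " ".join(parts)


# Hypothetical data set, for illustration only.
citation = DataCitation(
    authors=["A. Author", "B. Author"],
    release_date="2013",
    title="Example ocean temperature profiles",
    version="1.0",
    archive="Example Data Centre",
    locator="https://doi.org/10.5555/example",
    accessed="2013-10-01",
    subset="January to March 2013",
)
print(citation.format())
```

Keeping the version, access date, and subset as separate fields makes it harder to omit them, and they are the elements most often missing from informal data citations.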
DataCite Metadata Version 3.0 now available July
New features include:
- Better support for recording data location
- Discipline-specific metadata fields to supplement the generic schema
- Better documentation as a whole for DataCite
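Before a DOI is minted, it helps to check that a draft record has the required fields. The sketch below assumes the five mandatory properties of the DataCite 3.0 schema (Identifier, Creator, Title, Publisher, PublicationYear). The dictionary keys, the `missing_fields` helper, and all record values are hypothetical, and a real submission would be XML validated against the schema itself.

```python
# Dictionary keys are this sketch's own naming for the five mandatory
# DataCite 3.0 properties; they are not the schema's XML element names.
MANDATORY = ("identifier", "creators", "title", "publisher", "publication_year")


def missing_fields(record):
    """Return the mandatory fields that are absent or empty in a draft record."""
    return [name for name in MANDATORY if not record.get(name)]


# Hypothetical records, for illustration only.
complete = {
    "identifier": "10.5555/example/5105",
    "creators": ["Example Research Group"],
    "title": "Example cruise data report",
    "publisher": "Example Institution Library",
    "publication_year": "2013",
}
draft = {
    "creators": ["Example Research Group"],
    "title": "Example cruise data report",
    "publisher": "",
    "publication_year": "2013",
}

print(missing_fields(complete))
print(missing_fields(draft))
```

A check like this fits naturally into a repository's deposit workflow, so that incomplete records are caught before they reach the DOI registry.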
Cruises and Expeditions-Data Management from Library Perspective Example: NOAA Video Data Management
Video Data Management System (VDMS) Archives, Preserves and Provides Online Access to NOAA Digital Video and Image Data Anna Fiolek, Metadata Librarian National Oceanographic Data Center, NOAA Central Library Silver Spring, Maryland Project’s All images from: NOAA Photo Library at: NOAA
OER IPT/VDMS
VDMS Objectives
Provide timely online information about NOAA’s Office of Ocean Exploration and Research (OER) video data to the general public.
Educate our Nation about NOAA oceanographic expeditions and underwater explorations through digital video and related information.
Archive and preserve unique video and related data for future generations.
Collaborate with NOAA librarians, data managers, and scientists from different NOAA offices and programs.
Use or extend existing library tools, guidelines, and metadata standards to support new media formats: digital video, digital image, and digital text documents.
Enhance data access and metadata sharing between NCL NOAALINC, NODC Ocean Archive System (OAS), and NCDDC MerMaid catalogs, Digital Atlas, CoRIS, and ASFA.
Example of Data about a research Cruise
Available through the online catalog NOAALinc
Signature Explorations per year Summary Explorations per year.
VDMS archives and provides either online or off-line access to over 3000 OER digital video tapes or DVD discs, and over 300 video highlights and video clips.
*Sustainable Seas Expeditions (SSE), 18 missions (13 of them to NMS areas)
Access through a Web Landing Page
Access through Library Catalog NOAA Library Catalog: 2009 Bermuda Caves 2009 (Collection) 09bermuda/
CoRIS metadata searches include Library metadata via Z39.50 protocol NOAA Photo Library
Two Methods of Digital Research Data Management shown Access through online Catalog and specialized topic catalog Access through Internet web page (Landing Page) to data archive
[Diagram: OER DATA FLOW, showing collaboration between NOAA Data Centers. Cruise data are received and distributed by OE/NCDDC. Systems shown: NOAA Central Library Catalog Online, NODC Ocean Archive System, NGDC Archive System, Digital Atlas, MERMAid Catalog. Items shown: original and copy digital video tapes and uncompressed digital video data for long-term archiving and preservation; original raw CTD data and original raw multi-beam data for archiving, preservation, and online access; digital video highlights and digital image highlights; video annotations and image annotations; digital images; cruise reports, quick look reports, and situational reports; peer-review publications; K-12 educational lesson plans; web sites and related home pages; video supporting documents for archiving, preservation, and online access.]
Another Example of Data Management- Access
NOAA AOML This is the home page of the NOAA AOML Laboratory – showing types of data available at the laboratory and focus of research
Software tools for data discovery, access, visualization and analysis
Data Discovery and Access:
Web-based user interfaces
Programmatic access interfaces
Links to the literature
Examples: GeoMapApp, Virtual Ocean, EarthChem
Portals to complementary data in other repositories: ASP, EarthChem, USAP-DCC, GoogleEarth
Publication E-Repositories
Other Common Digital Assets in Library? Photo collections online Weather records online Cruise reports- digitized? Other possible digital assets? Historical documents? Videos of cruises?
Not Data-But Example of Marine Digital Photo Library Management
Sample Data Sources
NOAA Coastal Water Temps
World Sea Temperatures
BODC CTD and Underway Data
Global Ocean Data: ocean/Global-and-regional-views/Global-Ocean
Real Time Arctic Data
BCO-DMO
Other Online Resources MANTRA: online course on Research Data Management SUNScholar/Digital Preservation Data Curation Profiles Symposium mms://video1.itap.purdue.edu/DCPSymposium Data Curation Profiles Toolkit Data Curation Symposium
Mantra
Sunscholar
Standards
ISQ (NISO) Information Standards Quarterly, Spring/Summer 2012, v.24, issue 2/3: Linked Data for Libraries, Archives, and Museums
Credits
Lisa Raymond – E-Repositories Presentation
L. Pikula – OT Digital Asset Management, 2009
Ball, A. and Duke, M. (2012) “How to Cite Datasets and Link to Publications.” DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online:
Fiolek, Anna. Video Data Management System VDMS, NOAA.
Cycle of Scholarly Information, Washington and Lee University Libraries
NOAA Central Library Website
Credits
Lyon, L. Scholarly information cycle