The Long Tail of Sample-based Data in the Next Decade FROM DARKNESS TO LIGHT Kerstin Lehnert www.iedadata.org.

Presentation transcript:


10/9/2011 GSA 2011: FROM DARKNESS TO LIGHT: LONG TAIL OF SAMPLE-BASED DATA
“Dark Data is information and results from research that has not been properly archived, and therefore is not known to exist and cannot be utilized.”
From: Digital Curation – the Class Blog

CHRIS ANDERSON’S LONG TAIL

BRYAN HEIDORN’S LONG TAIL
Heidorn, P. Bryan (2008). Shedding Light on the Dark Data in the Long Tail of Science. Library Trends 57(2), Fall 2008.

SAMPLE-BASED DATA
observations made on a sample
  mostly ex-situ observations (lab data)
information about the sample
the physical object
“Observations commonly involve sampling of an ultimate feature of interest.” (OGC O&M / ISO 19156; editor: Simon Cox)

BIG DATA VS SMALL DATA
Big Data (Head) | Small Data (Tail)
homogeneous | heterogeneous
mechanized | hand generated
uniform procedures | unique procedures
central curation | individual curation
maintained | not maintained
immediately reused | seldom reused
make careers | currently unnoticed

WHY DO SMALL DATA STAY IN THE DARKNESS?
Lack of infrastructure
  No adequate repositories exist.
  Lack of tools & support for data curation.
Lack of reward structure/incentives
  Large effort to organize and document the data.
  No professional recognition for data sharing.
Publications often contain only abstract representations of the data.
Traditional scientific articles are the only way to provide access.
Researchers ‘hold’ the data for later mining.

SAMPLE-BASED (SMALL) DATA ISSUES
Highly diverse (thousands of variables and materials)
Diverse & customized data acquisition procedures
Complex data documentation
Lack of data formats
Data often not digital: field notes, visual sample descriptions
Lack of data repositories
Culture of non-sharing

WHY SAMPLE-BASED DATA MATTER
data on samples are key to our knowledge of Earth’s dynamical systems and evolution
  global climate change and paleoclimate
  biogeochemical cycles
  magmatic processes, mantle dynamics
samples are a relevant component of earth observations
  calibration of models and simulations of earth systems
samples and sample-based data are often expensive to acquire

FOCI FOR THE NEXT DECADE
infrastructure: repositories, standards, workforce
incentives: attribution, recognition, cool tools
support: resources, training

GEOINFORMATICS FOR GEOCHEMISTRY
developed data models and databases for sample-based analytical data
built highly successful geochemical synthesis databases (PetDB, EarthChem)
developed standards for data reporting
created the International Geo Sample Number as a unique identifier for samples
since October 2010 part of the NSF-funded IEDA Data Facility

REPOSITORY SERVICE
GEOCHEMICAL RESOURCE LIBRARY
Repository for sample-based data
Web-based user submission

GRL: NEW CAPABILITIES IN
Linking datasets to NSF award numbers
  IEDA Data Compliance Report lists datasets in the GRL & MGDS
  Interoperability with FastLane
Extended metadata for discovery
  Include sample identifiers & locations for samples in dataset metadata
Long-term preservation of data (CU Libraries)
Dataset registration with DOIs (DataCite)
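The metadata capabilities listed on this slide can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the GRL's actual schema or API: the class names, field names, helper functions, award numbers, and sample identifiers below are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SampleRef:
    """A sample referenced by a dataset (hypothetical field names)."""
    sample_id: str    # a unique sample identifier such as an IGSN; values below are placeholders
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees


@dataclass
class DatasetRecord:
    """A dataset metadata record (hypothetical, not the GRL schema)."""
    title: str
    award_numbers: List[str] = field(default_factory=list)  # funding awards linked to the dataset
    samples: List[SampleRef] = field(default_factory=list)  # sample identifiers and locations
    doi: Optional[str] = None                                # set once the dataset is registered


def datasets_for_award(records: List[DatasetRecord], award_number: str) -> List[str]:
    """List dataset titles linked to one award, as a compliance report might."""
    return [r.title for r in records if award_number in r.award_numbers]


def bounding_box(record: DatasetRecord) -> Optional[Tuple[float, float, float, float]]:
    """Return (min_lat, min_lon, max_lat, max_lon) of the samples, or None if there are none."""
    if not record.samples:
        return None
    lats = [s.latitude for s in record.samples]
    lons = [s.longitude for s in record.samples]
    return (min(lats), min(lons), max(lats), max(lons))


# Made-up example records; none of these identifiers refer to real awards or samples.
records = [
    DatasetRecord(
        title="Example basalt major elements",
        award_numbers=["AWARD-0000001"],
        samples=[
            SampleRef("SAMPLE-A", 10.0, -104.0),
            SampleRef("SAMPLE-B", 12.5, -103.0),
        ],
    ),
    DatasetRecord(title="Example sediment isotopes", award_numbers=["AWARD-0000002"]),
]

print(datasets_for_award(records, "AWARD-0000001"))
print(bounding_box(records[0]))
```

Keeping sample identifiers and locations inside the dataset record is what lets a catalog answer both "which datasets came out of this award?" and "which datasets have samples in this region?" from the same metadata.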

GFG DATA SUBMISSION

DOI: /IEDA/
Metadata record in the Geochemical Resource Library

SAMPLE REGISTRATION AT SESAR
Facilitate discovery of samples
Ensure unique identification
Preserve sample metadata

LIGHT ON THE HORIZON
Growing recognition globally of the need for access to scientific data
NSF’s new implementation of their data sharing policy
Funding to develop GEO data infrastructure
  DataNet
  EarthCube
Slide courtesy of B. Ransom, NSF/OCE

LIGHT ON THE HORIZON
New services & tools emerging that facilitate curation of sample-based data
  SESAR sample registration
  data publication
  tools for data & metadata capture

MUCH MORE IS NEEDED
recognition of data citation as a professional achievement
a new workforce
resources for data curation
data management as part of the Geoscience curriculum
community governance

Dark data is important, and we will not know how important it may be until more and more of it is made available to us.