Envisioning a New Distributed Organization and Cyberinfrastructure to Enable Science Presenter Name Patricia Cruse, Director, Digital Preservation Program.

Slides:



Advertisements
Similar presentations
The Data Conservancy: A Digital Research and Curation Virtual Organization D4Science World User Meeting November 25, 2009.
Advertisements

Data Conservancy and the US NSF DataNet Initiative 2010 JISC/CNI Conference July 1, 2010 Sayeed Choudhury Johns Hopkins University.
Maines Sustainability Solutions Initiative (SSI) Focuses on research of the coupled dynamics of social- ecological systems (SES) and the translation of.
1 A Case Study in E- Science: Building Ecological Informatics Solutions for Multi-Decadal Research ARL/CNI 2008 Conference Washington, DC 16 October 2008.
V Alyssa Rosemartin 1, Lee Marsh 1, Ellen Denny 1, Bruce Wilson USA National Phenology Network, Tucson, AZ; 2 - Oak Ridge National Laboratory, Oak.
Joint CASC/CCI Workshop Report Strategic and Tactical Recommendations EDUCAUSE Campus Cyberinfrastructure Working Group Coalition for Academic Scientific.
The Changing Research Data Paradigm One agency’s response Changes to Implementation of NSF’s Data Sharing Policy NOAA’s second annual Environmental Data.
1 Cyberinfrastructure Framework for 21st Century Science & Engineering (CF21) IRNC Kick-Off Workshop July 13,
SONet (Scientific Observations Network) and OBOE (Extensible Observation Ontology): Mark Schildhauer, Director of Computing National Center for Ecological.
NSF and Environmental Cyberinfrastructure Margaret Leinen Environmental Cyberinfrastructure Workshop, NCAR 2002.
Mike Smorul Saurabh Channan Digital Preservation and Archiving at the Institute for Advanced Computer Studies University of Maryland, College Park.
Office of Science Office of Biological and Environmental Research J Michael Kuperberg, Ph.D. Dan Stover, Ph.D. Terrestrial Ecosystem Science AmeriFlux.
Data Sources & Using VIVO Data Visualizing Scholarship VIVO provides network analysis and visualization tools to maximize the benefits afforded by the.
Matthew B. Jones Jim Regetz National Center for Ecological Analysis and Synthesis (NCEAS) University of California Santa Barbara NCEAS Synthesis Institute.
Data-PASS Shared Catalog Micah Altman & Jonathan Crabtree 1 Micah Altman Harvard University Archival Director, Henry A. Murray Research Archive Associate.
Digital Library Architecture and Technology
Computing in Atmospheric Sciences Workshop: 2003 Challenges of Cyberinfrastructure Alan Blatecky Executive Director San Diego Supercomputer Center.
Data Integration, Analysis, and Synthesis Matthew B. Jones National Center for Ecological Analysis and Synthesis University of California Santa Barbara.
Data Conservancy: A Blueprint for Libraries in the Data Age Sayeed Choudhury Johns Hopkins University
U.S. Department of the Interior U.S. Geological Survey CDI Data Management Working Group December 12, 2011 Sally Holl, USGS Texas Water Science Center.
Library as Partner in Creating Curriculum for Sustainability Bonnie J. Smith University of Florida Libraries Maria A. Jankowska UCLA Research Library.
Data Management Plans Bill Michener University Libraries and Biology Dept. University of New Mexico.
A Proposal for a Distributed Earth Observation Data Network Matthew B Jones UC Santa Barbara National Center for Ecological Analysis and Synthesis (NCEAS)
1 Common Challenges Across Scientific Disciplines Laurence Field CERN 18 th November 2013.
Research Data Management Services Katherine McNeill Social Sciences Librarians Boot Camp June 1, 2012.
U.S. Department of the Interior U.S. Geological Survey CDI Webinar Sept. 5, 2012 Kevin T. Gallagher and Linda C. Gundersen September 5, 2012 CDI Science.
Cyberinfrastructure Overview Core Cyberinfrastructure Team Matthew B. Jones National Center for Ecological Analysis and Synthesis (NCEAS) University of.
Planning for Arctic GIS and Geographic Information Infrastructure Sponsored by the Arctic Research Support and Logistics Program 30 October 2003 Seattle,
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
U.S. Department of the Interior U.S. Geological Survey A vision for a global community Linda Gundersen Director Science Quality and Integrity US Geological.
Science Environment for Ecological Knowledge: EcoGrid Matthew B. Jones National Center for.
IPlant cyberifrastructure to support ecological modeling Presented at the Species Distribution Modeling Group at the American Museum of Natural History.
Data Citation and Data Attribution A View from the Data Center Perspective Bruce E. Wilson Group Lead, Client & Collaboration Technologies Oak Ridge National.
SAN DIEGO SUPERCOMPUTER CENTER This is a title AN NSF SPONSORED WORKSHOP HOSTED BY THE PARTNERSHIP FOR BIODIVERSITY INFORMATICS NATIONAL CENTER FOR ECOLOGICAL.
CUAHSI: A University Consortium for Hydrologic Science Richard P. Hooper, Executive Director Consortium of Universities for the Advancement of Hydrologic.
Jake F. Weltzin United States Geological Survey Taking the Pulse of our Planet The USA National Phenology Network.
Institute for Sustainable Earth and Environmental Software ISEES Matthew B. Jones National Center for Ecological Analysis and Synthesis (NCEAS) University.
Geosciences - Observations (Bob Wilhelmson) The geosciences in NSF’s world consists of atmospheric science, ocean science, and earth science Many of the.
National Center for Supercomputing Applications Barbara S. Minsker, Ph.D. Associate Professor National Center for Supercomputing Applications and Department.
DataONE: Preserving Data and Enabling Data-Intensive Biological and Environmental Research Bob Cook Environmental Sciences Division Oak Ridge National.
LTER Data Management Margaret O’Brien Santa Barbara Coastal Long Term Ecological Research (LTER) Project Santa Barbara Channel Biodiversity Observation.
TechCon Food systems history… Agriculture has a 10,000 year history Farmers are estimated to be 38 to 45% of the global work force In the developing.
Block 7: Reports Back to Plenary Group on CE and CI Working Group Activities Tasks and Activities -- October 22 DataONE Kick-off Meeting October 20-22,
“A Library outranks any other one thing a community can do to benefit its people.” --Andrew Carnegie.
Breakout # 1 – Data Collecting and Making It Available Data definition “ Any information that [environmental] researchers need to accomplish their tasks”
We take the argument of emergence very seriously: the elements which we have defined here are analytic resources rather than causal factors. They have.
Cyberinfrastructure to promote Model - Data Integration Robert Cook, Yaxing Wei, and Suresh S. Vannan Oak Ridge National Laboratory Presented at the Model-Data.
Geosciences Directorate Overview May 23, Directorate for Geosciences Mission Support research in atmospheric, earth and ocean sciences Address nation’s.
ORNL DAAC: Introduction Bob Cook ORNL DAAC Environmental Sciences Division Oak Ridge National Laboratory.
USGS Bioinformatics Activities Ecoinformatics January 2010 Gladys Cotter Mike Frame Ecoinformatics January 2010 Gladys Cotter Mike Frame.
ARL Workshop on New Collaborative Relationships: The Role of Academic Libraries in the Digital Data Universe September 26-27, 2006 ARL Prue.
LTER Science 2050: Challenges, Constraints and Opportunities Bill Michener Professor and DataONE Project Director University of New Mexico 12 September.
April 14, 2005MIT Libraries Visiting Committee Libraries Strategic Plan Theme III Work to shape the future MacKenzie Smith Associate Director for Technology.
Data Conservancy and the US NSF DataNet Initiative Fourth Workshop on Data Preservation and Long-Term Analysis in HEP Sayeed Choudhury Johns Hopkins University.
Preliminary Findings Baseline Assessment of Scientists’ Data Sharing Practices Carol Tenopir, University of Tennessee
Infrastructure Breakout What capacities should we build now to manage data and migrate it over the future generations of technologies, standards, formats,
High Risk 1. Ensure productive use of GRID computing through participation of biologists to shape the development of the GRID. 2. Develop user-friendly.
A Network for Environmental Data Exchange: Highlights and Plans Gladys Cotter BIO, USGS 27 January 2004 Washington, DC National Biological Information.
NASA Earth Exchange (NEX) A collaborative supercomputing environment for global change science Earth Science Division/NASA Advanced Supercomputing (NAS)
Award No: SES/SBE Project Title: Interoperability Strategies for Scientific Cyberinfrastructure: A Comparative Study Investigators: Geoffrey C.
The National Digital Stewardship Alliance: Stewardship, Collaboration, Inclusiveness, Exchange.
A Shared Commitment to Digital Preservation and Access.
Fedora Commons Overview and Background Sandy Payette, Executive Director UK Fedora Training London January 22-23, 2009.
Human Social Dynamics: Interoperability Strategies for Scientific Cyberinfrastructure: The Comparative Interoperability Project ( ) initiates a.
Joslynn Lee – Data Science Educator
DataNet Collaboration
Problem: Ecological data needed to address critical questions are dispersed, heterogeneous, and complex Solution: An internet-based mechanism to discover,
Bird of Feather Session
Wrap-Up – NSF Site Visit 8 February 2010
Presentation transcript:

Envisioning a New Distributed Organization and Cyberinfrastructure to Enable Science Presenter Name Patricia Cruse, Director, Digital Preservation Program California Digital Library Coalition for Networked Information Spring 2009 Task Force Meeting

…new methods, management structures and technologies to manage the diversity, size, and complexity of… data sets and data streams (NSF 2007) Responding to the NSF solicitation Mound(s) built by termites

Scientific challenges and data needs  Global change is a complex scientific and societal challenge  Community needs good data  Good data…  build good science  make possible wise management  enable sound decisions  Good data need…  solid technical infrastructure  sound organization  community engagement

Outline to today’s talk  Complexities of global change  Challenges for cyberinfrastructure and data intensive research  A solution: DataONE

The challenges of global change Smith, Knapp, Collins. In press.

Critical areas in the Earth’s system

Human impacts on land-based ecosystems Ecosystems and Human Well-Being

Human impacts on the world’s oceans Halpern et al Science 319.

The move to cross-disciplinary forms of collaboration Unidisciplinary - researchers from a single discipline work together to address a common problem Multidisciplinary - researchers from different disciplines work independently or sequentially, each from his or her own disciplinary-specific perspective, to address a common problem Interdisciplinary - researchers from different disciplines work jointly to address a common problem and although some integration of their diverse perspectives occurs, participants remain anchored in their own fields Transdisciplinary - researchers from different disciplines work jointly to create a shared conceptual framework that integrates and moves beyond discipline-specific theories, concepts, and approaches, to address a common problem (Rosenfield, 1992)

Increasing Spatial Coverage Increasing Process Knowledge Adapted from CENR-OSTP Remote sensing Intensive science sites and experiments Extensive science sites Volunteer & education networks Building the knowledge pyramid

Outline to today’s talk  Complexities of global change  Challenges for cyberinfrastructure and data intensive research  DataONE: A solution

Data challenge 1: dispersed sources (“finding the needle in the haystack”)  Data are massively dispersed  Ecological field stations and research centers (100’s)  Natural history museums and biocollection facilities (100’s)  Agency data collections (100’s to 1000’s)  Individual scientists (1000’s to 10,000s to 100,000s)

Data challenge 2: diversity “the flood of increasingly heterogeneous data”  Data are heterogeneous  Syntax  (format)  Schema  (model)  Semantics  (meaning) Jones et al. 2007

Data challenge 3: poor practice “data entropy” Information Content Time Time of publication Specific details General details Accident Retirement or career change Death (Michener et al. 1997)

Data challenge 4: loss  Natural disaster  Facilities infrastructure failure  Storage failure  Server hardware/software failure  Application software failure  External dependencies (e.g. PKI failure)  Format obsolescence  Legal encumbrance  Human error  Malicious attack by human or automated agents  Loss of staffing competencies  Loss of institutional commitment  Loss of financial stability  Changes in user expectations and requirements Source: S. Abrams, CDL

Source: John Gantz, IDC Corporation: The Expanding Digital Universe Data challenge 4: more loss Transient information or unfilled demand for storage Information Available Storage Petabytes Worldwide

Cumulative impact: data longevity StudyResource Type Resource Half-life Rumsey (2002)Legal Citations1.4 years Harter and Kim (1996)Scholarly Article Citations 1.5 years Koehler (1999 and 2002)Random Web Pages2.0 years Spinellis (2003)Computer Science Citations 4.0 years Markwell and Brooks (2002) Biological Science Education Resources 4.6 years Nelson and Allen (2002)Digital Library Objects24.5 years Koehler, W. (2004) Information Research 9(2): 174.

Outline to today’s talk  Complexities of global change  Challenges for cyberinfrastructure and data intensive research  DataONE: a solution  Building on existing cyberinfrastructure  Creating new cyberinfrastructure  Developing new communities of practice

Key components Vision: enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it by:  engaging the scientist in the data curation process  supporting the full data life cycle,  encouraging data stewardship and sharing  promoting best practices  engaging citizens  developing domain agnostic solutions

 Biological (genes to biomes)  Environmental  Atmospheric  Ecological  Hydrological  Oceanographic Data types

22 Existing biological data archives ESA’s Ecological Archive Long Term Ecological Research Network Fire Research & Management Exchange System National Biological Information Infrastructure Distributed Active Archive Center Knowledge Network for Biocomplexity

23 Examples of data holdings Data ArchiveTypes of Data Managed Metadata Standard(s) Biodiversity, taxonomic, ecologicalBDP, DwC, DC, OGIS Biogeochemical dynamics, terrestrial ecological Earth observation imagery DIF, BDP, ECHO Ecological, biodiversity, biophysical, social, genomics, and taxonomic EML Avian populations and molecular biologyDC Biological and taxonomicDC subset Biophysical, biodiversity, disturbance, and Earth observation imagery EML Biodiversity, biotic structure, function/process, biogeochemical, climate, and hydrologic EML Metadata Interoperability Across Data Holdings EML=Ecological Metadata Language BDP=Biological Data Profile DwC=Darwin Core DC=Dublin Core ECHO=EOS ClearingHOuse OGIS=OpenGIS DC subset=Dublin Core subset DIF=Directory Interchange Format

Providing one-stop shopping for data NBII Metadata Clearinghouse (31,864) Long Term Ecological Research (LTER) Network (6,897) ORNL Distributed Active Archive Center for Biogeochemical Data (810) Large Scale Biosphere-Atmosphere Experiment in Amazonia (LBA) (783) Organization of Biological Field Stations (124) Inter-American Institute for Global Change Research (IAI) (79) MODIS and ASTER Products (LPDAAC) (38) National Phenology Network (USANPN) (29) 40,000 Records Simple Pilot Catalog Interface (searches entire metadata record)

25 Existing cyberinfrastructure: enabling tools

Building new global cyberinfrastructure

Member Nodes diverse institutions serve local community provide resources for managing their data New distributed framework Coordinating Nodes retain complete metadata catalog subset of all data perform basic indexing provide network-wide services ensure data availability (preservation) provide replication services Flexible, scalable, sustainable network

Building new global cyberinfrastructure

29 New partnerships  libraries & digital libraries  academic institutions  research networks  NSF- and government-funded synthesis & supercomputer centers/networks  government agencies  international organizations  data and metadata archives  professional societies  NGOs  commercial sector

Building global communities of practice… creating long-lived cyberinfrastructure enterprises  Community engagement  Involve cultural memory organizations  Include science educators  Engage citizens and new generations of students in best practices  Build on existing programs  Transparent, participatory governance  Adoption/creation of innovative business

Career Long Learning: best practice guides exemplary data management plans podcasts, web-casts workshops and seminars downloadable curricula Education and training Best Practice Guide How to Cite Your Data 6 in a series Best Practice Guide Using Metadata for e-research 5 in a series Gold Star Data Management Plan Here’s How Best Practice Guide How to Cite Your Data 6 in a series

Engaging citizens in science

Individual participation Individuals can:  contribute data and metadata  participate in Working Groups  join the software development community  partake of on-line learning opportunities  develop curricula and training materials  become a DIUG member (DataONE International Users Group) Byron Kim, Berkeley, 2003

Organizational participation  become a DataONE Member Node  receive data-life-cycle software  access to training materials  join in establishing standards  join the DataONE International Users Group  set future directions  join the software development community  contribute curricula and training materials Libraries, research networks, agencies can:

Summing up…  Complexities of global change and challenging data environment  DataONE will:  Create technical infrastructure  Develop organizational infrastructure  Engage the community  Meet the data challenges  Dispersed data: facilitating discovery  Data diversity: providing tools to manage data  Data practice: developing an informatics literate community  Data loss: developing infrastructure

We welcome your involvement! DataONE Management Team: William Michener, PI, University of New Mexico Suzie Allard – University of Tennessee Bob Cook – Oak Ridge National Laboratory DAAC Patricia Cruse, California Digital Library Mike Frame – USGS, National Biological Information Infrastructure Viv Hutchinson, USGS, Matt Jones – University of California Santa Barbara Steve Kelling – Cornell Lab of Ornithology Kathleen Smith, UNC DataONE Partners Kepler-CORE SEEK/KNB Teams