Envisioning a New Distributed Organization and Cyberinfrastructure to Enable Science Presenter Name Patricia Cruse, Director, Digital Preservation Program.


1 Envisioning a New Distributed Organization and Cyberinfrastructure to Enable Science
Presenter: Patricia Cruse, Director, Digital Preservation Program, California Digital Library
Coalition for Networked Information, Spring 2009 Task Force Meeting

2 Responding to the NSF solicitation
"…new methods, management structures and technologies to manage the diversity, size, and complexity of… data sets and data streams" (NSF 2007)
[Image: Mound(s) built by termites]

3 Scientific challenges and data needs
• Global change is a complex scientific and societal challenge
• Community needs good data
• Good data…
  • build good science
  • make possible wise management
  • enable sound decisions
• Good data need…
  • solid technical infrastructure
  • sound organization
  • community engagement

4 Outline of today’s talk
• Complexities of global change
• Challenges for cyberinfrastructure and data-intensive research
• A solution: DataONE

5

6 The challenges of global change Smith, Knapp, Collins. In press.

7 Critical areas in the Earth’s system

8 Human impacts on land-based ecosystems Ecosystems and Human Well-Being

9 Human impacts on the world’s oceans Halpern et al. 2008 Science 319.

10 The move to cross-disciplinary forms of collaboration
• Unidisciplinary - researchers from a single discipline work together to address a common problem
• Multidisciplinary - researchers from different disciplines work independently or sequentially, each from his or her own disciplinary-specific perspective, to address a common problem
• Interdisciplinary - researchers from different disciplines work jointly to address a common problem and although some integration of their diverse perspectives occurs, participants remain anchored in their own fields
• Transdisciplinary - researchers from different disciplines work jointly to create a shared conceptual framework that integrates and moves beyond discipline-specific theories, concepts, and approaches, to address a common problem
(Rosenfield, 1992)

11 Building the knowledge pyramid
[Figure: pyramid with axes "Increasing Spatial Coverage" and "Increasing Process Knowledge". Tiers: Remote sensing; Intensive science sites and experiments; Extensive science sites; Volunteer & education networks. Adapted from CENR-OSTP]

12 Outline of today’s talk
• Complexities of global change
• Challenges for cyberinfrastructure and data-intensive research
• DataONE: A solution

13 Data challenge 1: dispersed sources (“finding the needle in the haystack”)
• Data are massively dispersed
  • Ecological field stations and research centers (100s)
  • Natural history museums and biocollection facilities (100s)
  • Agency data collections (100s to 1,000s)
  • Individual scientists (1,000s to 10,000s to 100,000s)

14 Data challenge 2: diversity (“the flood of increasingly heterogeneous data”)
• Data are heterogeneous
  • Syntax (format)
  • Schema (model)
  • Semantics (meaning)
Jones et al. 2007
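The three kinds of heterogeneity can be illustrated with a small sketch. It assumes two invented sources that report the same kind of measurement: one is CSV with a Fahrenheit column, the other is JSON with a Celsius field. All field names, values, and function names are hypothetical and are not taken from the talk or from Jones et al. 2007.

```python
import csv
import io
import json

# Two hypothetical sources describing the same kind of observation.
CSV_SOURCE = "site,temp_f\nA,68.0\n"                        # syntax: CSV
JSON_SOURCE = '[{"station": "B", "temperature_c": 21.5}]'   # syntax: JSON


def from_csv(text):
    """Read the CSV source and map it onto a shared record layout."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {
            "site": row["site"],  # schema: column already named "site"
            # semantics: convert Fahrenheit to Celsius
            "temperature_c": round((float(row["temp_f"]) - 32) * 5 / 9, 2),
        }
        for row in rows
    ]


def from_json(text):
    """Read the JSON source and map it onto the same record layout."""
    return [
        {
            "site": row["station"],  # schema: "station" renamed to "site"
            "temperature_c": float(row["temperature_c"]),
        }
        for row in json.loads(text)
    ]


def harmonize():
    """Combine both sources into one list with a common format, model, and unit."""
    return from_csv(CSV_SOURCE) + from_json(JSON_SOURCE)


if __name__ == "__main__":
    for record in harmonize():
        print(record)
```

Each reader handles one difference explicitly: the parser covers syntax, the field mapping covers schema, and the unit conversion covers semantics.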

15 Data challenge 3: poor practice (“data entropy”)
[Figure: Information Content plotted against Time, declining after the time of publication. Labeled losses: Specific details; General details; Accident; Retirement or career change; Death. (Michener et al. 1997)]

16 Data challenge 4: loss
• Natural disaster
• Facilities infrastructure failure
• Storage failure
• Server hardware/software failure
• Application software failure
• External dependencies (e.g. PKI failure)
• Format obsolescence
• Legal encumbrance
• Human error
• Malicious attack by human or automated agents
• Loss of staffing competencies
• Loss of institutional commitment
• Loss of financial stability
• Changes in user expectations and requirements
Source: S. Abrams, CDL

17 Data challenge 4: more loss
[Figure: Petabytes Worldwide. Series: Information; Available Storage; Transient information or unfilled demand for storage]
Source: John Gantz, IDC Corporation: The Expanding Digital Universe

18 Cumulative impact: data longevity
Study | Resource Type | Resource Half-life
Rumsey (2002) | Legal Citations | 1.4 years
Harter and Kim (1996) | Scholarly Article Citations | 1.5 years
Koehler (1999 and 2002) | Random Web Pages | 2.0 years
Spinellis (2003) | Computer Science Citations | 4.0 years
Markwell and Brooks (2002) | Biological Science Education Resources | 4.6 years
Nelson and Allen (2002) | Digital Library Objects | 24.5 years
Koehler, W. (2004) Information Research 9(2): 174.
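To make the half-life figures concrete, the sketch below estimates how much of each resource type would still be reachable after a given number of years. It assumes simple exponential decay, which is an assumption of this example and not a claim made on the slide. The function name and the five-year horizon are illustrative.

```python
def fraction_remaining(years: float, half_life_years: float) -> float:
    """Fraction of resources still available after `years`,
    assuming simple exponential decay (an illustrative assumption)."""
    if half_life_years <= 0:
        raise ValueError("half_life_years must be positive")
    return 0.5 ** (years / half_life_years)


# Half-lives in years, as listed on the slide above (Koehler 2004).
HALF_LIVES = {
    "Legal citations": 1.4,
    "Scholarly article citations": 1.5,
    "Random web pages": 2.0,
    "Computer science citations": 4.0,
    "Biological science education resources": 4.6,
    "Digital library objects": 24.5,
}

if __name__ == "__main__":
    horizon = 5.0  # years; chosen only for illustration
    for resource, half_life in HALF_LIVES.items():
        share = fraction_remaining(horizon, half_life)
        print(f"{resource}: {share:.1%} remaining after {horizon:g} years")
```

Under this assumption a resource with a 2.0-year half-life is down to a quarter of its original availability after four years, while managed digital library objects lose comparatively little over the same period.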

19 Outline of today’s talk
• Complexities of global change
• Challenges for cyberinfrastructure and data-intensive research
• DataONE: a solution
  • Building on existing cyberinfrastructure
  • Creating new cyberinfrastructure
  • Developing new communities of practice

20 Key components
Vision: enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it by:
• engaging the scientist in the data curation process
• supporting the full data life cycle
• encouraging data stewardship and sharing
• promoting best practices
• engaging citizens
• developing domain-agnostic solutions

21 Data types
• Biological (genes to biomes)
• Environmental
• Atmospheric
• Ecological
• Hydrological
• Oceanographic

22 Existing biological data archives
• ESA’s Ecological Archive
• Long Term Ecological Research Network
• Fire Research & Management Exchange System
• National Biological Information Infrastructure
• Distributed Active Archive Center
• Knowledge Network for Biocomplexity

23 Examples of data holdings
Metadata Interoperability Across Data Holdings
[Data Archive column: entries not preserved in the transcript]
Types of Data Managed | Metadata Standard(s)
Biodiversity, taxonomic, ecological | BDP, DwC, DC, OGIS
Biogeochemical dynamics, terrestrial ecological Earth observation imagery | DIF, BDP, ECHO
Ecological, biodiversity, biophysical, social, genomics, and taxonomic | EML
Avian populations and molecular biology | DC
Biological and taxonomic | DC subset
Biophysical, biodiversity, disturbance, and Earth observation imagery | EML
Biodiversity, biotic structure, function/process, biogeochemical, climate, and hydrologic | EML
Abbreviations:
EML = Ecological Metadata Language
BDP = Biological Data Profile
DwC = Darwin Core
DC = Dublin Core
DC subset = Dublin Core subset
ECHO = EOS ClearingHOuse
OGIS = OpenGIS
DIF = Directory Interchange Format

24 Providing one-stop shopping for data
• NBII Metadata Clearinghouse (31,864)
• Long Term Ecological Research (LTER) Network (6,897)
• ORNL Distributed Active Archive Center for Biogeochemical Data (810)
• Large Scale Biosphere-Atmosphere Experiment in Amazonia (LBA) (783)
• Organization of Biological Field Stations (124)
• Inter-American Institute for Global Change Research (IAI) (79)
• MODIS and ASTER Products (LPDAAC) (38)
• National Phenology Network (USANPN) (29)
40,000 Records
Simple Pilot Catalog Interface (searches entire metadata record)
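A minimal sketch of the two ideas on this slide: the per-collection counts add up to roughly the 40,000 records quoted, and a catalog that "searches entire metadata record" can be approximated by matching a term against every field. The sample records and function names are invented for illustration and are not the pilot catalog's actual data or interface.

```python
# Record counts per contributing collection, as listed on the slide.
COLLECTION_COUNTS = {
    "NBII Metadata Clearinghouse": 31864,
    "LTER Network": 6897,
    "ORNL DAAC for Biogeochemical Data": 810,
    "LBA": 783,
    "Organization of Biological Field Stations": 124,
    "IAI": 79,
    "LPDAAC (MODIS and ASTER Products)": 38,
    "USANPN": 29,
}


def total_records(counts):
    """Sum the record counts across all collections."""
    return sum(counts.values())


def search(records, term):
    """Return records where `term` appears in any field (case-insensitive)."""
    needle = term.lower()
    return [
        record
        for record in records
        if any(needle in str(value).lower() for value in record.values())
    ]


# Hypothetical metadata records, invented for this example only.
SAMPLE_RECORDS = [
    {"title": "Stream chemistry 1990-2005", "source": "LTER",
     "keywords": "nitrate, hydrology"},
    {"title": "Leaf phenology observations", "source": "USANPN",
     "keywords": "budburst, citizen science"},
]

if __name__ == "__main__":
    print(total_records(COLLECTION_COUNTS))      # 40624
    print(search(SAMPLE_RECORDS, "hydrology"))   # matches on the keywords field
```

Searching every field trades precision for recall: a user does not need to know which metadata standard a record uses or which field holds the term.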

25 Existing cyberinfrastructure: enabling tools

26 Building new global cyberinfrastructure

27 New distributed framework
Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
Coordinating Nodes
• retain complete metadata catalog
• subset of all data
• perform basic indexing
• provide network-wide services
• ensure data availability (preservation)
• provide replication services
Flexible, scalable, sustainable network
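The division of labor between the two node types can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the class names, methods, and identifiers are hypothetical and do not represent the DataONE software or its API. Member Nodes hold data, while the Coordinating Node keeps the metadata catalog, answers searches, and tracks replicas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MemberNode:
    """Holds data objects for its local community (hypothetical model)."""
    name: str
    objects: Dict[str, bytes] = field(default_factory=dict)

    def store(self, identifier: str, data: bytes) -> None:
        self.objects[identifier] = data


@dataclass
class CoordinatingNode:
    """Keeps the full metadata catalog and tracks where copies live."""
    catalog: Dict[str, Dict[str, str]] = field(default_factory=dict)
    locations: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, node: MemberNode, identifier: str,
                 data: bytes, metadata: Dict[str, str]) -> None:
        """Store data on a member node and record its metadata centrally."""
        node.store(identifier, data)
        self.catalog[identifier] = metadata
        self.locations.setdefault(identifier, set()).add(node.name)

    def replicate(self, identifier: str,
                  source: MemberNode, target: MemberNode) -> None:
        """Copy an object to a second member node for preservation."""
        target.store(identifier, source.objects[identifier])
        self.locations[identifier].add(target.name)

    def find(self, term: str) -> List[str]:
        """Basic indexing: match a term against any metadata field."""
        needle = term.lower()
        return [
            identifier
            for identifier, metadata in self.catalog.items()
            if any(needle in value.lower() for value in metadata.values())
        ]


if __name__ == "__main__":
    coordinator = CoordinatingNode()
    station = MemberNode("field-station")   # hypothetical node names
    library = MemberNode("library")
    coordinator.register(station, "example:1", b"depth_cm,temp_c\n10,12.5\n",
                         {"title": "Soil temperature", "standard": "EML"})
    coordinator.replicate("example:1", station, library)
    print(coordinator.find("soil"), coordinator.locations["example:1"])
```

Keeping the catalog separate from the data means discovery still works when a single Member Node is offline, and the replica map records which other nodes hold a copy.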

28 Building new global cyberinfrastructure

29 New partnerships
• libraries & digital libraries
• academic institutions
• research networks
• NSF- and government-funded synthesis & supercomputer centers/networks
• government agencies
• international organizations
• data and metadata archives
• professional societies
• NGOs
• commercial sector

30 Building global communities of practice… creating long-lived cyberinfrastructure enterprises
• Community engagement
• Involve cultural memory organizations
• Include science educators
• Engage citizens and new generations of students in best practices
• Build on existing programs
• Transparent, participatory governance
• Adoption/creation of innovative business

31 Education and training
Career Long Learning:
• best practice guides
• exemplary data management plans
• podcasts, web-casts
• workshops and seminars
• downloadable curricula
[Images: "Best Practice Guide: How to Cite Your Data" (6 in a series); "Best Practice Guide: Using Metadata for e-research" (5 in a series); "Gold Star Data Management Plan: Here’s How"]

32 Engaging citizens in science
www.CitizenScience.org

33 Individual participation
Individuals can:
• contribute data and metadata
• participate in Working Groups
• join the software development community
• partake of on-line learning opportunities
• develop curricula and training materials
• become a DIUG member (DataONE International Users Group)
[Image: Byron Kim, Berkeley, 2003]

34 Organizational participation
Libraries, research networks, agencies can:
• become a DataONE Member Node
• receive data-life-cycle software
• access to training materials
• join in establishing standards
• join the DataONE International Users Group
• set future directions
• join the software development community
• contribute curricula and training materials

35 Summing up…
• Complexities of global change and challenging data environment
• DataONE will:
  • Create technical infrastructure
  • Develop organizational infrastructure
  • Engage the community
• Meet the data challenges
  • Dispersed data: facilitating discovery
  • Data diversity: providing tools to manage data
  • Data practice: developing an informatics-literate community
  • Data loss: developing infrastructure

36 We welcome your involvement!
DataONE Management Team:
• William Michener, PI, University of New Mexico
• Suzie Allard, University of Tennessee
• Bob Cook, Oak Ridge National Laboratory DAAC
• Patricia Cruse, California Digital Library
• Mike Frame, USGS, National Biological Information Infrastructure
• Viv Hutchinson, USGS
• Matt Jones, University of California Santa Barbara
• Steve Kelling, Cornell Lab of Ornithology
• Kathleen Smith, UNC
DataONE Partners
Kepler-CORE
SEEK/KNB Teams

