Data Conservancy: A Life Sciences Perspective Sayeed Choudhury Johns Hopkins University

Presentation transcript:

Data Conservancy: A Life Sciences Perspective Sayeed Choudhury Johns Hopkins University

Data Conservancy
One of two current awards through the National Science Foundation DataNet program
The other award is DataONE, led by William Michener at the University of New Mexico
Each is a $20 million, five-year award with multiple partners

Data Curation The Data Conservancy embraces a shared vision: data curation is a means to collect, organize, validate and preserve data so that scientists can find new ways to address the grand research challenges that face society.

…not a rigid road map but principles of navigation. There is no one way to design cyberinfrastructure, but there are tools we can teach the designers to help them appreciate the true size of the solution space – which is often much larger than they may think, if they are tied into technical fixes for all problems.

Objectives
Infrastructure research and development – Technical requirements
Information science and computer science research – Scientific or user requirements
Broader impacts – Educational requirements
Sustainability – Business requirements

What are Life Sciences?

Long Tail of Biology
Small number of providers with lots of data: high-throughput biology, monitoring, simulation
Large number of providers with small amounts of data: observational, experimental

21st Century Biology
Molecular biology drivers and promise of informatics
Fundamental unity of biology
Data generated within one domain can also serve another
System that captures most possible value
Add higher level thinking within the discipline

Data Driven Discovery
Discoveries made from aggregating data and querying in new ways
Need data management tools
Need data analysis tools
Need data visualization tools

The Problem
How do we make data sharing part of the normal work flow across the Life Sciences?
– Social
– Technical
Address barriers
Accommodate needs
Do this for all Life Science sub-disciplines!

For each Life Science Sub-discipline
Data culture
Data policy
Metadata standards
Ontologies
How to address each sub-discipline in four years?

Data Flow (Levels of Data)
Pixel data collected by telescope
Sent to Fermilab for processing
Beowulf cluster produces catalog
Loaded in a SQL database

Domain coverage/methods
Multi-site user research methods are a blend of:
– Case study & domain comparisons
– Depth & breadth
– Local & global
Domains: Astronomy, Earth Sciences, Life Sciences, Social Sciences
UCAR: Task-based design and usability testing → Use cases, data requirements, system recommendations
UCLA: Ethnography, virtual ethnography, oral histories → Use cases, data requirements
UIUC: Interviews, surveys, worksheets, content analysis → Curation requirements, taxonomy, metadata/provenance framework

Data Framework
Start with a common conceptualization that applies across scientific domains
Exploit semantic technologies
Leverage existing work
Prototype the framework in target communities
– Iteratively refine, learn from experience
– Demonstrate success, measured in terms of new science

Common Conceptualization
Observations are the foundation of all scientific studies, and are the closest approximation to facts.
Wiens, J. A. (1992). The Ecology of Bird Communities. Vol. 1: Foundations and Patterns; Vol. 2: Processes and Variations. Cambridge Studies in Ecology.

Emergence
Emergence: The Connected Lives of Ants, Brains, Cities, and Software, by Steven Johnson
The movement from low-level rules to higher-level sophistication is what we call emergence.

Data Model using OAI-ORE
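
The slide names OAI-ORE but the transcript does not show the model itself. As a rough orientation only, the sketch below illustrates the two core OAI-ORE relationships: a resource map describes an aggregation, and the aggregation aggregates individual resources. It is a generic illustration of the ORE vocabulary, not the Data Conservancy data model; the function name and all example.org URIs are hypothetical.

```python
# Minimal sketch of OAI-ORE's core terms, written as plain triples.
# Assumption: the example.org URIs and build_resource_map() are made up for
# illustration; only the ORE namespace and the two predicates come from the
# OAI-ORE vocabulary.
ORE = "http://www.openarchives.org/ore/terms/"


def build_resource_map(rem_uri, aggregation_uri, resource_uris):
    """Return (subject, predicate, object) triples for one resource map."""
    # The resource map describes exactly one aggregation.
    triples = [(rem_uri, ORE + "describes", aggregation_uri)]
    # The aggregation groups the individual resources (files, datasets, ...).
    for uri in resource_uris:
        triples.append((aggregation_uri, ORE + "aggregates", uri))
    return triples


triples = build_resource_map(
    "http://example.org/rem/1",
    "http://example.org/aggregation/1",
    [
        "http://example.org/data/observations.csv",
        "http://example.org/data/readme.txt",
    ],
)

# Print the triples in an N-Triples-like form.
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

Keeping the aggregation separate from the resource map that describes it is what lets the same set of data files be described by more than one serialization.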

Acknowledgements
Anne Thessen and David Patterson (Life sciences slides)
Alex Szalay (Data Flow)
Carole Palmer (Domain coverage/methods slides)
Carl Lagoze (Data Framework slides)
Tim DiLauro (OAI-ORE)
NLG grant award LG
Office of Cyberinfrastructure DataNet Award #