NationalDataService.org National Data Service Consortium Second Meeting Ed Seidel University of Illinois Urbana-Champaign.


Data-enabled Transformation of Science
Astronomy:
- Single scientist looks through telescope
  – Record KB of data in notebook
  – Require reproducibility
- Sloan Digital Sky Survey
  – Record data for decade (40 TB)
  – Serve to entire world
  – Thousands of scientists work “together”
- DES (now)
  – 200 GB/night
  – PB in decade
- LSST (6 years)
  – Record data for decade
  – SDSS/night!
  – 200 PB/decade
How can I publish, discover, verify data in this new world?
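The data volumes quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (1 TB = 10^12 bytes) and one observing night per calendar day, neither of which is stated on the slide, and the function names are made up for this example.

```python
# Rough consistency check of the survey data volumes quoted on the slide.
# Assumptions (not from the slide): decimal units, 365 observing nights per year.
GB = 10**9
TB = 10**12
PB = 10**15
NIGHTS_PER_DECADE = 365 * 10


def decade_total(bytes_per_night: int) -> int:
    """Total bytes accumulated over a decade at a constant nightly rate."""
    return bytes_per_night * NIGHTS_PER_DECADE


def nightly_rate(bytes_per_decade: int) -> float:
    """Average bytes per night implied by a ten-year total."""
    return bytes_per_decade / NIGHTS_PER_DECADE


# DES: 200 GB/night gives roughly 0.73 PB over a decade ("PB in decade").
des_decade_bytes = decade_total(200 * GB)

# LSST: 200 PB/decade implies roughly 55 TB/night, comparable to the
# entire 40 TB Sloan Digital Sky Survey archive ("SDSS/night!").
lsst_nightly_bytes = nightly_rate(200 * PB)
sdss_archives_per_night = lsst_nightly_bytes / (40 * TB)

print(f"DES per decade: {des_decade_bytes / PB:.2f} PB")
print(f"LSST per night: {lsst_nightly_bytes / TB:.1f} TB")
print(f"LSST nightly volume in SDSS archives: {sdss_archives_per_night:.2f}")
```

Under these assumptions the slide's figures hang together: DES lands just under a petabyte per decade, and LSST produces somewhat more than one full SDSS archive every night.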

Big Data vs The Long Tail of Science
- Many “Big Data” projects are “special”
  – Highly organized, singular sources of data, professionally curated, a lot of attention paid
- What about the “Long Tail” (the other 99%)?
  – 1000s of biologists sequencing communities of organisms
  – Thousands of chemists and materials scientists developing a “materials genome”
  – Characteristics:
    - Heterogeneous, perhaps hand generated
    - Not curated, reused, served, etc…
- Fundamental Observation: Scientists communicate by sharing data…

Basic Vision for Open Data and Publication Services
- Make it possible (easy) for anyone to:
  – Create a data collection and get an “identifier”…
  – Deposit it somewhere where it can be kept safe…
  – Provide services so others can find it, analyze it, repurpose it…
  – Link it to traditional (open, please!) publications…
    - OA aspects very important to this
- With these capabilities in place
  – Many important things will happen…

“We need to take steps to make scientific research data more liquid. The more we move towards open as the default for scientific research data, the more we will get out of the research enterprise. It is time to take deliberate steps to make that a reality.”
Mike Stebbins, White House OSTP

Why is this so critical for the future?
- Reproducibility of a scientific result: heart of science
  – Needed: access to complete state of a result…
- Accelerating discovery: faster, deeper dissemination of results to other researchers; repurposing data
  – Needed: services to find, retrieve, analyze, describe…
- Interdisciplinarity and complex problem solving
  – Needed: ability to find, integrate results across communities
- Public dissemination of publicly funded research results
  – Needed: open, accessible results, searchable by public
- Economic development
  – Needed: availability of all the above to companies (MGI!)

NATIONAL DATA SERVICE CONSORTIUM
A Builder’s Consortium!

What should NDS do for researchers?
- Help researchers find data
  – Cross-disciplinary searching: across federations, projects, archives, and other repositories
  – Find data related to a publication
  – Allow drill down to leverage specialized community-specific discovery
- Help researchers use data
  – Download data, browse metadata, track provenance
  – Move data to processing platforms for specialized (re-)processing and analysis
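One way to picture cross-disciplinary searching across federations and repositories is a fan-out query that merges results and removes duplicates by identifier. The sketch below is purely illustrative and is not an NDS API: `Record`, `Repository`, `InMemoryRepository`, and `federated_search` are hypothetical names, and the in-memory repositories stand in for real archives.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Record:
    """A search hit; the fields are hypothetical, chosen for illustration."""
    identifier: str  # e.g. a DOI or other persistent identifier
    title: str
    source: str      # name of the repository that returned the hit


class Repository(Protocol):
    """Anything that can answer a keyword query (assumed interface)."""
    name: str

    def search(self, query: str) -> list[Record]: ...


class InMemoryRepository:
    """Toy stand-in for a community or domain repository."""

    def __init__(self, name: str, titles_by_id: dict[str, str]) -> None:
        self.name = name
        self._titles_by_id = titles_by_id

    def search(self, query: str) -> list[Record]:
        needle = query.lower()
        return [
            Record(identifier, title, self.name)
            for identifier, title in self._titles_by_id.items()
            if needle in title.lower()
        ]


def federated_search(query: str, repositories: list[Repository]) -> list[Record]:
    """Query every repository and keep the first hit seen per identifier."""
    seen: set[str] = set()
    merged: list[Record] = []
    for repository in repositories:
        for record in repository.search(query):
            if record.identifier not in seen:
                seen.add(record.identifier)
                merged.append(record)
    return merged


astro = InMemoryRepository("astro", {"id-1": "Galaxy survey catalog"})
materials = InMemoryRepository(
    "materials",
    {"id-1": "Galaxy survey catalog", "id-2": "Alloy phase survey"},
)
hits = federated_search("survey", [astro, materials])
```

A real service would also need ranking, access control, and the community-specific drill-down mentioned on the slide; this sketch only shows the merge step.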

What should NDS do for researchers?
- Help researchers share and publish data
  – Engage researchers early in the publishing process
  – NDS and federated local/domain repositories
    - sharing privately with collaborators prior to publishing
    - tools to help organize the data for publishing
    - automatically ensure links to literature
    - assign DOIs, provide links to publishers, synchronize data publishing with papers
  – Recommend appropriate discipline/community repository for long-term preservation
    - NDS Repository as archive of last resort

DEVELOPMENTS SINCE NDS-1 IN BOULDER
What are we doing and why?

Activities
- First NDS meeting hosted by NCAR in Boulder
  – ~85 participants from all the above types of organizations
- General services discussed, with much agreement emerging
- Specific groups agreed to help pilot use cases for early services (MDF, Astro, other reports here)
- Since NDS-1 at NCAR…
  – Interim steering committee formed
  – OSTP and NDS announce Materials Data Facility (MGI + 3 years)
  – Hackathon last month at NCSA to explore prototype services, connecting together existing tools
  – NCSA, SDSC, TACC, ANL have agreed to create federated storage and capabilities for development of services
  – Demos at SC14; see booths at Globus, NCSA, SDSC, TACC…

NDS: an ecosystem in 3 parts
- The Portal: Complete end-to-end set of vanilla national services for storing, sharing, publishing, finding and re-using data
- The Framework: The system into which a community can plug specialized tools, portals, and services
- The Infrastructure: Foundational storage, hosting environment and software that allow communities to build their own specialized data services
We’re exploring all three of these with pilot activities now underway

The NDS Ecosystem: The Portal
A complete end-to-end set of vanilla services for sharing, publishing, finding and re-using data
- Imagine a portal that supports cross-disciplinary research
  – Supports data of any discipline
  – Enables private sharing, publishing, data discovery, data movement
  – Connects researcher with community-specific resources where they exist
- The Materials Data Facility (MDF) is a prototype for a generic NDS portal
[Portal mock-up. Menu items: Login, About, My Data, My Groups, Create a collection…, Browse…, Curate…, Publish…, Share…, Move Data, Discover Data, Literature, Advanced…, Community Resources, Portals…, Repositories…, Tools…]

The NDS Ecosystem: The Framework
The system into which a community can plug specialized tools, portals, and services
- The framework is how existing and new tools connect together
- Any generic NDS component should be replaceable with a community-specific version
- User can leverage whichever tools work best for her research group
- Based on the “Data Fabric” recommendations from the RDA
Imagine…
1. A group begins sharing data informally using Dropbox, SciDrive, Figshare, or …
2. They move the data into SEAD to create a publishable collection
3. They combine it with a collection created in the NDS portal
4. The data is published into Dryad for long-term preservation
We’re currently exploring this through the Epiphyte Pilot and NDSShare
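The idea that any generic component should be replaceable with a community-specific version can be pictured as a small registry with fallback. This is a minimal sketch under stated assumptions, not the NDS framework itself: the `Framework` class, the role name `"publish"`, the `"materials"` and `"astronomy"` community labels, and the two publisher functions are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# A component is modeled here as a function from a collection name to a
# status message. This signature is an assumption made for the sketch.
Component = Callable[[str], str]


class Framework:
    """Registry of generic components with per-community overrides."""

    def __init__(self) -> None:
        self._generic: Dict[str, Component] = {}
        self._overrides: Dict[Tuple[str, str], Component] = {}

    def register_generic(self, role: str, component: Component) -> None:
        """Install the default component for a role such as 'publish'."""
        self._generic[role] = component

    def register_community(self, community: str, role: str, component: Component) -> None:
        """Replace the generic component for one community only."""
        self._overrides[(community, role)] = component

    def resolve(self, role: str, community: Optional[str] = None) -> Component:
        """Prefer the community-specific component, else fall back to generic."""
        if community is not None and (community, role) in self._overrides:
            return self._overrides[(community, role)]
        return self._generic[role]


def generic_publish(collection: str) -> str:
    return f"generic repository accepted {collection}"


def materials_publish(collection: str) -> str:
    # Hypothetical community tool that would capture specialized metadata.
    return f"materials repository accepted {collection} with domain metadata"


framework = Framework()
framework.register_generic("publish", generic_publish)
framework.register_community("materials", "publish", materials_publish)

default_message = framework.resolve("publish")("survey-2014")
materials_message = framework.resolve("publish", "materials")("alloys-2014")
fallback_message = framework.resolve("publish", "astronomy")("sky-2014")
```

The fallback in `resolve` is the design point: a community that has registered nothing still gets the generic service, so specialized tools can be adopted one role at a time.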

The NDS Ecosystem: Infrastructure
Foundational storage, hosting environment and software that allow communities to build their own specialized data services
Includes
- Nationally distributed storage
  – Replication services
  – Databases
  – Community repository tools
- Cloud-based hosting environment
  – For hosting portals and services
- Common data software
  – For building community-specific capabilities
NDSLabs will provide an experimental prototype of NDS Infrastructure
Imagine…
[Diagram labels: NDS Infrastructure; Fast Storage; Virtual Machines; Slow Storage; Software (e.g. SciDrive, DataVerse, IRODS, Globus…); The National Transportation and Urban Planning Data Facility]

NDS Lab and NDS Share
- NDS Lab
  – Target: friendly developers
  – A community support environment for developing, coordinating, deploying prototype services
  – Spinning disk, storage, virtual machines for developing and hosting services
  – Available to NDS community members
- NDS Share (or better name: Kalliope? Help us name it!)
  – Target: friendly scientists
  – Experimental platform for sharing data
    - Enable anyone to create data collections, store data, get DOI
  – Include installations of community data sharing applications
  – Will evolve over time
- Partnership between NCSA, ANL, TACC, and SDSC
  – Other interested partners?
- Look to make available by January 2015

How will NDS relate to other activities?
- NDS will complement and extend RDA
  – No attempt to duplicate
  – Working closely with RDA groups to implement
  – Progress on NDS as member of RDA Global and documented relation with RDA and other organizations
- NDS: operating framework for data services, on top of existing activities
  – Narrow focus on specific functions, leveraging existing capabilities
  – Create collections, identify, deposit, publish, link…
[Diagram labels: RDA Global, RDA US, EUDAT, RDA EU, Campuses…, Publishers…, NDS, Projects…]

Vision for 6 months and 5 Years Hence
- Six months
  – Clear membership, relationships, governance models
  – Numerous pilots progressing with coordination
  – Funding opportunities clearly identified, coordinated
- Five years
  – It is routine, and part of culture, to store, publish, share, discover, link data
  – National structures in place that connect campus, domain, and national services federated with local
  – Promise of data-enabled science really begins!

NDS: A Builder’s Consortium
- NDS vision requires collaboration of many kinds of institutions
  – Compute and data services centers
    - Effort spearheaded by UIUC (NCSA, Library), UC/ANL, UT/TACC, UCSD/SDSC
  – Universities and project repositories
    - Internet2 and numerous members; ARL and members
    - LIGO, IceCube, LSST, DES, etc.
  – Discipline-specific federations
    - E.g., SEAD, DataONE, iPlant, Virtual Observatory, ICPSR, HASTAC, …
  – Publishers
    - Science, Nature, APS, PLOS, IEEE, Elsevier, JORS, et al…
- NDS Consortium to guide the building, governance of services
  – Coordinate separately funded efforts to build NDS components
    - Ensure interoperability, integrate existing tools and resources
  – Interim steering committee formed
  – Joel Cutcher-Gershenfeld to moderate governance discussions here

Activities
- First NDS meeting hosted by NCAR in Boulder
  – ~85 participants from all the above types of organizations
- General services discussed, with much agreement emerging
- Specific disciplines agreed to help pilot use cases for early services
- Since NDS-1 at NCAR…
  – OSTP and NDS announce Materials Data Facility (MGI + 3 years)
  – Stakeholder Map of 200+ groups in progress
  – Hackathon last month at NCSA to build out prototype services for data collections, storing, DOI minting, linking to publishers
  – NCSA, SDSC, TACC, ANL have agreed to create federated storage capabilities for development of services
  – Aiming for demos at SC14; see NCSA, TACC, SDSC, other booths

NDS ecosystem in 3 parts
Refine these elements here…
- Layer 1: Complete end-to-end set of vanilla national services for storing, sharing, publishing, finding and re-using data
  – Components for collaborative sharing, creating collections, archiving, re-use, and linking to literature
  – General-purpose search system discovers data across disciplines
- Layer 2: Framework into which a community can plug specialized components: Community-specific…
  – Federations integrate with NDS by plugging in, say, specific publishing tools that capture specialized metadata
  – Search tools can reach out to neighboring disciplines by accessing the generic search service
- Layer 3: Foundational infrastructure that allows communities to build their own specialized data services
  – Distributed storage, replication services, repository services
  – Cloud computing services for hosting portals and services

NDS ecosystem in 3 parts
Refine these elements here…
- Layer 1: Generic services for storing, discovering, linking…
  – Today and tomorrow we will see some prototype services for some of these functions
  – Still need to develop better definition
    - What services are needed
    - Consortium Governance model to determine what is to be supported
- Layer 2: Framework for community development
  – NCSA and partners will support NDSLab
  – Hosting environment for developing community
- Layer 3
  – NCSA and partners will support NDSShare
  – Storage and computing services

NDS Lab and NDS Share
- NDS Lab
  – A community support environment for developing, coordinating, deploying prototype services
  – Spinning disk, storage, virtual machines for developing and hosting services
  – Available to NDS community members
- NDS Share
  – Experimental platform for sharing data
    - Enable anyone to create data collections, store data, get DOI
  – Include NCSA-based installations of community data sharing applications
  – Will evolve over time
- Hoped for Dates
- Who else agrees to partner in advance

National, Federated Data Service(s)
- Urgent need for national infrastructures for data
- Extensible, integrated national-scale services
  – Storing, sharing, finding, verifying, publishing, citing, reusing…
- Open and federating architecture
  – Building on the infrastructure currently at discipline/community level
  – Allow data providers to make data accessible in the national environment
  – Allow new and community-produced tools and resources to be plugged in