Cyberinfrastructure for the 21st Century (CIF21): Data MRI and STCI

1 Cyberinfrastructure for the 21st Century (CIF21): Data MRI and STCI
EarthCube CASC Sept 9, 2011
Rob Pennington
Office of Cyberinfrastructure (OCI), National Science Foundation

2 Framing the Challenge: Science and Society Transformed by Data
Modern science: data- and compute-intensive; integrative, multiscale
Multi-disciplinary collaborations for complexity: individuals, groups, teams, communities
Sea of Data: Age of Observation; distributed, central repositories, sensor-driven, diverse, etc.

3 Advisory Committee for Cyberinfrastructure Task Force Reports
More than 25 workshops and Birds of a Feather sessions, with more than 1300 people involved
Final recommendations presented to the NSF Advisory Committee for Cyberinfrastructure (ACCI), Dec 2010
Final reports on-line at:
Task forces: Campus Bridging; Data and Viz; HPC (High Performance Computing); Grand Challenges; Cyberlearning; Software

4 Data Task Force Recommendations
Infrastructure: Recognize data infrastructure and services (including visualization) as essential long-term research assets fundamental to today's science
Economic sustainability: Develop realistic cost models to underpin institutional/national business plans for research repositories/data services
Culture change: Emphasize expectations for data sharing; support the establishment of new citation models in which data and software tool providers and developers are recognized and credited for their contributions
Data management guidelines: Identify and share best practices for the critical areas of data management
Ethics and IP: Train researchers in privacy-preserving data access

5 Evolution of Cyberinfrastructure for the 21st Century (CIF21) and Data
Diagram labels: National Science Board (NSB); On-going input; Science & Engineering Research + Cyberinfrastructure; ACCI Data Task Force; NSF CIF21 Data Programs; DataNet Program; Community Input

6 Cyberinfrastructure Ecosystem (CIF21)
Organizations: Universities, schools; Government labs, agencies; Research and Medical Centers; Libraries, Museums; Virtual Organizations; Communities
Expertise: Research and Scholarship; Education; Learning and Workforce Development; Interoperability and operations; Cyberscience
Scientific Instruments: Large Facilities, MREFCs, telescopes; Colliders, shake tables; Sensor Arrays (ocean, environment, weather, buildings, climate, etc.)
Discovery, Collaboration, Education
Data: Databases, Data repositories; Collections and Libraries; Data Access; storage, navigation management, mining tools, curation, privacy
Computational Resources: Supercomputers; Clouds, Grids, Clusters; Visualization; Compute services; Data Centers
Networking: Campus, national, international networks; Research and experimental networks; End-to-end throughput; Cybersecurity
Software: Applications, middleware; Software development and support; Cybersecurity: access, authorization, authentication
Maintainability, sustainability, and extensibility

7 CIF21: Four Major Thrust Areas
Organizations: Universities, schools; Government labs, agencies; Research and Medical Centers; Libraries, Museums; Virtual Organizations; Communities
Community Research Networks
Expertise: Research and Scholarship; Education; Learning and Workforce Development; Interoperability and operations; Cyberscience
Scientific Instruments: Large Facilities, MREFCs, telescopes; Colliders, shake tables; Sensor Arrays (ocean, environment, weather, buildings, climate, etc.)
Data-Enabled Science
Discovery, Collaboration, Education
Education: integral and embedded
Data: Databases, Data repositories; Collections and Libraries; Data Access; storage, navigation management, mining tools, curation, privacy
Computational Resources: Supercomputers; Clouds, Grids, Clusters; Visualization; Compute services; Data Centers
New Computational Resources
Access and Connections to CI Resources
Networking: Campus, national, international networks; Research and experimental networks; End-to-end throughput; Cybersecurity
Software: Applications, middleware; Software development and support; Cybersecurity: access, authorization, authentication

8 Scientific Data Challenges
[Figure: scientific data sources compared by Volume (Giga, Tera, Peta, Exa bytes), Bytes per day, Useful Lifetime, Distribution, and Data Access. Labeled sources: Square Kilometer Array; Climate, Environment; Genomics; TeraGrid, Blue Waters; LHC; LSST; DataNet; Many smaller datasets…]

9 CIF21 Data Goals
Support data-intensive and multi-disciplinary science
Provide reliable digital access, integration, management and preservation capabilities for science and engineering data over a decades-long timeline
Develop innovative data analysis and mining tools to support data manipulation, modeling, and discovery
Engage at the frontiers of technological innovation and transformative science to drive the leading edge forward

10 DataNet Role in CIF21
DataNet is a strategic part of Foundation-wide investments in data in CIF21
Focus on center-scale awards
DataNet efforts effectively balance: production infrastructure to provide operational services; research to create next-generation infrastructure
DataNet awards are partnerships: responsive to user communities to define their meaningful and useful scope; form a coordinated network to provide national, interdisciplinary data models and infrastructure

11 DataNet: A Multi-tiered and Multi-Disciplinary Landscape
Diagram labels: Modeling and Simulation Communities; Population, Climate, Environment Communities; Genomics Communities
Tiers: Data-enabled Science; Data Curation; Data Storage
Legend: DataNet supported

12 Data Storage
National storage infrastructure for scientific data
Accommodate the scale and heterogeneity of scientific data through robust, open, and broadly accepted standards
Sustainable cost model that can be implemented with governmental, academic, non-profit, and commercial stakeholders
Make strategic investments that: leverage existing resources in TeraGrid, commercial clouds, federal data centers; meet growing capacity needs at optimum cost; provide coordinating and integrative functions for integrity, access control, availability, persistence
Catalyze a national data infrastructure, in a role similar to the one NSFNet played for the Internet

13 Data Curation
Sustainable, community-based networks for management of critical scientific data resources in a life-cycle context.
Overcome challenges of culture change, policy development and implementation, sustainable operations, quality and usability control.
Strategic awards that address heterogeneity in formats, complexity, and semantics of data collections that are valued by science communities of significant breadth.
Operate as a network of data services that promote interoperability, multidisciplinarity, and scalability.

14 Data Enabled Science
Provide critical tools and services for data mining, integration, analysis, modeling and visualization.
Overcome barriers to scaling, synthesis, and interoperability to promote effective use of large-scale, shared data resources.
Strategic investments that concentrate tools, resources and expertise in support of compelling grand challenge science questions.

15 Cross Cutting Challenges
Balancing research into next generations of infrastructure with operation & maintenance of current capacity: stimulate innovation and manage transitions.
Sustainable, long-term programs: technical design, development of business models, and integration with the research cycle.
Integration:
Vertical: linking low-level bit storage infrastructure to data collections, and finally to applications.
Horizontal: achieving connectivity and interoperability between activities that vary in scale, disciplinarity, and funding source.

16 DataNet Program Management
Life cycle perspective covering the use of the data: research, development, implementation, operations, sustainability, close-out
Apply project management methods: WBS, risk management, change control, schedule, milestones, deliverables
Standardized process:
Evaluate science merit, conceptual design
Develop draft PEP, design and reporting metrics
Critical review: prototype, finalize baseline (approval/mid-course correction/off-ramp)
Implementation & operations: subject to change control, oversight based on milestones & metrics
Final operational review: informs decision for renewal, termination

17 DataNet Federation Consortium Data Driven Science
Implement national data grid: federate existing discipline-specific data management systems to enable national research collaborations
Enable collaborative research on shared data collections: manage collection life cycle as the user community broadens
Integrate "live" research data into education initiatives: enable student research participation through control policies
Collection Life Cycle: Project; Shared Collection; Processing Pipeline; Digital Library; Reference Collection; Federation
Science and Engineering Initiatives: Ocean Observatories Initiative; the iPlant Collaborative; CUAHSI; CIBER-U; Odum Social Science Institute; Temporal Dynamics of Learning Center
Cyber-infrastructure Partners: Univ. of North Carolina, Chapel Hill; Univ. of California, San Diego; Arizona State University; Drexel University; Duke University; University of Arizona; University of South Carolina
Policy-based data management
National Science Foundation Cooperative Agreement: OCI

18 MRI 2011
CUNY SI: Instrumentation for Enabling Data Analysis, Sharing, Storage, and Preservation
UC Boulder: Acquisition of a Scalable Petascale Storage Infrastructure for Data-Collections and Data-Intensive Discovery
RPI: Acquisition of a Balanced Environment for Simulation
NCA&T: Acquisition of a Complete High-Performance Modeling and Visualization System for Research in Mathematical Biology and Mathematical Geosciences
OSU: Acquisition of a High Performance Compute Cluster for Multidisciplinary Research

19 What is EarthCube?

20 A Call to Action
Over the next decade, the geosciences community commits to developing a framework to understand and predict responses of the Earth as a system, from the space-atmosphere boundary to the Earth's core, including the influences of humans and ecosystems.
Sources:
Transitions and Tipping Points in Complex Environmental Systems, NSF AC for Environmental Research and Education, 2009
Earth Science and Applications from Space: National Imperatives for the Next Decade and Beyond, 2007
High-Performance Computing Requirements for the Computational Solid Earth Sciences, 2005

21 Goal of EarthCube
To transform the conduct of research in geosciences by supporting community-based cyberinfrastructure to integrate data and information for knowledge management across the Geosciences.

22 What Needs To Be Done?
Integrate data, tools and communities through cyberinfrastructure
Establish a governance mechanism that is inclusive and adopted by the community
Utilize current and emerging technologies to create transparent infrastructure for the geosciences community

23 Convergence to a Unifying Architecture
Diagram labels: Modes of Support; Well-Connected through EarthCube; Loosely or Not Connected
This is an iterative process

24 EARTHCUBE ASSUMPTIONS
The geosciences community is ready to take on the EarthCube challenge
Community will start self-organizing prior to EarthCube activities, like the Nov 1-4 Charrette
Current and emerging technology will help achieve the convergence envisioned for EarthCube
A broad range of expertise and resources must be engaged to shape EarthCube

25 Developed through EAGERs
Timeline diagram labels: DCL Released; Two WebEx events; Charrette; Proposed Framework Approaches Developed through EAGERs; Sandpit/IdeasLab to determine 18 mo. prototype award(s); ROB
Dates shown: Jun 2011; Jul-Sept 2011; Nov; Nov/11-Apr/12; May 2012

26 EARTHCUBE TIMELINE
On-line Community Information: August to November, 2011
EarthCube Charrette: Early November, 2011
EarthCube Ideas/Lab: Tentatively Early May, 2012
Prototype Development: May to December 2013
Fully integrated geosciences infrastructure:

27 Pre-Charrette Organization (August – September)
Second WebEx on Aug. 22
NSF seeks input from a wide range of sources:
Individuals, inst./org., representatives of scientific groups or communities
Facilities and managers of CI endeavors
Industry, Federal Labs., Federal Agencies, and International Partners
NSF will establish on-line resources and forums to:
Gather community inputs/requirements
Facilitate partnerships and collaborations
Encourage submission of approaches to the EarthCube design

28 Charrette Process
Stakeholders focus EarthCube Ideas and Activities
Plenary Sessions to: discuss user requirements; refine approaches and designs for EarthCube; develop partnerships and new collaborations
Remote participation and real-time comments system will be available
Summary Session: comments from NSF, facilitators, and participants on process; NSF provides guidance on post-Charrette activities

29 Questions?

