CCA Forum Spring Meeting, 19-20 April 2007. CCA Common Component Architecture. Fault Tolerance and the Common Component Architecture. David E. Bernholdt.

Presentation transcript:

CCA Forum Spring Meeting, April 2007. CCA Common Component Architecture
Fault Tolerance and the Common Component Architecture
David E. Bernholdt, ORNL

Center for Improvement of Fault Tolerance in Systems (CIFTS)
Participants (Institution: PI)
  – ANL: Beckman (Lead PI)
  – Indiana U: Lumsdaine
  – LBNL: Hargrove
  – Ohio State U: Panda
  – ORNL: Geist
  – U Tennessee: Dongarra
Submitted as SciDAC 2 CET
Funded as base program
Also known as FOBAWS, Faulty

Fault Tolerance Backplane (FTB)
The core idea of CIFTS
Event service to convey fault information throughout the software stack and the machine
  – Hardware sensors to OS/runtime to libraries to applications
FTB components may generate or consume fault-related events
  – Prediction, adaptation, response
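The backplane idea can be illustrated with a minimal in-process publish/subscribe sketch. This is not the FTB API (the slides describe the FTB only at the concept level); the names `FaultEvent`, `Backplane`, `subscribe`, and `publish`, and the example event `node_temp_high`, are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class FaultEvent:
    """Hypothetical fault event; the field names are illustrative only."""
    source: str               # layer that generated the event, e.g. "hardware", "os", "mpi"
    name: str                 # event name used for routing
    severity: str = "warning"
    payload: str = ""


Handler = Callable[[FaultEvent], None]


class Backplane:
    """Toy event service: any component may generate or consume events."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, name: str, handler: Handler) -> None:
        # Register a consumer for events with the given name.
        self._subscribers[name].append(handler)

    def publish(self, event: FaultEvent) -> int:
        # Deliver the event to every consumer registered for its name
        # and return how many consumers received it.
        handlers = list(self._subscribers.get(event.name, []))
        for handler in handlers:
            handler(event)
        return len(handlers)


# Usage: a "hardware sensor" generates an event, an "application" consumes it.
seen: List[FaultEvent] = []
bus = Backplane()
bus.subscribe("node_temp_high", seen.append)
delivered = bus.publish(FaultEvent("hardware", "node_temp_high", payload="node=42"))
```

A real backplane would span processes and nodes; the sketch only shows the generate/consume relationship between layers of the stack.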

Planned Areas of Activity
ANL (Beckman)
  – Parallel file systems
  – MPI, MPI-IO
  – Linux
  – Scheduler/resource manager
Indiana U (Lumsdaine)
  – MPI, MPI-IO
LBNL (Hargrove)
  – Checkpoint/restart
Ohio State U (Panda)
  – Interconnect
ORNL (Geist)
  – CCA integration
  – Applications (chemistry, fusion)
U Tennessee (Dongarra)
  – ScaLAPACK, math libraries

CCA Integration
CCA components should be able to consume or generate FTB events
  – Adapter between FTB and CCA event service
Main focus
  – CCA applications will be consumers of FTB information
Interesting secondary possibilities
  – Allow CCA components to plug into FTB to provide adaptation/response services
  – FTB could be derived from CCA event service
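The adapter mentioned above might look roughly like the following sketch, which covers only the main focus (forwarding fault events to CCA-side consumers). Neither class reflects a real interface: `FakeFtb` and `FakeCcaEventService` are stand-ins invented here, and the `ft.` topic prefix and the `link_down` event are assumptions.

```python
from typing import Callable, Dict, List, Tuple


class FakeFtb:
    """Stand-in for a fault backplane client (hypothetical interface)."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[str, dict], None]] = []

    def subscribe(self, handler: Callable[[str, dict], None]) -> None:
        self._handlers.append(handler)

    def raise_event(self, name: str, details: dict) -> None:
        # Deliver the fault event to every subscribed handler.
        for handler in list(self._handlers):
            handler(name, details)


class FakeCcaEventService:
    """Stand-in for a topic-based component event service (hypothetical)."""

    def __init__(self) -> None:
        self._listeners: Dict[str, List[Callable[[str, dict], None]]] = {}

    def listen(self, topic: str, listener: Callable[[str, dict], None]) -> None:
        self._listeners.setdefault(topic, []).append(listener)

    def send(self, topic: str, body: dict) -> None:
        # Deliver the message to every listener registered on the topic.
        for listener in list(self._listeners.get(topic, [])):
            listener(topic, body)


class FtbToCcaAdapter:
    """Forwards every backplane event to the event service under a topic prefix."""

    def __init__(self, ftb: FakeFtb, events: FakeCcaEventService,
                 topic_prefix: str = "ft.") -> None:
        self._events = events
        self._prefix = topic_prefix
        ftb.subscribe(self._forward)

    def _forward(self, name: str, details: dict) -> None:
        # Copy the details so consumers cannot mutate the producer's data.
        self._events.send(self._prefix + name, dict(details))


# Usage: a component listens on a topic and receives a forwarded fault event.
ftb = FakeFtb()
events = FakeCcaEventService()
adapter = FtbToCcaAdapter(ftb, events)
received: List[Tuple[str, dict]] = []
events.listen("ft.link_down", lambda topic, body: received.append((topic, body)))
ftb.raise_event("link_down", {"node": 7})
```

The secondary possibility on the slide, components providing adaptation or response services to the FTB, would need a second adapter in the opposite direction; it is omitted here.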

Summer Plans
MCMD architecture for SWIM Integrated Plasma Simulator (IPS) to be developed this summer by Samantha Foley (IU)
  – Not CCA-compliant, but will provide use case and prototype implementation for CCA MCMD discussions
Demonstration of FT in MCMD IPS to be developed by Aniruddha Shet
  – Focus on a few simple MCMD-relevant events
  – Use IPS event service (FTB not yet designed)

Discussion: What FT Events are Relevant to CCA?
Presumably many FT events are interesting to CCA applications, utility components
What FT events are relevant to CCA itself?
  – framework (parallel, distributed)
  – services
  – …