CCA Common Component Architecture
MCMD Programming and Implementation Issues
Manoj Krishnan, Pacific Northwest National Laboratory

CCA Common Component Architecture 2 Motivation
The challenges in developing large-scale applications include:
–Addressing complexity, to improve programmer productivity
–Scaling to massive numbers of processors: how applications can exploit the enormous parallelism available in teraflop- and petaflop-scale systems

CCA Common Component Architecture 3 Multilevel Parallelism in Computational Chemistry: Our Approach
Proposed solution to improve scalability
–Increase the granularity of computation to improve overall scalability
–Exploit multiple levels of parallelism (MLP): instead of executing the entire application on the full set of processors, assign parts of the application to appropriately sized subsets of processors; many applications qualify
–Challenge: this is difficult to implement. Use advanced tools to manage the programming complexity: the Common Component Architecture (CCA) and the Global Arrays (GA) shared-memory programming model
Objective: demonstrate how CCA and GA can be used together to address the requirements of real scientific applications

CCA Common Component Architecture 4 Technology
Technologies for exploiting multiple levels of parallelism
–Global Arrays (GA) shared-memory programming model: high-level parallel data management abstractions (a minimal GA sketch follows)
–Common Component Architecture (CCA): component technology for HPC applications; hides complexity and enables composition of software modules written in different languages and programming styles
[Diagram: Driver, Gradient, and Energy components wired to a CCA QM component]
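
To make the GA programming style concrete, here is a minimal sketch that creates a small distributed double-precision array that every process can access as if it were shared. The array name, dimensions, and memory-allocator sizes are illustrative; the header names and type constants should be checked against the installed GA version.

/* Minimal Global Arrays sketch (illustrative; verify headers and the
 * C_DBL constant against your GA installation). Link against GA and MPI. */
#include <mpi.h>
#include "ga.h"
#include "macdecls.h"

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    GA_Initialize();
    MA_init(C_DBL, 1000, 1000);          /* memory allocator used internally by GA */

    int dims[2]  = {64, 64};
    int chunk[2] = {-1, -1};             /* let GA choose the data distribution */
    int g_a = NGA_Create(C_DBL, 2, dims, "work", chunk);

    GA_Zero(g_a);                        /* collective initialization */
    /* ... each process may use NGA_Put/NGA_Get on any patch of the array,
     * regardless of where the data physically resides ... */
    GA_Sync();

    GA_Destroy(g_a);
    GA_Terminate();
    MPI_Finalize();
    return 0;
}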

CCA Common Component Architecture 5 Multiple Component Multiple Data Model
Introducing Multiple Component Multiple Data (MCMD)
–i.e., the multiple program multiple data (MPMD) model in the context of CCA
–instantiate components on subgroups of processors
–create a dynamic environment that partitions computational resources and manages them to execute the overall application effectively
Facilitate dynamic behavior of the application itself, for example
–resizing processor groups based on memory requirements or scaling characteristics
–swapping components based on numerical or computational performance

CCA Common Component Architecture 6 Numerical Hessian Example
Numerical Hessian algorithm
–determine energy second derivatives through numerical differentiation of gradients, which may in turn be obtained from numerical differentiation of energies
–multiple gradient calculations, and each gradient involves multiple energy calculations
A straightforward implementation has limited scalability because it does not effectively exploit the variable degrees of parallelism available (a sketch of the central-difference scheme follows)
[Diagram: Hessian built from Gradient and Energy calculations]
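
To illustrate the algorithm, the sketch below builds Hessian columns by central differences of gradients. The gradient() routine is a hypothetical stand-in for a parallel electronic-structure gradient evaluation (e.g., an NWChem call), and the step size and dimensions are placeholders; the point is that every gradient call is an independent task.

/* Central-difference Hessian from gradients (sketch).
 * gradient() is a hypothetical stand-in for a parallel energy-gradient
 * evaluation; each call is an independent, internally parallel task. */
#include <stdlib.h>

void gradient(const double *x, int n, double *g);   /* hypothetical */

void numerical_hessian(const double *x0, int n, double h, double *H /* n*n */) {
    double *xp = malloc(n * sizeof(double));
    double *xm = malloc(n * sizeof(double));
    double *gp = malloc(n * sizeof(double));
    double *gm = malloc(n * sizeof(double));

    for (int j = 0; j < n; j++) {                    /* one column per displacement */
        for (int i = 0; i < n; i++) { xp[i] = x0[i]; xm[i] = x0[i]; }
        xp[j] += h;
        xm[j] -= h;
        gradient(xp, n, gp);                         /* independent gradient tasks */
        gradient(xm, n, gm);
        for (int i = 0; i < n; i++)
            H[i * n + j] = (gp[i] - gm[i]) / (2.0 * h);
    }
    free(xp); free(xm); free(gp); free(gm);
}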

CCA Common Component Architecture 7 Numerical Hessian Scalability - I
A single energy calculation does not scale beyond 4 processes*
Two-level parallelism
–Energy level: native parallel code
–Gradient level: group-based energy calculations using GA processor groups (see the sketch below)
[Diagram: QM component driving Gradient and Energy calculations]
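
A minimal sketch of the group-based approach, assuming the GA processor-group C API (GA_Pgroup_create and related calls). The group size of 4 mirrors the scaling limit noted above and is illustrative; exact signatures should be checked against the GA release in use.

/* Split the world into groups of 4 processes and make each group the
 * default, so subsequent GA calls (array creation, syncs, reductions)
 * are scoped to that group. Sketch only; verify against your GA version. */
#include "ga.h"

void run_energies_on_groups(void) {
    int me     = GA_Nodeid();
    int nprocs = GA_Nnodes();
    int gsize  = 4;                          /* ~scaling limit of one energy calc */
    int first  = (me / gsize) * gsize;
    int members[4];
    int count  = 0;

    for (int p = first; p < first + gsize && p < nprocs; p++)
        members[count++] = p;

    int grp = GA_Pgroup_create(members, count);    /* collective over the members */
    GA_Pgroup_set_default(grp);                    /* scope GA to the subgroup     */

    /* ... run one energy calculation per group here ... */

    GA_Pgroup_sync(grp);
    GA_Pgroup_set_default(GA_Pgroup_get_world());  /* restore world scope */
}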

CCA Common Component Architecture 8 Multilevel Parallelism
Combining the SPMD and MPMD paradigms: MCMD (Multiple Component Multiple Data) = MPMD + components
–The MCMD driver launches multiple instances of the NWChem QM component on subsets of processors (CCA)
–Each NWChem QM (gradient) component performs multiple energy computations on subgroups (GA)
[Diagram: MCMD Hessian Driver connected through ModelFactory, cProps, and Parameter ports to NWChem_QM_0 … NWChem_QM_n instances, each computing Gradient and Energy]

CCA Common Component Architecture 9 Multiple Component Multiple Data (CCA's MCMD Model)
[Diagram: MCMD Driver using the BuilderService to instantiate and connect QM_0 … QM_n components via ModelFactory, cProps, and Parameter ports]
The MCMD driver:
–creates new components
–creates processor groups
–assigns processor groups to components
–connects components
–collects results
A sketch of this driver workflow follows.
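
The driver responsibilities listed above can be sketched as a single routine. The component helpers (qm_handle, create_qm_component, qm_gradient) are hypothetical placeholders for what the CCA BuilderService would do; the group handling and the final reduction assume the GA C API introduced earlier, and the result buffer is assumed to be zero-initialized by the caller.

/* MCMD driver sketch: partition processes into groups, instantiate one
 * QM component per group, run gradient tasks round-robin, combine results.
 * Component helpers are hypothetical; group/reduction calls assume GA. */
#include "ga.h"

typedef struct { int id; } qm_handle;                            /* hypothetical */
qm_handle create_qm_component(int pgroup);                       /* hypothetical */
void      qm_gradient(qm_handle qm, int task, double *g, int n); /* hypothetical */

void mcmd_driver(int ngroups, int ntasks, int n, double *H /* n*ntasks, zeroed */) {
    int me = GA_Nodeid(), nprocs = GA_Nnodes();
    int gsize = nprocs / ngroups;
    int mygrp = me / gsize;

    int members[gsize];                          /* C99 VLA; fine for a sketch */
    for (int i = 0; i < gsize; i++) members[i] = mygrp * gsize + i;
    int grp = GA_Pgroup_create(members, gsize);
    GA_Pgroup_set_default(grp);

    qm_handle qm = create_qm_component(grp);     /* one component instance per group */

    /* round-robin assignment: group k handles tasks k, k+ngroups, ... */
    for (int t = mygrp; t < ntasks; t += ngroups)
        qm_gradient(qm, t, &H[t * n], n);

    /* only the group master contributes to the global sum, to avoid
     * counting each group's result gsize times */
    if (GA_Pgroup_nodeid(grp) != 0)
        for (int i = 0; i < n * ntasks; i++) H[i] = 0.0;

    GA_Pgroup_set_default(GA_Pgroup_get_world());
    GA_Sync();
    GA_Dgop(H, n * ntasks, "+");                 /* collect results across groups */
}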

CCA Common Component Architecture 10 Numerical Hessian Scalability - II
Application efficiency improved 10x on 256 CPUs
Three-level parallelism
–Energy level: native parallel code
–Gradient level: group-based single-energy calculations using GA groups
–Hessian level: task-based gradient calculations using CCA

CCA Common Component Architecture 11 Potential Applications Relevant to This Approach
Molecular dynamics
Monte Carlo –growth nucleation
Numerical Hessians –vibrational spectra
Optimization techniques –simulated annealing with local optimization
Nudged Elastic Band methods –determine reaction paths for kinetic rates
Trajectory simulations

CCA Common Component Architecture 12 MCMD Programming
Multi-level parallelism
–Nested parallel decomposition, possibly with multiple levels of parallelism
–Multiple parallel simulations run concurrently in a coupled fashion, exchanging data at boundaries or perhaps even within volumes

CCA Common Component Architecture 13 MCMD Services
Develop MCMD services to support MLP
–Creation and management of processor groups: a CCA representation for groups (id, membership)
–Mapping of components to groups and their coordination: coordination of concurrent and nested SCMD/MCMD tasks
–Communication between groups
–Dynamic reconfiguration
–Handling termination of processor groups and components
MCMD as a service or as a component? (One possible group descriptor is sketched below.)
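
To make the "id, membership" idea concrete, here is one possible shape for a group descriptor and a component-to-group mapping. These type and field names are purely illustrative; they are not part of any existing CCA specification.

/* Hypothetical CCA-level group descriptor: an opaque id plus an explicit
 * membership list, and a mapping from a component instance to the group
 * it executes on. Names are illustrative only. */
typedef struct {
    int  id;            /* framework-assigned group identifier   */
    int  size;          /* number of member processes            */
    int *members;       /* world ranks of the member processes   */
} cca_proc_group;

typedef struct {
    const char     *instance_name;   /* component instance (e.g. "QM_0") */
    cca_proc_group *group;           /* group the instance is mapped to  */
} cca_component_mapping;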

CCA Common Component Architecture 14 Activities
Year 1
–Develop a model to express multi-level parallelism through processor groups
–Requirements gathering and design of a flexible, dynamic multi-level parallelism model
–Coordinate and interact with other initiatives (ongoing)
Year 2
–Define a CCA-standard way of specifying and translating processor group membership and mapping between components
Years 3, 4, 5
–…

CCA Common Component Architecture 15 Implications of MCMD for the CCA Model
A model for applications with multi-level parallelism is important
Process group abstraction – compatible with MPI, PVM, GA, GAS languages, HPCS languages (?)
–MPI as the default? Group translators (a sketch based on MPI communicators follows)
–How to address threaded components? OpenMP? Pthreads? What is the processor group for a threaded component?
Group-awareness in CCA and a CCA way of naming groups
–i.e., multi-level parallelism at the CCA level/BuilderService
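
If MPI is taken as the default representation, subgroup membership translates naturally into communicators. The sketch below uses MPI_Comm_split to derive one communicator per component group; how a group translator would then hand the communicator to GA, PVM, or a GAS runtime is left open, and the contiguous-block coloring is just one possible policy.

/* Derive per-group MPI communicators: processes passing the same color
 * end up in the same communicator. Assumes MPI_Init has already been
 * called by the enclosing application or framework. */
#include <mpi.h>

MPI_Comm make_group_comm(int ngroups) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int color = rank / ((size + ngroups - 1) / ngroups);  /* contiguous blocks */
    MPI_Comm group_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &group_comm);
    return group_comm;
}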

CCA Common Component Architecture 16 Implications of MCMD for CCA Implementations
Processor group management
Run-time configuration
–At run time, the user should be able to break connections, create components, and assign groups
–Swapping components, …
Mapping communicators
Overlapping/disjoint processor groups

CCA Common Component Architecture 17 Summary - Found MCMD Effective
Implemented a flexible, multi-level software architecture for computational chemistry applications
–Exploits variable levels of parallelism
–An order of magnitude of performance improvement
–Hides complexity and enables better software composition
The MCMD model has potential for addressing scalability on future large-scale systems
More work is needed in the CCA infrastructure and software to benefit a larger class of applications
–Facilitate dynamic groups
–Make MCMD easier for applications to adopt