Published by Justin Cunningham. Modified over 9 years ago.
1
HPC Components for CCA
Manoj Krishnan and Jarek Nieplocha
Computational Sciences and Mathematics Division
Pacific Northwest National Laboratory
2
HPC Components
- Distributed Arrays Component
  - Global Arrays (GA)
- Parallel I/O Component
  - Disk Resident Arrays (DRA)
- One-sided Communication Component
  - Remote Memory Access (RMA) communication
  - Aggregate Remote Memory Copy Interface (ARMCI)
3
Distributed Array Component
Based on Global Arrays (GA)
- Core capabilities
  - dense arrays, 1-7 dimensions
  - global rather than per-task view of data structures
  - user control over data distribution: regular and irregular
- GAClassicPort
  - 36+98 (direct+indirect) GA methods
- GADADFPort
  - distributed array descriptors (DAD) and templates proposed by the Data Working Group of the CCA Forum
- LinearAlgebraPort (LA)
  - manipulating vectors, matrices, and linear solvers (for TAO)
[Figure: a physically distributed dense array presented as a single, shared data structure with global indexing (e.g., A(4,3) rather than buf(7) on task 2). Diagram labels: GA, LA, DAD, GA Classic.]
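The global-indexing contrast on this slide (A(4,3) versus a local buffer offset on some task) can be made concrete with a small sketch. It is plain Python, not the GA API: the function names, the 0-based indexing, the row-major local layout, and the regular block distribution over a process grid are all assumptions chosen for illustration, so it will not necessarily reproduce the buf(7)/task 2 numbers from the slide.

```python
def block_bounds(n, nblocks, b):
    """Half-open index range [lo, hi) held by block b of a regular distribution."""
    base, extra = divmod(n, nblocks)
    lo = b * base + min(b, extra)
    hi = lo + base + (1 if b < extra else 0)
    return lo, hi


def owner_and_offset(i, j, shape, grid):
    """Map a global (i, j) index to (task rank, flat offset in that task's buffer).

    Hypothetical helper: ranks are numbered row-major over the process grid
    and each task stores its block row-major. Indices are 0-based.
    """
    rows, cols = shape
    prow, pcol = grid
    if not (0 <= i < rows and 0 <= j < cols):
        raise IndexError("global index out of range")
    # Find the block row and block column that contain (i, j).
    for br in range(prow):
        rlo, rhi = block_bounds(rows, prow, br)
        if rlo <= i < rhi:
            break
    for bc in range(pcol):
        clo, chi = block_bounds(cols, pcol, bc)
        if clo <= j < chi:
            break
    rank = br * pcol + bc
    offset = (i - rlo) * (chi - clo) + (j - clo)
    return rank, offset
```

With a 6x6 array on a 2x2 process grid, `owner_and_offset(3, 2, (6, 6), (2, 2))` returns `(2, 2)`: the caller names the element by its global index and the bookkeeping of "which task, which local slot" stays inside the library.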
4
Distributed Arrays
- Data locality, distribution
- Ease of programming
- High performance
  - Gets 5.2 GFLOP/s per CPU out of 6 GFLOP/s peak
Example: 1-d transpose (invert data globally) of the array 0 1 2 3 4 5
- MPI
  - Invert data locally
  - Identify where (process ranks) to send the data
  - Find the number of MPI_Recv's to post
  - Manipulate the global indices for each Recv (identify where each piece of data fits locally)
  - Do the actual data transfer
- GA
  - Invert data locally
  - Do a GA_Put
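The two GA steps for the 1-d transpose can be sketched as a single-process simulation. This is not GA code: the loop over `rank` stands in for independent tasks, the slice assignment stands in for one put addressed by global indices, and `block_bounds` and `global_reverse` are hypothetical names assuming a regular block distribution.

```python
def block_bounds(n, ntasks, rank):
    """Half-open global range [lo, hi) owned by a task (regular distribution)."""
    base, extra = divmod(n, ntasks)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def global_reverse(data, ntasks):
    """Reverse a block-distributed 1-d array using 'invert locally, then put'."""
    n = len(data)
    result = [None] * n  # stands in for the destination global array
    for rank in range(ntasks):  # each iteration models one task
        lo, hi = block_bounds(n, ntasks, rank)
        local = data[lo:hi][::-1]        # step 1: invert data locally
        dst_lo, dst_hi = n - hi, n - lo  # mirrored range in global indices
        result[dst_lo:dst_hi] = local    # step 2: one put by global index
    return result
```

Each task only computes a mirrored global range. It never needs to know which ranks own that range or how many messages they should expect, which is the bookkeeping the MPI column on the slide spells out.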
5
Parallel I/O Component
Based on Disk Resident Arrays
- High-level API for transfer of data between N-dim arrays stored on disk and distributed arrays stored in memory
- Uses parallel or local filesystems
  - Hides filesystem issues
- Scalable performance utilizing local disks of a cluster
  - More nodes used, more disks available, higher aggregate bandwidth
- Use when
  - Arrays too big to store in core
  - checkpoint/restart
  - out-of-core solvers
- Development
  - Ohio State collaboration (P. Sadayappan)
  - Non-collective I/O
  - Data reorganization/layout
  - Recent paper at LACSI
[Figure: array in memory, array on disk(s)]
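The idea of moving array sections between disk and memory, and of working on an array too big to hold in core, can be illustrated with a serial sketch. It uses only the Python standard library and a single local file; it is not the DRA API, and `create_disk_array`, `write_section`, `read_section`, and `out_of_core_sum` are hypothetical names for a 1-d array of doubles.

```python
import os
import tempfile
from array import array


def create_disk_array(path, n):
    """Create an on-disk array of n doubles, initialised to zero."""
    with open(path, "wb") as f:
        array("d", [0.0] * n).tofile(f)


def write_section(path, lo, values):
    """Write values into the on-disk array starting at element lo."""
    buf = array("d", values)
    with open(path, "r+b") as f:
        f.seek(lo * buf.itemsize)
        buf.tofile(f)


def read_section(path, lo, hi):
    """Read elements [lo, hi) of the on-disk array into memory."""
    buf = array("d")
    with open(path, "rb") as f:
        f.seek(lo * buf.itemsize)
        buf.fromfile(f, hi - lo)
    return buf.tolist()


def out_of_core_sum(path, n, chunk):
    """Sum the whole array while holding at most `chunk` elements in memory."""
    total = 0.0
    for lo in range(0, n, chunk):
        total += sum(read_section(path, lo, min(lo + chunk, n)))
    return total
```

A possible usage: create the file with `create_disk_array(path, 10)`, store a section with `write_section(path, 2, [1.0, 2.0, 3.0])`, and later read it back with `read_section(path, 0, 5)`, which is the checkpoint/restart pattern in miniature. The parallel version described on the slide would spread such sections across the disks of many nodes.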
6
Communication Component
Based on ARMCI (Aggregate Remote Memory Copy Interface)
- Used in Global Arrays, Rice Co-Array Fortran compiler, Ames GPSHMEM, Co-Array Python
- Vendor supported (Cray XD1, IBM porting to BG/L)
- One-sided communication (put/get model)
  - Remote Memory Access
- CCA component offers language interoperability
  - Only a C interface existed in ARMCI
- Plug-and-play for network drivers using CCA
  - Comm drivers: ARMCI Elan (Quadrics), ARMCI GM (Myrinet), ARMCI Vapi (InfiniBand), ARMCI Sockets (Ethernet)
[Figure: remote memory access (RMA), 1-sided model: a put between processes P0 and P1, with buffers A and B. (Any) Component connects to a Comm Driver.]
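The put/get model and the plug-and-play driver idea can be sketched together in a toy, single-process form. Nothing here is the ARMCI or CCA interface: `CommDriver`, `InProcessDriver`, and `CommComponent` are hypothetical names, and "remote" memory is just a Python list per rank.

```python
from abc import ABC, abstractmethod


class CommDriver(ABC):
    """Hypothetical driver interface; a real driver would target one network."""

    @abstractmethod
    def put(self, src, dst_rank, dst_off):
        """Copy src into dst_rank's memory at dst_off; the target does nothing."""

    @abstractmethod
    def get(self, src_rank, src_off, count):
        """Return count elements from src_rank's memory; the target does nothing."""


class InProcessDriver(CommDriver):
    """Toy driver: every rank's memory is a list inside one Python process."""

    def __init__(self, nprocs, size):
        self.mem = [[0] * size for _ in range(nprocs)]

    def _check(self, rank, off, count):
        if not 0 <= rank < len(self.mem):
            raise IndexError("no such rank")
        if off < 0 or off + count > len(self.mem[rank]):
            raise IndexError("access outside remote buffer")

    def put(self, src, dst_rank, dst_off):
        self._check(dst_rank, dst_off, len(src))
        self.mem[dst_rank][dst_off:dst_off + len(src)] = src

    def get(self, src_rank, src_off, count):
        self._check(src_rank, src_off, count)
        return list(self.mem[src_rank][src_off:src_off + count])


class CommComponent:
    """Callers see put/get only; the driver behind them can be swapped."""

    def __init__(self, driver):
        self.driver = driver

    def put(self, src, dst_rank, dst_off):
        self.driver.put(src, dst_rank, dst_off)

    def get(self, src_rank, src_off, count):
        return self.driver.get(src_rank, src_off, count)
```

For example, `CommComponent(InProcessDriver(nprocs=2, size=4))` followed by `put([7, 8], dst_rank=1, dst_off=1)` changes rank 1's memory without any matching receive on rank 1, which is what distinguishes the one-sided model from send/receive. Swapping the driver object would leave the calling code unchanged.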
7
Processor Group Issues in Distributed Array Management
- Access to data in components running on different processor groups
- Identifying the rank of processes/threads and group naming in component interfaces
- Data movement and reorganization
  - An instance of the MxN problem revisited
- For component interoperability, would like support from the framework
  - identifying and naming processes/groups
  - distributed and parallel environments, hybrid
  - threads/processes, MPI/PVM issues
[Figure: Comp A and Comp B, on MPI and GA, within the CCA Framework]
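The MxN problem mentioned on this slide, moving an array from a component running on M processes to one running on N, comes down to working out which index ranges overlap. The sketch below is illustrative only and assumes a 1-d array with regular block distributions on both sides; `block_bounds` and `mxn_schedule` are hypothetical names, not framework or GA calls.

```python
def block_bounds(n, nparts, p):
    """Half-open global range [lo, hi) owned by part p of a regular distribution."""
    base, extra = divmod(n, nparts)
    lo = p * base + min(p, extra)
    hi = lo + base + (1 if p < extra else 0)
    return lo, hi


def mxn_schedule(n, m, k):
    """List the transfers needed to move an n-element array from m to k owners.

    Each entry is (src_rank, dst_rank, lo, hi): the source rank in the
    m-process group sends global range [lo, hi) to the destination rank
    in the k-process group.
    """
    transfers = []
    for s in range(m):
        slo, shi = block_bounds(n, m, s)
        for d in range(k):
            dlo, dhi = block_bounds(n, k, d)
            lo, hi = max(slo, dlo), min(shi, dhi)
            if lo < hi:  # the two blocks overlap
                transfers.append((s, d, lo, hi))
    return transfers
```

For a 12-element array going from 2 to 3 processes, `mxn_schedule(12, 2, 3)` yields four transfers: `(0, 0, 0, 4)`, `(0, 1, 4, 6)`, `(1, 1, 6, 8)`, and `(1, 2, 8, 12)`. Note that the ranks in each tuple belong to different groups, which is why the slide asks the framework for a way to identify and name processes and groups.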