Presentation transcript:

Slide 1: Scalable Library Loading with SPINDLE
Matt LeGendre, Wolfgang Frings, Dong Ahn, Todd Gamblin, Bronis de Supinski, Felix Wolf
CScADS 2013 (Emerging Technologies in HPC Software Development Workshop), July 12, 2013
LLNL-PRES-638575. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC.

Slide 2: Dynamic Linking Causes Major Disruption at Scale
• Multi-physics applications at LLNL use 848 shared library files. Load time on BG/P: 2K tasks → 1 hour; 16K tasks → 10 hours.
• Pynamic, an LLNL benchmark that loads shared libraries and Python files: 495 shared objects, 1.1 GB.
• Figure: Pynamic running on the LLNL Sierra cluster (1,944 nodes, 12 tasks/node, NFS and Lustre file systems).

Slide 3: Challenges Arise from File Access Storms
• Example: Pynamic benchmark on the Sierra cluster: serial (1 task): 5,671 open/stat calls; parallel (23,328 tasks): 132,293,088 open/stat calls.
• The storm is caused by the dynamic linker searching for and loading dynamically linked libraries.
• Formulas:
  File metadata operations: # of tests = # of processes × # of locations × # of libraries
  File read operations: # of reads = # of processes × # of libraries
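
To make the arithmetic concrete, here is a minimal C sketch (not from the talk) that reproduces the total call count on this slide from the per-process count; the probes-per-library figure it prints is an estimate derived here, not a number reported by the authors.

    /* Minimal sketch: plug the Pynamic-on-Sierra numbers into the formulas above. */
    #include <stdio.h>

    int main(void) {
        long processes = 23328;        /* 1,944 Sierra nodes x 12 tasks/node */
        long libraries = 495;          /* shared objects loaded by Pynamic */
        long calls_per_process = 5671; /* open/stat calls observed for 1 task */

        /* Every process repeats the same uncoordinated search, so metadata
         * traffic grows linearly with the number of processes. */
        printf("total open/stat calls: %ld\n", processes * calls_per_process);

        /* Rough hint at how many locations the linker probes per library. */
        printf("probes per library: %.1f\n",
               (double)calls_per_process / (double)libraries);
        return 0;
    }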

Slide 4: File Access Is Uncoordinated
• Dynamic loading has remained nearly unchanged since 1964 (MULTICS).
• ld-linux.so uses serial POSIX file operations that are not coordinated among processes.
• Figure: each process runs its own dynamic linker and independently locates and reads every library.
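
The sketch below illustrates that uncoordinated behavior; it is not ld-linux.so source, and the search paths and library name are invented. Every process runs a probe loop like this on its own, which is why the call counts above scale with the process count.

    #include <stdio.h>
    #include <sys/stat.h>

    /* Probe each search location until the library is found, the way an
     * uncoordinated dynamic linker would. Returns 1 on success. */
    static int find_library(const char *lib, const char *const *paths,
                            int npaths, char *out, size_t outlen) {
        struct stat sb;
        for (int i = 0; i < npaths; i++) {
            snprintf(out, outlen, "%s/%s", paths[i], lib);
            if (stat(out, &sb) == 0)     /* one metadata operation per probe */
                return 1;
        }
        return 0;
    }

    int main(void) {
        const char *paths[] = { "/opt/app/lib", "/usr/local/lib", "/usr/lib64" };
        char found[4096];
        if (find_library("libexample.so", paths, 3, found, sizeof(found)))
            printf("would now open and read %s\n", found);
        else
            printf("libexample.so not found\n");
        return 0;
    }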

Slide 5: How SPINDLE Works
• SPINDLE: Scalable Parallel Input Network for Dynamic Load Environments.
• Requesting a directory or file: 1. the request goes to the leader; 2. the leader reads from disk; 3. the leader distributes the data to its peers.
• Resulting cost:
  File metadata operations: # of tests = # of locations
  File read operations: # of reads = # of libraries
• Figure: file and directory requests flowing between the servers and the leader.
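
As a rough sketch of this flow (the name serve_request and the cache layout are invented; this is not SPINDLE's API), a leader could read each requested library from the shared file system once, keep it cached, and hand the bytes to its peers:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct cached_file {
        char path[4096];
        char *data;
        size_t size;
    };

    static struct cached_file cache[1024];
    static size_t ncached;

    /* Hypothetical leader-side handler: read a library from the parallel
     * file system the first time it is requested, then serve later requests
     * (and forward the data to peer servers) from the in-memory copy. */
    static const struct cached_file *serve_request(const char *path) {
        for (size_t i = 0; i < ncached; i++)
            if (strcmp(cache[i].path, path) == 0)
                return &cache[i];            /* already cached: no disk I/O */
        if (ncached == sizeof(cache) / sizeof(cache[0]))
            return NULL;                     /* cache full; the sketch keeps it simple */

        FILE *f = fopen(path, "rb");         /* the single read from the shared FS */
        if (!f)
            return NULL;
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        fseek(f, 0, SEEK_SET);
        if (size < 0) {
            fclose(f);
            return NULL;
        }

        struct cached_file *entry = &cache[ncached++];
        snprintf(entry->path, sizeof(entry->path), "%s", path);
        entry->data = malloc((size_t)size);
        entry->size = fread(entry->data, 1, (size_t)size, f);
        fclose(f);

        /* distribute_to_peers(entry);  // placeholder: push the bytes down the tree */
        return entry;
    }

    int main(void) {
        const struct cached_file *lib = serve_request("/usr/lib64/libm.so.6");
        if (lib)
            printf("cached %zu bytes of %s\n", lib->size, lib->path);
        return 0;
    }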

Slide 6: Overview
• Design and implementation: components of SPINDLE; interaction with the dynamic linker (client side); strategies for caching and data distribution (pull or push model); communication between SPINDLE servers (overlay network).
• Performance.
• Memory usage.
• Usage and features of SPINDLE.

Slide 7: SPINDLE Components
• Transparent user-space solution:
  SPINDLE client
  SPINDLE server
  Overlay network
• Figure: dynamic linkers, SPINDLE clients, and SPINDLE servers.

Slide 8: Network Topology
• Tree topology: the root node is responsible for file system I/O; the number of message hops is limited to log(n), where n is the number of SPINDLE servers.
• Overlay network: needed because no system communication layer (MPI, etc.) is available at startup time; uses COBO, a scalable communication infrastructure built for MPI.
• Figure: SPINDLE clients and servers arranged in a tree.
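
As a quick numerical check of the log(n) bound, the sketch below counts tree levels for an assumed fan-out; both the fan-out values and the choice of 1,944 servers (one per Sierra node) are example assumptions, not figures from the talk.

    #include <stdio.h>

    /* Number of hops from the root to the deepest server in a k-ary tree
     * that contains n servers. */
    static int tree_depth(long n, int fanout) {
        int depth = 0;
        long reachable = 1;     /* servers covered by a tree of this depth */
        long level_width = 1;
        while (reachable < n) {
            level_width *= fanout;
            reachable += level_width;
            depth++;
        }
        return depth;
    }

    int main(void) {
        printf("depth for 1944 servers, fan-out 2:  %d\n", tree_depth(1944, 2));
        printf("depth for 1944 servers, fan-out 32: %d\n", tree_depth(1944, 32));
        return 0;
    }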

Slide 9: SPINDLE Communication Models
• SPINDLE supports push and pull models.
• SPMD model: similar load sequence on all processes; the push model broadcasts data immediately after the first request.
• MPMD model: different load sequences across processes, often sets of SPMD groups; the pull model sends data only on request.
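
The difference between the two strategies can be sketched as a server-side request handler; the names (handle_request, send_to, broadcast) and the overlay primitives are invented for illustration and are not SPINDLE's code.

    #include <stdio.h>

    enum model { PULL, PUSH };

    struct request { int requester; const char *path; };

    /* Stand-ins for the overlay-network primitives. */
    static void send_to(int peer, const char *path) { printf("send %s to server %d\n", path, peer); }
    static void broadcast(const char *path)         { printf("broadcast %s to all servers\n", path); }

    static void handle_request(enum model m, const struct request *req, int seen_before) {
        if (m == PULL) {
            /* Pull: only the requesting server gets the data. Suits MPMD jobs
             * where processes load different libraries in different orders. */
            send_to(req->requester, req->path);
        } else if (m == PUSH && !seen_before) {
            /* Push: the first request triggers a broadcast to every server,
             * anticipating that all SPMD processes will need the same file. */
            broadcast(req->path);
        }
    }

    int main(void) {
        struct request r = { 7, "/usr/lib64/libm.so.6" };
        handle_request(PULL, &r, 0);
        handle_request(PUSH, &r, 0);
        return 0;
    }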

Slide 10: SPINDLE Client Intercepts the Dynamic Linker Transparently
• Client hook: the rtld-audit interface of the GNU linker.
  Redirects library loads to new files.
  Redirects symbol name bindings (used for Python interception).
• The server communicates with clients through IPC and stores libraries in a RAM disk.
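
For context, this is the general shape of an rtld-audit library (see man 7 rtld-audit): la_version declares the audit interface version, and la_objsearch lets the auditor rewrite the path the linker is about to probe. The redirect_to_cache helper and the cache directory are invented for this example; SPINDLE's real client does considerably more.

    /* Build as a shared object and activate it through the LD_AUDIT
     * environment variable; the audited executable is not modified. */
    #define _GNU_SOURCE
    #include <link.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Invented helper: map a requested library name to a node-local cached
     * copy. A real client would ask its local SPINDLE server instead. */
    static const char *redirect_to_cache(const char *name, char *buf, size_t len) {
        const char *base = strrchr(name, '/');
        snprintf(buf, len, "/tmp/spindle_cache/%s", base ? base + 1 : name);
        return buf;
    }

    /* Required entry point: report which audit interface version we speak. */
    unsigned int la_version(unsigned int version) {
        return version;
    }

    /* Called for every library search; returning a different string makes
     * the linker open that path instead of the original one. */
    char *la_objsearch(const char *name, uintptr_t *cookie, unsigned int flag) {
        static char redirected[4096];   /* sketch only: not thread-safe */
        (void)cookie;
        (void)flag;
        return (char *)redirect_to_cache(name, redirected, sizeof(redirected));
    }

Setting LD_AUDIT to point at such a library before launch is what makes the interception transparent to the application.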

Slide 11: The rtld-audit Interface Provides Hooks into the Dynamic Linker
• Specify the audit library in the LD_AUDIT environment variable (see man ld-audit).
• Can intercept: dynamic library searches, opens, and closes; symbol bindings; inter-library calls.
• Runs in a separate library namespace.
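
A symbol-binding hook from the same interface could look like the sketch below: la_objopen asks the linker to report bindings for every loaded object, and la_symbind64 then sees each binding as it happens. The logging is illustrative only; returning sym->st_value leaves the binding unchanged, while returning another address would redirect the call.

    #define _GNU_SOURCE
    #include <link.h>
    #include <stdint.h>
    #include <stdio.h>

    unsigned int la_version(unsigned int version) {
        return version;
    }

    /* Ask the linker to report symbol bindings to and from every object. */
    unsigned int la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie) {
        (void)map; (void)lmid; (void)cookie;
        return LA_FLG_BINDTO | LA_FLG_BINDFROM;
    }

    /* Invoked once per symbol binding (64-bit objects). */
    uintptr_t la_symbind64(Elf64_Sym *sym, unsigned int ndx,
                           uintptr_t *refcook, uintptr_t *defcook,
                           unsigned int *flags, const char *symname) {
        (void)ndx; (void)refcook; (void)defcook; (void)flags;
        fprintf(stderr, "binding %s\n", symname);
        return sym->st_value;    /* keep the original target address */
    }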

Slide 12: SPINDLE's Performance (figure)

Slide 13: Constant Overhead of SPINDLE's Data Distribution (figure)

Slide 14: SPINDLE's Memory Footprint
• SPINDLE loads libraries into a node-local RAM disk: all pages are in memory, so no pages are reloaded at run time (less OS noise).
• SPINDLE overhead: < 15 MB.
• Figure: used pages, unused pages in the RAM disk, and unused pages on the remote file system, with and without SPINDLE.

Slide 15: Launching SPINDLE
• SPINDLE wrapper call (the executable is not modified):
  % spindle srun -n 512 myapp.exe
• SPINDLE scalably loads: library files (from dependencies and dlopen); the executable; Python .py/.pyc/.pyo files; exec/execv/execve/… call targets.
• Can follow forked processes.
• Integrated with LaunchMON.

Slide 16: Conclusion
• Loading dynamically linked applications at large scale scales poorly on large HPC systems: the file access storm acts like a denial-of-service attack on the file system.
• SPINDLE extends dynamic loading by intercepting the dynamic loader through its auditing interface, implements an overlay network of file-cache servers that share location information, libraries, and Python files, and provides a scalable, transparent environment for loading dynamic applications.
• Evaluation shows loading with no disruption for the Pynamic benchmark on Sierra up to 15,312 MPI tasks, with constant overhead for SPINDLE's data distribution.

Slide 17: Availability & Outlook
• Availability of SPINDLE:
  GitHub:
  Documentation:
  License: GNU Lesser General Public License (LGPL)
  Build: Configure and Libtool, test environment
  Version: 0.8
• SPINDLE's next steps: porting, optimization, and customization for a broader range of HPC systems (e.g. IBM Blue Gene); tighter integration with HPC system software (resource managers).
• SPINDLE is a stepping stone on the way to a massively parallel OS/runtime loading service for future exascale systems.

Slide 18: Questions?
Matthew LeGendre