
1 BigSim: A Parallel Simulator for Performance Prediction of Extremely Large Parallel Machines
Gengbin Zheng, Gunavardhan Kakulapati, Laxmikant V. Kale
University of Illinois at Urbana-Champaign

2 Motivations (IPDPS 4/29/2004)
- Extremely large parallel machines are around the corner
- Examples: ASCI Purple (12K, 100TF), BlueGene/L (64K, 360TF), BlueGene/C (8M, 1PF)
- PF machines are likely to have 100k+ processors (1M?)
- Would existing parallel applications scale?
- The machines are not there yet
- Parallel performance is hard to model without actually running the program

3 BlueGene/L

4 Roadmap
- Explore suitable programming models
  - Charm++ (message-driven)
  - MPI and its extension AMPI (adaptive version of MPI)
- Use a parallel emulator to run applications
- Coarse-grained simulator for performance prediction (not hardware simulation)

5 Charm++: Object-based Programming Model
- The user is only concerned with the interaction between objects
[Figure labels: User View; System implementation]

6 Charm++ Object-based Programming Model
- Processor virtualization
  - Divide the computation into a large number of pieces
    - Independent of the number of processors
    - Typically larger than the number of processors
  - Let the system map objects to processors
- Empowers an adaptive, intelligent runtime system
[Figure labels: User View; System implementation]

7 Charm++ for Peta-scale Machines
- Explicit management of resources
  - This data on that processor
  - This work on that processor
- Objects can migrate
  - Automatic, efficient resource management
- One-sided communication
- Asynchronous global operations (reductions, ...)

8 AMPI: MPI + Processor Virtualization
- Implemented as virtual processors (user-level migratable threads)
[Figure labels: 7 MPI “processes”; Real Processors]

9 Parallel Emulator
- Actually runs a parallel program
- Emulates the full machine on existing parallel machines
  - Based on a common low-level abstraction (API)
  - Many multiprocessor nodes connected via message passing
- The emulator supports Charm++/AMPI
Reference: Gengbin Zheng, Arun Singla, Joshua Unger, Laxmikant V. Kalé, "A Parallel-Object Programming Model for PetaFLOPS Machines and Blue Gene/Cyclops", NGS Program Workshop, IPDPS 2002

10 Emulation on a Parallel Machine
- Emulating 8M threads on 96 ASCI-Red processors
[Figure labels: Simulating (Host) Processor; Simulated multi-processor nodes; Simulated processor]

11 Emulator Performance
- Scalable
- Emulating a real-world MD application on a 200K-processor BG machine
Reference: Gengbin Zheng, Arun Singla, Joshua Unger, Laxmikant V. Kalé, "A Parallel-Object Programming Model for PetaFLOPS Machines and Blue Gene/Cyclops", NGS Program Workshop, IPDPS 2002

12 Emulator to Simulator
- Predicting parallel performance
- Modeling parallel performance accurately is challenging
  - Communication subsystem
  - Behavior of the runtime system
  - The size of the machine is big

13 Performance Prediction
- Parallel Discrete Event Simulation (PDES)
  - Each logical processor (LP) has a virtual clock
  - Events are time-stamped
  - The state of an LP changes when an event arrives at it
- Our emulator was extended to carry out PDES
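The PDES vocabulary on this slide can be illustrated with a minimal sketch. It is written in Python for brevity and is not BigSim code: the names `Event`, `LogicalProcessor`, `deliver`, and `run` are hypothetical, and the rule that an event occupies the LP for a fixed `duration` is an assumption made for the example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    timestamp: float                           # virtual time at which the event may start
    name: str = field(compare=False)
    duration: float = field(compare=False, default=0.0)  # assumed cost of handling it


class LogicalProcessor:
    """One LP: a virtual clock plus a queue of time-stamped events."""

    def __init__(self):
        self.clock = 0.0
        self.pending = []      # min-heap ordered by event timestamp
        self.history = []      # (name, start time) of every processed event

    def deliver(self, event):
        heapq.heappush(self.pending, event)

    def run(self):
        # Process events in timestamp order; each one changes the LP state.
        while self.pending:
            ev = heapq.heappop(self.pending)
            start = max(self.clock, ev.timestamp)
            self.history.append((ev.name, start))
            self.clock = start + ev.duration
        return self.clock


lp = LogicalProcessor()
lp.deliver(Event(5.0, "b", 2.0))   # delivered first, but later in virtual time
lp.deliver(Event(1.0, "a", 3.0))
final_clock = lp.run()
```

Here event `a` is handled before `b` even though it was delivered second, because the queue is ordered by timestamp rather than by arrival.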

14 Predict Parallel Components
- How to predict parallel components?
- Multiple resolution levels
- Sequential component:
  - User-supplied expression
  - Performance counters
  - Instruction-level simulation
- Parallel component:
  - Simple latency-based network model
  - Contention-based network simulation
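As a rough illustration of the coarsest resolution level, the sketch below pairs a user-supplied expression for a sequential block with a simple latency-based message cost. The slide gives no formulas, so everything here is an assumption: the linear cost per cell, the `alpha + bytes / bandwidth` latency model, the function names, and all constants.

```python
def sequential_time(n_cells, seconds_per_cell):
    """User-supplied expression (assumed linear): predicted time of a sequential block."""
    return n_cells * seconds_per_cell


def message_latency(n_bytes, alpha=5e-6, bandwidth=1e9):
    """Assumed latency model: fixed per-message cost plus size divided by bandwidth."""
    return alpha + n_bytes / bandwidth


# Hypothetical numbers: 1000 cells at 20 ns each, and one 8000-byte message.
compute = sequential_time(n_cells=1000, seconds_per_cell=2e-8)
comm = message_latency(n_bytes=8000)
```

The finer levels listed on the slide (performance counters, instruction-level simulation, contention-based network simulation) would replace these closed-form estimates with measured or simulated values.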

15 Prior PDES Work
- Conservative vs. optimistic protocols
- Conservative (examples: DaSSF, MPI-SIM)
  - Ensure the safety of processing events in a global fashion
  - Typically require look-ahead; high global synchronization overhead
- Optimistic (examples: Time Warp, SPEEDES)
  - Each LP processes the earliest event on its own and undoes earlier out-of-order execution when causality errors occur
  - Exploits the parallelism of the simulation better, and is preferred

16 Why not use existing PDES?
- Major synchronization overheads
  - Rollback/restart overhead
  - Checkpointing overhead
- We can do better in the simulation of some parallel applications
  - Property of inherent determinacy in parallel applications
  - Most parallel programs are written to be deterministic, for example "Jacobi"

17 Timestamp Correction
- Messages should be executed in the order of their timestamps
- Causality errors arise from out-of-order message delivery
- Rollback and checkpointing are necessary in traditional methods
- Inherent determinacy is hidden in applications
- Need to capture event dependencies
  - Run-time detection
  - Use the language "Structured Dagger" to express dependencies
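The idea of correcting timestamps instead of rolling back can be sketched as follows. This is a simplified Python illustration, not BigSim's algorithm: it assumes each event's duration does not depend on execution order (the inherent-determinacy property), it ignores dependencies between events, and it does not propagate corrections to messages sent by the corrected events. The function name `corrected_schedule` and the sample values are hypothetical.

```python
def corrected_schedule(executed):
    """Recompute start and end times as if events had run in timestamp order.

    executed: list of (name, recv_time, duration) in the order actually run.
    Returns {name: (start, end)} without re-executing any event.
    """
    cur_t = 0.0
    schedule = {}
    for name, recv_t, duration in sorted(executed, key=lambda e: e[1]):
        start = max(cur_t, recv_t)      # an event cannot start before it arrives
        cur_t = start + duration
        schedule[name] = (start, cur_t)
    return schedule


# M2 was delivered and executed before M1, although M1 has the earlier timestamp.
executed = [("M2", 4.0, 2.0), ("M1", 1.0, 5.0)]
schedule = corrected_schedule(executed)
```

Only the recorded times are adjusted. The work done by `M1` and `M2` is not repeated, which is where the saving over rollback and checkpointing would come from under these assumptions.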

18 Simulation of Different Applications
- Linear-order applications
  - No wildcard MPI receives
  - Strong determinacy; no timestamp correction necessary
- Reactive applications (atomic)
  - Message-driven objects
  - Methods execute as the corresponding messages arrive
- Multi-dependent applications
  - Irecvs with WaitAll (MPI)
  - Use of Structured Dagger to capture dependencies (Charm++)

19 Structured Dagger

    entry void jacobiLifeCycle() {
      for (i=0; i<MAX_ITER; i++) {
        atomic { sendStripToLeftAndRight(); }
        /* wait for both neighbor strips, in either order */
        overlap {
          when getStripFromLeft(Msg *leftMsg) {
            atomic { copyStripFromLeft(leftMsg); }
          }
          when getStripFromRight(Msg *rightMsg) {
            atomic { copyStripFromRight(rightMsg); }
          }
        }
        atomic { doWork(); /* Jacobi Relaxation */ }
      }
    }
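For readers unfamiliar with Structured Dagger, the dependency expressed by `overlap` and `when` can be mimicked in plain Python. This is only an analogy under simplifying assumptions: the class `JacobiChunk` and its methods are hypothetical, the strips are plain lists, and the sketch does not handle a strip for the next iteration arriving before the current iteration has finished.

```python
class JacobiChunk:
    """Buffers the two neighbor strips and runs do_work once both are present."""

    def __init__(self):
        self.strips = {}
        self.iterations_done = 0

    def receive(self, side, strip):
        # Either strip may arrive first, as in the 'overlap' block.
        if side not in ("left", "right"):
            raise ValueError("side must be 'left' or 'right'")
        self.strips[side] = strip
        if len(self.strips) == 2:      # both 'when' clauses are satisfied
            self.do_work()
            self.strips.clear()        # ready for the next iteration

    def do_work(self):
        # Placeholder for the Jacobi relaxation step.
        self.iterations_done += 1


chunk = JacobiChunk()
chunk.receive("right", [1.0, 2.0])     # the right strip happens to arrive first
done_after_first = chunk.iterations_done
chunk.receive("left", [3.0, 4.0])
done_after_second = chunk.iterations_done
```

Because the work step depends on both messages and not on their arrival order, a simulator that knows this dependency can assign the step a start time without guessing the order.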

20 Time Stamping Messages
- LP virtual timer: curT
- Message sent: RecvT(msg) = curT + Latency
- Message scheduled: curT = max(curT, RecvT(msg))
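The two rules on this slide translate almost directly into code. The sketch below is a hypothetical Python rendering: the class name `VirtualTimer`, a constant per-timer `latency`, and advancing the timer by a `duration` after scheduling are assumptions added for the example. Only the two formulas come from the slide.

```python
class VirtualTimer:
    """Per-LP virtual timer (curT on the slide)."""

    def __init__(self, latency):
        self.cur_t = 0.0
        self.latency = latency          # assumed constant network latency

    def send(self):
        # Message sent: RecvT(msg) = curT + Latency
        return self.cur_t + self.latency

    def schedule(self, recv_t, duration=0.0):
        # Message scheduled: curT = max(curT, RecvT(msg))
        self.cur_t = max(self.cur_t, recv_t)
        start = self.cur_t
        self.cur_t += duration          # assumption: handling the message takes 'duration'
        return start


sender = VirtualTimer(latency=2.0)
receiver = VirtualTimer(latency=2.0)
sender.cur_t = 10.0                     # the sender has already done 10 units of work
recv_t = sender.send()
start = receiver.schedule(recv_t, duration=3.0)
```

In this run the receiver is idle until virtual time 12.0, so the message starts exactly at its receive time. Had the receiver's timer already been past 12.0, the `max` would have delayed the start instead.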

21 Timestamps Correction
[Figure: two execution timelines showing messages M1 to M8 with their RecvTime. Labels: RecvTime; Execution TimeLine; Correction Message]

22 Architecture of BigSim Simulator
[Figure: architecture diagram. Components: Charm++ and MPI applications; BigSim Emulator; Charm++ Runtime; Online PDES engine; Load Balancing Module; Instruction Sim (RSim, IBM, ...); Simple Network Model; Performance counters; Simulation output trace logs; Performance visualization (Projections)]

23 Architecture of BigSim Simulator
[Figure: architecture diagram. Components: Charm++ and MPI applications; BigSim Emulator; Charm++ Runtime; Online PDES engine; Load Balancing Module; Instruction Sim (RSim, IBM, ...); Simple Network Model; Performance counters; Simulation output trace logs; Offline PDES; BigNetSim (POSE) Network Simulator; Performance visualization (Projections)]

24 Big Network Simulation
- Simulate network behavior: packetization, routing, contention, etc.
- Incorporated with post-mortem timestamp correction via POSE
- Switches are connected in a torus network
[Figure labels: BGSIM Emulator; POSE Timestamp Correction; BG Log Files (tasks & dependencies); Timestamp-corrected Tasks; BigNetSim]

25 BigSim Validation on Lemieux
[Figure label: 32 real processors]

26 Jacobi on a 64K BG/L

27 Case Study: LeanMD
- Molecular dynamics simulation designed for large machines
- K-away cut-off parallelization
- Benchmark er-gre with 3-away
  - 36573 atoms
  - 1.6 million objects
  - 8-step simulation
  - 32K-processor BG machine
- Running on 400 PSC Lemieux processors
- Performance visualization tools

28 Load Imbalance Histogram

29 Performance of BigSim
[Figure label: Real processors (PSC Lemieux)]

30 Conclusions
- Improved simulation efficiency by taking advantage of the "inherent determinacy" of parallel applications
- The simulation techniques explored show good parallel scalability
- http://charm.cs.uiuc.edu

31 Future Work
- Improving simulation accuracy
  - Instruction-level simulator
  - Network simulator
- Developing run-time techniques (load balancing) for very large machines using the simulator

