The BigSim Parallel Simulation System. Gengbin Zheng, Ryan Mokos. Charm++ Workshop 2010. Parallel Programming Laboratory, University of Illinois at Urbana-Champaign.

1 The BigSim Parallel Simulation System
  Gengbin Zheng, Ryan Mokos
  Charm++ Workshop 2010
  Parallel Programming Laboratory, University of Illinois at Urbana-Champaign
  4/28/2010

2 Outline
  Overview
  BigSim Emulator
  BigSim Simulator

3 Summarizing the State of the Art
  Petascale
    Very powerful parallel machines exist (Jaguar, Roadrunner, etc.)
    Application domains exist that need that kind of power
  New generation of applications
    Use sophisticated algorithms
    Dynamic adaptive refinements
    Multi-scale, multi-physics
  Parallel applications are more complex than sequential ones, and their performance is hard to predict without actually running them
  Challenge: Is it possible to simulate these applications at large scale using small clusters?

4 BigSim
  Why BigSim, and why on Charm++?
    Targets large-scale simulation
    Object-based processor virtualization, for a virtualized execution environment
    Efficient message-passing runtime provided by Charm++
    Support for fine-grained decomposition
    Portability

5 BigSim Infrastructure
  Emulator
    A virtualized execution environment
    Charm++ and MPI applications
    No or small changes to MPI application source code
    Facilitates code development and debugging
  Simulator
    Trace-driven approach
    Parallel Discrete Event Simulation
    Simple latency, full network contention modeling
    Predict parallel performance at varying levels of resolution

6 Architecture of BigSim
  [Figure: architecture diagram. Labels: Charm++/MPI applications; BigSim Emulator; AMPI Runtime; Charm++ Runtime; simulation trace logs; BigSim Simulator; POSE; performance visualization (Projections)]

7 MPI Alltoall Timeline

8 BigSim Emulator
  Emulate a full machine on existing machines
    Actually run a parallel program
    E.g., NAMD on 256K target processors using 8K cores of the Ranger cluster
  Implemented on Charm++
    Libraries that link to the user application
  Simple architecture abstraction
    Many multiprocessor (SMP) nodes connected via message passing
    Does not emulate at the instruction level

9 BigSim Emulator: functional view
  [Figure: functional diagram. Labels: Physical Processor; Converse scheduler; Converse Queue; two Target Nodes, each with an incoming queue, a node-level queue, processor-level queues, communication processors, and worker processors]

10 Processor Virtualization
  [Figure panels: User View, System View]
  Programmer: decomposes the computation into objects
  Runtime: maps the computation onto the processors

11 Major Challenges
  Running multiple copies of code on each processor
    Shared global variables
    Charm++ applications already handle this
    AMPI: global/static variables
    Runtime techniques, compiler tools
  E.g., NAMD on 1024 target processors using 8 cores
  Simulation time
  Memory footprint
    Global read-only variables can be shared
    Out-of-core execution

12 NAMD Emulation
  Only a 19x slowdown
  Only a 7x increase in memory

13 Out-of-core Emulation
  Motivation
    Applications with a large memory footprint
    The VM system cannot handle them well
  Use the hard drive
    Similar to checkpointing
  Message-driven execution
    Peek at the message queue => what executes next? (prefetch)

14 What is in the Trace Logs?
  Traces for 2 target processors
  Each SEB (Sequential Execution Block) has:
    startTime, endTime
    Incoming message ID
    Outgoing messages
    Dependences
  Tools for reading bgTrace binary files:
    1. charm/example/bigsim/tools/loadlog: convert to human-readable format
    2. charm/example/bigsim/tools/log2proj: convert to trace projections log files
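
The per-SEB fields listed on this slide can be pictured as a small record type. The sketch below is only an illustration under stated assumptions: the class names, field names, and sample values are hypothetical, and this is not the bgTrace binary layout, which the slides do not describe.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OutgoingMessage:
    """Hypothetical record for one message sent during an SEB."""
    msg_id: int
    dest_proc: int
    send_time: float


@dataclass
class SEB:
    """Hypothetical in-memory view of one Sequential Execution Block."""
    start_time: float
    end_time: float
    incoming_msg_id: Optional[int] = None  # message that triggered this block
    outgoing: List[OutgoingMessage] = field(default_factory=list)
    depends_on: List[int] = field(default_factory=list)  # indices of earlier SEBs

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time


# Made-up timeline for one target processor.
timeline = [
    SEB(0.0, 1.5, outgoing=[OutgoingMessage(msg_id=1, dest_proc=1, send_time=1.2)]),
    SEB(1.5, 2.0, incoming_msg_id=7, depends_on=[0]),
]
total_compute = sum(seb.duration for seb in timeline)
```

Keeping the dependences as indices into the same timeline makes it easy to replace a block's times later without touching the ordering information.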

15 BigSim Simulator: BigNetSim
  Post-mortem network simulator built on POSE (Parallel Object-oriented Simulation Environment), which is built on Charm++
    Parallel Discrete Event Simulation
  Pass emulator traces through different network models in BigNetSim to get final performance results
  Details of using BigNetSim:
    http://charm.cs.uiuc.edu/workshops/charmWorkshop2009/slides/tut_BigSim09.ppt
    http://charm.cs.uiuc.edu/manuals/html/bignetsim/manual.html

16 POSE
  Network layer constructs (NIC, Switch, Node, etc.) implemented as poser simulation objects
  Network data constructs (message, packet, etc.) implemented as event methods on simulation objects

17 Posers
  Each poser is a tiny simulation

18 Performance Prediction
  Two components:
    Time to execute blocks of sequential, computational code
      SEBs = Sequential Execution Blocks
    Communication time based on a particular network topology

19 Sequential Time Prediction (Emulator)
  Manual
    Advance processor time using BgElapse() calls in application code
  Wallclock time
    Use a multiplier (scale factor) to account for architecture differences
  Performance counters
    Count instructions with hardware counters
    Use the expected time of each instruction on the target machine to derive execution time
  Instruction-level simulation (e.g., Mambo)
    Record cycle-accurate execution times for functions
    Use an interpolation tool to replace SEB times
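
Two of the approaches on this slide reduce to simple arithmetic, which the following sketch illustrates. It is not BigSim code: the function names, instruction classes, and all numbers are hypothetical and chosen only to show the calculation.

```python
def scale_wallclock(host_seconds: float, scale_factor: float) -> float:
    """Wallclock approach: scale the time measured on the host machine."""
    return host_seconds * scale_factor


def time_from_counters(instruction_counts: dict, seconds_per_instruction: dict) -> float:
    """Counter approach: sum count * expected per-instruction time on the target."""
    return sum(
        count * seconds_per_instruction[kind]
        for kind, count in instruction_counts.items()
    )


# Hypothetical: the target is assumed to run this code twice as fast as the host.
scaled = scale_wallclock(host_seconds=2.0, scale_factor=0.5)

# Hypothetical counter readings and per-instruction costs on the target.
counts = {"flop": 1_000_000, "load": 500_000}
costs = {"flop": 1e-9, "load": 2e-9}
estimated = time_from_counters(counts, costs)
```

The wallclock approach needs only one number per machine pair, while the counter approach needs a cost table for the target but can distinguish compute-heavy from memory-heavy blocks.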

20 Sequential Time Prediction (continued)
  Model-based (recent work)
    Performed after emulation
    Determine the application functions responsible for most of the computation time
    Run these functions on the target machine
    Obtain run times based on function parameters to create a model
    Feed emulation traces through an offline modeling tool (like the interpolation tool) to replace SEB times
    Generates a corrected set of traces
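
The model-based workflow can be sketched as a fit-then-replace step. The example below assumes, purely for illustration, that run time is linear in a single function parameter; the slides do not say which model form the actual tool uses, and the function name `compute_forces`, the measurements, and the trace records are all hypothetical.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def correct_trace(trace, models):
    """Return a new trace whose durations come from the fitted models."""
    corrected = []
    for seb in trace:
        model = models.get(seb["function"])
        new_seb = dict(seb)
        if model is not None:
            intercept, slope = model
            new_seb["duration"] = intercept + slope * seb["param"]
        corrected.append(new_seb)
    return corrected


# Hypothetical timings of one hot function measured on the target machine.
sizes = [100, 200, 400]
times = [0.011, 0.021, 0.041]
models = {"compute_forces": fit_linear(sizes, times)}

# Hypothetical emulation trace: durations were recorded on the host.
trace = [
    {"function": "compute_forces", "param": 300, "duration": 0.5},
    {"function": "other", "param": 0, "duration": 0.1},
]
corrected = correct_trace(trace, models)
```

Blocks whose function has no model keep their emulated duration, which mirrors the idea of modeling only the functions that dominate computation time.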

21 Communication Time Prediction (Simulator)
  Valid for a particular network topology
  Generic: Simple Latency model
    Formula predicts time using latency and bandwidth parameters
  Specific: BlueGene, Blue Waters, and others
    Latency-only option: uses a formula specific to the network
    Full contention
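
The slide does not give the Simple Latency formula itself. A common form for a model with only latency and bandwidth parameters is time = latency + size / bandwidth, and the sketch below assumes that form; the function name and the parameter values are hypothetical and are not taken from BigNetSim.

```python
def simple_latency_time(message_bytes: int, latency_s: float,
                        bandwidth_bytes_per_s: float) -> float:
    """Assumed latency/bandwidth model: fixed latency plus transfer time."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + message_bytes / bandwidth_bytes_per_s


# Hypothetical network: 5 microseconds latency, 1 GB/s bandwidth.
small = simple_latency_time(8, latency_s=5e-6, bandwidth_bytes_per_s=1e9)
large = simple_latency_time(1_000_000, latency_s=5e-6, bandwidth_bytes_per_s=1e9)
```

Under this form small messages are dominated by the latency term and large ones by the bandwidth term; contention between messages is ignored, which is what separates it from the full contention models.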

22 Specific Model (Full Network)
  [Figure labels: BGnode, BGproc, Net Interface, Switch, Transceiver, Channel]

23 Generic Model (Simple Latency)
  [Figure labels: BGnode, BGproc, Net Interface, Switch, Transceiver, Channel]

24 What We Model
  Processors
  Nodes
  NICs
  Switches/hubs
  Channels
  Packet-level direct and indirect routing
  Buffers with credit scheme
  Virtual channels

25 Other BigNetSim Features
  Skip points
    Set skip points in application code (e.g., after startup)
    Simulate only between skip points
  Transceiver
    Traffic pattern generator: replaces nodes and processors
  Windowing
    Set file window size to decrease memory footprint
    Can cut footprint in half or better, depending on trace structure
  Checkpoint-to-disk (recent work)
    Saves simulator state based on time or GVT interval for restart if a crash occurs

26 BigNetSim Tools
  Located in BigNetSim/trunk/tools
  Log Analyzer
    Provides info about a set of traces
    Number of events per simulated processor
    Number of messages sent
  Log Transformation (recently completed)
    Produces a new set of traces with remapped objects
    Useful for testing load-balancing scenarios

27 BigNetSim Output
  BgPrintf() statements
    Added to application code
    "%f" converted to committed time during simulation
  GVT = Global Virtual Time
    Each GVT tick = 1/factor seconds
    factor is defined in BigNetSim/trunk/Main/TCsim.h
  Link utilization statistics
  Projections traces
    Use the -tproj command-line parameter
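
Since each GVT tick is 1/factor seconds, converting a GVT value to simulated seconds is a single division. The sketch below uses the numbers from the example output on slide 28 and assumes the "timing factor" printed there is the same `factor`; the function name is hypothetical.

```python
def gvt_to_seconds(gvt_ticks: int, factor: float) -> float:
    """Convert GVT ticks to seconds, given ticks-per-second `factor`."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return gvt_ticks / factor


final_gvt = 38129444      # "Final GVT" in the example output
timing_factor = 1.0e8     # "timing factor 1.000000e+08" in the example output
simulated_seconds = gvt_to_seconds(final_gvt, timing_factor)
print(f"{simulated_seconds:.6f} s of simulated time")
```

With these inputs the final GVT corresponds to roughly 0.38 seconds of simulated time.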

28 BigNetSim Output Example
  Charm++: standalone mode (not using charmrun)
  Charm warning> Randomization of stack pointer is turned on in Kernel, run 'echo 0 > /proc/sys/kernel/randomize_va_space' as root to disable it. Thread migration may not work!
  Charm++> cpu topology info is being gathered!
  Charm++> 1 unique compute nodes detected!
  bgtrace: totalBGProcs=8 X=8 Y=1 Z=1 #Cth=1 #Wth=1 #Pes=1
  Opts: netsim on: 0
  Initializing POSE...
  POSE initialization complete.
  Using Inactivity Detection for termination.
  netsim skip_on 0 0
  Info> timing factor 1.000000e+08...
  Info> invoking startup task from proc 0...
  [0:RECV_RESUME] Start of major loop at 0.014741
  [0:RECV_RESUME] End of major loop at 0.034914
  Simulation inactive at time: 38129444
  Final GVT = 38129444
  Final link stats [Node 0, Channel 0, ### Link]: ovt: 38129444, utilization time: 29685846, utilization %: 77.855439, packets sent: 472210 gvt=38129444
  Final link stats [Node 0, Channel 3, ### Link]: ovt: 38129444, utilization time: 631019, utilization %: 0.016549, packets sent: 4259 gvt=38129444
  1 PE Simulation finished at 18.052671.
  Program finished.
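
The "Final link stats" lines have a regular shape, so they can be pulled out of a saved log with a regular expression. The sketch below is a hypothetical post-processing helper, not a BigNetSim tool; the pattern is written against the two lines shown on this slide and may need adjusting for other output.

```python
import re

# Pattern matched against the link-stat lines shown on the slide.
LINK_STATS = re.compile(
    r"Final link stats \[Node (\d+), Channel (\d+), ### Link\]: "
    r"ovt: (\d+), utilization time: (\d+), utilization %: ([\d.]+), "
    r"packets sent: (\d+)"
)


def parse_link_stats(log_text: str) -> list:
    """Extract per-link statistics and recompute utilization from raw times."""
    stats = []
    for m in LINK_STATS.finditer(log_text):
        node, channel, ovt, busy, reported, packets = m.groups()
        stats.append({
            "node": int(node),
            "channel": int(channel),
            "reported": float(reported),
            "computed_percent": 100.0 * int(busy) / int(ovt),
            "packets": int(packets),
        })
    return stats


sample = (
    "Final link stats [Node 0, Channel 0, ### Link]: ovt: 38129444, "
    "utilization time: 29685846, utilization %: 77.855439, "
    "packets sent: 472210 gvt=38129444\n"
    "Final link stats [Node 0, Channel 3, ### Link]: ovt: 38129444, "
    "utilization time: 631019, utilization %: 0.016549, "
    "packets sent: 4259 gvt=38129444\n"
)
link_stats = parse_link_stats(sample)
```

Recomputing utilization as utilization time divided by ovt reproduces the reported 77.855439 for Channel 0. For Channel 3 the same calculation gives about 1.65 percent, while the transcribed line shows 0.016549, so the printed value there may be a fraction rather than a percentage; working from the raw times avoids that ambiguity.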

29 Ring Projections Timeline

30 BigNetSim Performance
  Examples of sequential simulator performance on Blue Print
  4k-VP MILC
    Startup time: 0.7 hours
    Execution time: 5.6 hours
    Total run time: 6.3 hours
    Memory footprint: ~3.1 GB
  256k-VP 3D Jacobi (10x10x10 grid, 3 iterations)
    Startup time: 0.5 hours
    Execution time: 1.5 hours
    Total run time: 2.0 hours
    Memory footprint: ~20 GB
  Still tuning parallel simulator performance

31 Thank you!
  Free download of Charm++ and BigSim: http://charm.cs.uiuc.edu
  Send questions and comments to: ppl@charm.cs.uiuc.edu

