Slide 1: BigSim: Large Parallel Machine Simulation
Presented by Eric Bohm, Parallel Programming Laboratory (PPL), PPL@cs.uiuc.edu
Charm++ Workshop 2004

Slide 2: Motivations
● Big machines are coming!
  – BG/L (128,000 processors)
  – ASCI Purple
  – Red Storm
● Can your application scale to 128,000 processors?
  – Not without a lot of wasted runtime on a petascale machine
  – How much runtime can you get on hardware that isn't available?

Slide 3: Approach
● Processor simulation
  – Coarse-grained emulation
  – Fine-grained instruction simulation
● Network simulation
  – Coarse-grained latency simulation
  – Fine-grained transport layer
● Composition
  – Online: run it all at once
  – Offline: break the simulation up into levels

Slide 4: The Medium Is the Message
● Sequential performance is not the key to scalability
● Problem decomposition
  – Load balancing
  – Timing of result phases
● Communication
  – Timing and speed
  – Network contention

Slide 5: Life Is Short
● Detail, speed, generality: choose two
  – The more accuracy you want, the longer the simulation takes to run and the more architecture-specific it must be
● We picked speed and generality
  – Coarse-grained processor emulation
  – Coarse-grained communication latency model
● We want it all
  – Let the user add detail during post-mortem analysis

Slide 6: Paths Not Taken
● Instruction-level simulation
  – Architecture-specific complexity: pipelines, branch prediction, multiple instructions per cycle, compiler optimizations, etc.
  – Detailed instruction simulators are heavyweight sequential applications
  – This level of accuracy is not vital to parallel performance optimization of scientific applications
  – For sequential performance measurement, use sequential optimization techniques

Slide 7: BigSim Features
● Choose network size and topology
● Configurable performance prediction methods
● Compile AMPI and Charm++/SDAG applications to run on the emulator
● Supports standard Charm++ frameworks
● Projections tracing for performance analysis

Slide 8: BigSim Architecture
[Architecture diagram] Charm++ and MPI applications run on the BigSim Emulator atop the Charm++ runtime, with an online PDES engine, a simple network model, performance counters, instruction simulators (RSim, IBM, ...), and a load balancing module. Simulation output trace logs feed BigNetSim, a POSE-based offline-PDES network simulator; results are visualized with Projections.

Slide 9: BigSim Emulator
● Emulate a full machine on existing parallel machines
  – Actually run a parallel program with multi-million-way parallelism
● Started by mimicking the Blue Gene low-level API
● Machine layer abstraction
  – Many multiprocessor (SMP) nodes connected via message passing

Slide 10: Emulation
[Diagram] Each simulating (host) processor emulates multiple simulated multiprocessor nodes, each containing simulated processors.

Slide 11: BigSim Emulator: Functional View
[Diagram] Each target node contains worker processors and communication processors. Incoming messages arrive in an inBuff; the Converse scheduler drains the Converse queue, affinity message queues (bound to a particular worker), non-affinity message queues, and a correction queue.
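
The queue structure on the slide can be illustrated with a toy sketch: affinity messages are bound to one worker's queue, non-affinity messages go to a shared queue any worker may drain. This is a structural illustration only, not the emulator's actual data structures; all names are ours.

```python
from collections import deque

class TargetNode:
    """Toy model of one emulated node: per-worker affinity queues plus a
    shared non-affinity queue (illustrative, not BigSim's real layout)."""
    def __init__(self, n_workers):
        self.affinity = [deque() for _ in range(n_workers)]
        self.non_affinity = deque()

    def enqueue(self, msg, worker=None):
        # Affinity messages name their target worker; others are shared.
        if worker is None:
            self.non_affinity.append(msg)
        else:
            self.affinity[worker].append(msg)

    def next_for(self, worker):
        # A worker prefers its own affinity queue, then the shared queue.
        if self.affinity[worker]:
            return self.affinity[worker].popleft()
        if self.non_affinity:
            return self.non_affinity.popleft()
        return None
```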

Slide 12: Simulation
● Parallel discrete event simulation
  – Machine behaviors can be thought of as events beginning at a particular time and lasting for a set duration
  – Direct execution or trace-driven
● Charm++ allows out-of-order message delivery
  – Dependent events may need to execute in an order different from their arrival order
  – Need time stamp correction based on dependencies
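
The time stamp correction idea can be sketched as a recomputation over the dependency graph: an event may not begin before every event it depends on has finished. A minimal sketch, assuming events are already topologically ordered by id; all names are illustrative, not BigSim's API.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """A logged event: may not start before all its dependencies end."""
    eid: int
    recv_time: float   # time stamp recorded during emulation
    duration: float
    deps: list = field(default_factory=list)  # eids this event waits on

def correct_timestamps(events):
    """Recompute start/end times so each event begins no earlier than the
    latest end time of its dependencies (topological order by eid)."""
    start, end = {}, {}
    for ev in sorted(events, key=lambda e: e.eid):
        ready = max((end[d] for d in ev.deps), default=0.0)
        start[ev.eid] = max(ev.recv_time, ready)
        end[ev.eid] = start[ev.eid] + ev.duration
    return start, end
```

For example, an event logged at t=1.0 that depends on an event finishing at t=2.0 is pushed back to start at 2.0.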

Slide 13: A Tale of Two Networks
[Diagram] Direct network vs. indirect network topologies.

Slide 14: Post-Mortem Network Simulation
● Run the application on the emulator and gather event trace logs
  – Source, destinations, time stamp, event dependency, message size
● Replay on a network simulator that models
  – Contention, topology, routing algorithms, packetization, collective communication
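
A trace record carrying the fields listed above, replayed through a simple latency model, might look like the following sketch. The field names and the alpha/bandwidth/per-hop cost model are our illustrative assumptions, not the actual BigSim log format; contention, routing, and packetization are deliberately ignored here.

```python
from dataclasses import dataclass

@dataclass
class TraceRecord:
    """One logged message send (field names are illustrative)."""
    src: int
    dst: int
    send_time: float
    size_bytes: int
    dep_event: int   # event this message's delivery depends on

def predict_arrival(rec, alpha=1e-6, bandwidth=1e9, per_hop=50e-9, hops=1):
    """Coarse latency model: startup cost + serialization + per-hop delay."""
    return rec.send_time + alpha + rec.size_bytes / bandwidth + hops * per_hop
```

A detailed network simulator would replace `predict_arrival` with contention-aware, topology-aware routing of each packet.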

Slide 15: POSE
● Parallel Object-oriented Simulation Environment, built on Charm++
  – Virtualization, load balancing, communication optimization, performance analysis
● POSE advantages
  – Optimistic synchronization: maximize utilization with speculative execution
  – Adaptive strategies adjust to simulation behavior
  – Optimized for fine-grained simulations
  – Good scalability
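
The optimistic synchronization idea can be shown with a toy Time Warp-style logical process: events execute speculatively as they arrive, and a straggler (an event time stamped earlier than one already executed) triggers a rollback and replay. This is a minimal sketch of the general technique, not POSE's actual implementation.

```python
class OptimisticLP:
    """Toy logical process with optimistic execution and rollback."""
    def __init__(self):
        self.state = 0
        self.executed = []   # (timestamp, delta) in execution order
        self.rollbacks = 0

    def handle(self, ts, delta):
        if self.executed and ts < self.executed[-1][0]:
            # Straggler: undo events that ran too early, then replay them.
            self.rollbacks += 1
            redo = []
            while self.executed and self.executed[-1][0] > ts:
                t, d = self.executed.pop()
                self.state -= d          # inverse of the forward step
                redo.append((t, d))
            self._execute(ts, delta)
            for t, d in reversed(redo):
                self._execute(t, d)
        else:
            self._execute(ts, delta)

    def _execute(self, ts, delta):
        self.state += delta
        self.executed.append((ts, delta))
```

Real optimistic simulators checkpoint state instead of inverting each event, and they bound rollback with global virtual time; this sketch only shows why speculation plus rollback preserves timestamp order.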

Slide 16: POSE Design
[Diagram]

Slide 17: POSE Performance
● Run on Tungsten, 1 to 256 processors
● >13,000,000 events
● Wall clock: 8 seconds on 256 processors ("out of work?")
● Sequential run: 1775 seconds while swapping heavily; estimated 325 seconds without swapping ("Cheater!")

Slide 18: TCSim
● Time stamp correction network simulation
● Transform the trace logs into event messages
● Send messages into the network model (BGnode and BGproc objects)
● Capture results
● Terminate at a set time or when we run out of messages

Slide 19: HiSim Bluegene
[Diagram]

Slide 20: What If?
● What if Lemieux had 32,000 processors?
  – FEM benchmark on 125 to 32,000 simulated processors
  – Run on 32 real Lemieux processors

Slide 21: LeanMD
● Molecular dynamics simulation designed for large machines
● K-away cutoff parallelization
● Benchmark: er-gre with 3-away cutoff
  – 36,573 atoms; 1.6 million objects; 8-step simulation
  – Simulated 32K-processor BG machine, running on 400 PSC Lemieux processors

Slide 22: LeanMD on BigSim
[Diagram]

Slide 23: QsNet
● Indirect network, also known as Elan
  – Hierarchical: node-to-switch and switch-to-switch links

Slide 24: Network Performance Prediction
[Graph] Actual (measured) vs. simulated performance of the K-Shift strategy under random load on 64 Lemieux processors.

Slide 25: Validation
[Diagram]

Slide 26: Future Work
● User events: Projections event log in simulation time
● More validation to improve accuracy
● Hybrid networks
● Approximation from performance counters
● Integration with instruction-level simulation
  – Use statistical sampling to make it viable
● Sample network configuration files
