1 Blue Gene Simulator Gengbin Zheng Gunavardhan Kakulapati Parallel Programming Laboratory Department of Computer Science.

1 1 Blue Gene Simulator Gengbin Zheng gzheng@uiuc.edu Gunavardhan Kakulapati kakulapa@uiuc.edu Parallel Programming Laboratory Department of Computer Science University of Illinois at Urbana-Champaign http://charm.cs.uiuc.edu

2 2 Overview Blue Gene Emulator Blue Gene Simulator Timing correction schemes Performance and results

3 3 Emulation on a Parallel Machine Simulating (Host) Processor BG/C Nodes Hardware thread

4 4 Blue Gene Emulator: functional view [Figure: one Blue Gene/C node with communication threads, worker threads, inBuffer, affinity message queues, non-affinity message queues and a CorrectionQ]

5 5 Blue Gene Emulator: functional view [Figure: two emulated nodes, each with communication threads, worker threads, inBuff, affinity message queues, non-affinity message queues and a Correction Q, together with the Converse scheduler and Converse Q]

6 6 What the emulator is capable of Blue Gene API support Blue Gene Charm++ –Structured Dagger Trace Projections

7 7 Emulator to Simulator Emulator: –Study the programming model and application development Simulator: –Performance prediction capability –Models communication latency based on a network model –Does not model on-chip memory access or network contention

8 8 Simulator Parallel performance is hard to model –Communication subsystem Out of order messages Communication/computation overlap –Event dependencies Parallel Discrete Event Simulation –Emulation program executes in parallel with event time stamp correction. –Exploit inherent determinacy of application

9 9 How to simulate? Time stamping events –Per-thread timer (sharing one physical timer) –Time stamp messages Calculate communication latency based on the network model Parallel event simulation –When a message is sent out, calculate the predicted arrival time at the destination Blue Gene processor –When a message is received, update the current time: currTime = max(currTime, recvTime) –Time stamp correction

10 10 Time stamping messages and threads Thread timer: curT Message sent: RecvT(msg) = curT + Latency Message scheduled: curT = max(curT, RecvT(msg))
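
The two time-stamping rules on this slide can be sketched as follows. This is a minimal illustration, not the emulator's code: the names ThreadTimer, Message, compute, send and schedule are hypothetical, and the latency is passed in directly rather than computed from a network model.

```python
from dataclasses import dataclass


@dataclass
class Message:
    recv_time: float  # predicted arrival time at the destination, RecvT(msg)


class ThreadTimer:
    """Per-thread virtual clock (curT on the slide). Hypothetical helper class."""

    def __init__(self, start: float = 0.0):
        self.cur_t = start

    def compute(self, duration: float) -> None:
        # Local computation advances the thread's own clock.
        self.cur_t += duration

    def send(self, latency: float) -> Message:
        # Message sent: RecvT(msg) = curT + Latency
        return Message(recv_time=self.cur_t + latency)

    def schedule(self, msg: Message) -> None:
        # Message scheduled: curT = max(curT, RecvT(msg))
        self.cur_t = max(self.cur_t, msg.recv_time)


sender = ThreadTimer()
receiver = ThreadTimer()
sender.compute(5.0)
msg = sender.send(latency=2.0)
receiver.schedule(msg)
print(receiver.cur_t)  # 7.0
```

A message that arrives with a RecvT earlier than the receiver's curT leaves curT unchanged, which is exactly the out-of-order case that motivates timestamp correction.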

11 11 Need for timestamp correction Time stamp correction is needed for out-of-order messages Out-of-order delivery can occur: –A message arrives late while some other message has already advanced the thread time into the future –So the late message executes in the context of the future, although its predicted time is earlier

12 12 Parallel correction algorithm Sort message execution by receive time; adjust time stamps when needed. Use correction messages to announce a change in an event's startTime. Send correction messages along the same path the original message was sent. Events already in the timeline may have to move.
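
A rough single-timeline sketch of this idea: re-sort the executed events by receive time, recompute start times, and report which events moved (those are the ones that would need correction messages sent downstream). The Event fields, the fixed per-event duration and the function name correct_timeline are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    recv_time: float   # predicted receive time of the triggering message
    duration: float    # assumed fixed execution time of the event
    start_time: float = 0.0

    @property
    def end_time(self) -> float:
        return self.start_time + self.duration


def correct_timeline(events):
    """Re-sort by recv_time and recompute start times.

    Returns the corrected order and the names of events whose start time
    changed, i.e. the events whose outgoing messages would need corrections.
    """
    moved = []
    clock = 0.0
    ordered = sorted(events, key=lambda e: e.recv_time)
    for ev in ordered:
        new_start = max(clock, ev.recv_time)
        if new_start != ev.start_time:
            moved.append(ev.name)
            ev.start_time = new_start
        clock = ev.end_time
    return ordered, moved


# M8 was predicted for t=1 but arrived late and ran after M2.
executed = [
    Event("M1", recv_time=0.0, duration=2.0, start_time=0.0),
    Event("M2", recv_time=3.0, duration=1.0, start_time=3.0),
    Event("M8", recv_time=1.0, duration=2.0, start_time=4.0),
]
ordered, moved = correct_timeline(executed)
print([e.name for e in ordered])  # ['M1', 'M8', 'M2']
print(moved)                      # ['M8', 'M2']
```

Here M8 moves back to start at 2, which pushes M2 from 3 to 4, so both events would generate correction messages.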

13 13 Timestamps Correction [Figure: messages M1 through M8 on the execution timeline, ordered by RecvTime]

14 14 Timestamps Correction [Figure: messages M1 through M8 on the execution timeline, ordered by RecvTime]

15 15 Timestamps Correction [Figure: execution timeline of M1 through M7 by RecvTime, shown without and with M8 inserted, and a correction message]

16 16 Timestamps Correction [Figure: three execution timelines by RecvTime showing a correction message for M4, the repositioning of M4 relative to M5 and M6, and further correction messages]

17 17 Linear-order correction Works only when –Programs have no alternate orders of execution –Messages are processed in the same order across multiple executions –E.g., MPI programs with no wildcard receives, Structured Dagger code with no “overlap” or “forall”.

18 18 Reasons: The correction algorithm breaks dependency logic –It is based only on receive time –Cases: When an event depends on several messages –The last message triggers the computation A message is buffered until some condition holds Example where the correction scheme is invalid: Jacobi-1D

20 20 Solution Use Structured Dagger to retrieve dependence information As the program runs, form a chain of Blue Gene logs that preserves the dependency information. Blue Gene logs are kept for entry functions and Structured Dagger functions

21 21 Timestamp correction scheme Every event has a list of backward and forward dependents. An event cannot start until its backward dependents have finished. Define effRecvTime = max(recvTime, endOfBackDeps) An event can start only after its effRecvTime. startTime = max(effRecvTime, timeline.last.endTime)
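
The two formulas on this slide translate directly into code. The function names and the plain-number arguments are illustrative assumptions; endOfBackDeps is taken here to be the latest end time among the backward dependents.

```python
def eff_recv_time(recv_time, back_dep_end_times):
    """effRecvTime = max(recvTime, endOfBackDeps).

    back_dep_end_times: end times of the event's backward dependents
    (may be empty, in which case effRecvTime is just recvTime).
    """
    return max([recv_time, *back_dep_end_times])


def start_time(eff_recv, timeline_last_end):
    """startTime = max(effRecvTime, timeline.last.endTime)."""
    return max(eff_recv, timeline_last_end)


# Message predicted for t=4, but the event also waits on dependents ending at 6 and 3.
eff = eff_recv_time(4.0, [6.0, 3.0])
print(eff)                   # 6.0
print(start_time(eff, 5.0))  # 6.0: the timeline is free before effRecvTime
print(start_time(eff, 9.0))  # 9.0: the processor is busy until 9
```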

22 22 Timestamp correction scheme Unlike the previous case, the timeline is not sorted by the recvTime of the events. The timeline is sorted by effRecvTime. Steps to process a correction message –Find the earliest event updated by the message –Cut the timeline at that event –Calculate new effRecvTimes from that point on –Reinsert into the timeline in order of effRecvTime
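
The four steps above could look roughly like this for a single timeline. This is a sketch under stated assumptions, not the simulator's implementation: the Event fields, the helper names reinsert and apply_correction, fixed event durations and an acyclic dependency graph are all assumed for illustration. The cut point is chosen conservatively as the corrected event itself or the first event it might now precede, whichever comes first.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare events by identity, not by field values
class Event:
    name: str
    recv_time: float
    duration: float
    back_deps: list = field(default_factory=list)  # events that must finish first
    eff_recv: float = 0.0
    start: float = 0.0
    end: float = 0.0


def reinsert(head, pending):
    """Append pending events to head in order of recomputed effRecvTime."""
    timeline, pending = list(head), list(pending)
    while pending:
        # An event is ready once none of its backward dependents is still pending.
        ready = [e for e in pending if not any(d in pending for d in e.back_deps)]
        for e in ready:
            e.eff_recv = max([e.recv_time] + [d.end for d in e.back_deps])
        nxt = min(ready, key=lambda e: e.eff_recv)
        last_end = timeline[-1].end if timeline else 0.0
        nxt.start = max(nxt.eff_recv, last_end)
        nxt.end = nxt.start + nxt.duration
        timeline.append(nxt)
        pending.remove(nxt)
    return timeline


def apply_correction(timeline, target, new_recv_time):
    target.recv_time = new_recv_time
    # Steps 1-2: find the earliest affected position and cut the timeline there.
    cut = next(i for i, e in enumerate(timeline)
               if e is target or e.eff_recv > new_recv_time)
    # Steps 3-4: recompute effRecvTimes and reinsert in effRecvTime order.
    return reinsert(timeline[:cut], timeline[cut:])


a = Event("A", recv_time=0.0, duration=2.0)
b = Event("B", recv_time=3.0, duration=2.0)
c = Event("C", recv_time=4.0, duration=1.0, back_deps=[b])
d = Event("D", recv_time=6.0, duration=1.0)
timeline = reinsert([], [a, b, c, d])        # initial order: A, B, C, D
timeline = apply_correction(timeline, d, 1.0)  # D's message was really due at t=1
print([(e.name, e.start) for e in timeline])
# [('A', 0.0), ('D', 2.0), ('B', 3.0), ('C', 5.0)]
```

Note that C stays behind B even though its recvTime is 4, because its effRecvTime is bounded by the end of B.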

23 23 Non-linear order correction scheme The new scheme: –Takes the event dependencies into account –Works even when messages can be received in different orders in different runs –Requires all dependencies to be captured using Structured Dagger But the timing correction is very slow. Several optimizations are possible.

24 24 Optimizations to online correction scheme Overwrite old corrections: –An event can get multiple correction messages –Overwriting reduces the number of corrections –The same scheme applies if a correction message arrives earlier than the message itself Use multisend –Messages destined for the same real processor but different events can be sent collectively.
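
One way to picture these two optimizations is a buffer that keeps only the latest correction per event, plus a grouping step that batches corrections by destination real processor. The class CorrectionBuffer, the function group_by_processor and the processor_of mapping are hypothetical names for illustration; the sketch also assumes that the most recently received correction for an event is the one that should win.

```python
from collections import defaultdict


class CorrectionBuffer:
    """Keeps only the most recent correction for each event (hypothetical helper)."""

    def __init__(self):
        self._latest = {}  # event id -> most recent corrected recvTime

    def add(self, event_id, new_recv_time):
        # A newer correction overwrites the old one. This also works when the
        # correction arrives before the event's own message has been seen.
        self._latest[event_id] = new_recv_time

    def drain(self):
        pending, self._latest = self._latest, {}
        return pending


def group_by_processor(corrections, processor_of):
    """Batch corrections per destination real processor (multisend-style)."""
    batches = defaultdict(dict)
    for event_id, t in corrections.items():
        batches[processor_of[event_id]][event_id] = t
    return dict(batches)


buf = CorrectionBuffer()
buf.add("e1", 10.0)
buf.add("e1", 7.0)   # overwrites the earlier correction for e1
buf.add("e2", 4.0)
buf.add("e3", 9.0)
pending = buf.drain()
batches = group_by_processor(pending, {"e1": 0, "e2": 0, "e3": 1})
print(batches)  # {0: {'e1': 7.0, 'e2': 4.0}, 1: {'e3': 9.0}}
```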

25 25 More optimizations Prioritize messages based on their predicted recvTime. Lazy processing –Process correction messages periodically. –Allows corrections to be overwritten. Batch processing –Process many correction messages at a time –Many events will be affected –Choose the earliest and reinsert in the order of effRecvTime. Ability to start corrections in the middle –Can ignore the startup events for timing correction

26 26 Timing correction is still very slow. Observations: –Don’t let the execution go far ahead of the correction wave –A large difference means many wrong events to be corrected –Following the execution wave too closely may not help either A new scheme –Similar to the one used for GVT (global virtual time)

27 27 GVT-like scheme Use a heartbeat –Periodically broadcast a request for the GVT GVT –Is the time after which events are invalid due to pending corrections –Compute the GVT as the minimum of the predicted recvTimes of all correction messages and the startTimes of all affected events. Use a parameter “leash”. Execution of the program cannot go beyond “gvt + leash”
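
The GVT computation and the leash check can be sketched as below. The function names are hypothetical, and returning infinity when there are no pending corrections is an assumption of this sketch (it simply means execution is not held back); in a parallel run each processor would contribute its local minimum and the heartbeat would reduce them to a global minimum.

```python
def compute_gvt(correction_recv_times, affected_start_times):
    """Minimum over predicted recvTimes of pending correction messages and
    startTimes of affected events. Returns infinity if nothing is pending
    (an assumption made for this sketch)."""
    candidates = list(correction_recv_times) + list(affected_start_times)
    return min(candidates) if candidates else float("inf")


def may_execute(event_start_time, gvt, leash):
    """Execution may not go beyond gvt + leash."""
    return event_start_time <= gvt + leash


# Local minima from two processors, reduced to a global value at a heartbeat.
local_gvts = [compute_gvt([12.0, 9.0], [15.0]), compute_gvt([], [10.0])]
gvt = min(local_gvts)
print(gvt)                                # 9.0
print(may_execute(13.0, gvt, leash=5.0))  # True:  13 <= 14
print(may_execute(15.0, gvt, leash=5.0))  # False: 15 > 14
```

A small leash keeps execution close to the correction wave at the cost of more stalls; a large leash lets more potentially wrong events run ahead.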

28 28 Projections before correction

29 29 Projections after correction

30 30 Correctness of the scheme (using Jacobi1D)

31 31 Predicted time vs latency factor

32 32 Predicted speedup

33 33 More work Ongoing work –Make sure gvt scheme is correct Future work –The presented scheme is on-line correction –Explore the off-line (post-mortem) correction scheme using generated traces.

