Efficient software checkpointing framework for speculative techniques

ECE Connections 2006
Chuck (Chengyan) Zhao
Department of Computer Science, University of Toronto
Co-Supervisors: Prof. Greg Steffan, Prof. Cristiana Amza
Jun. 09, 2006
Note: might need to introduce my supervisors to the audience.

Chip Multi-Processor (CMP) is now everywhere
IBM: Power 4, Power 5
Intel: Montecito, Smithfield
AMD: dual-core Opteron, Athlon X2, four-core Opteron
Sun: UltraSparc T1 (32 hardware threads), UltraSparc T2 (64 hardware threads)
Sony, Toshiba, IBM: Cell (9 cores)
…
[Figures: Power 4; dual-core Intel chip; dual-core Opteron; Cell]
Goal: use CMP for single-threaded applications through parallelization
Note: we are interested in improving the performance of a single application using the abundant CMP resources, most of which would otherwise stay idle most of the time.

Parallelization Techniques
Automatic Parallelization
  conservative + precise: requires proof of non-dependence
  limited domain
Speculative Parallelization
  non-conservative
  has to recover from failures
Focus: speculative parallelization, using TLS

Thread-Level Speculation (TLS) Parallelism
Code example:
  for ( … ) {
    …
    *p = …;
    … = … *q;
  }
difficult to parallelize automatically
  uncertain dependence between *p and *q
  might be runtime or user-input dependent
Notes:
  1. point at the slide while talking
  2. turn each loop iteration into a thread
  3. checkpointing scheme + dependence testing
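To make the uncertain dependence concrete, here is a minimal sketch in C. The function names and array contents are hypothetical and are not taken from the talk; the point is only that the same loop body is independent across iterations in one call and carries a true cross-iteration dependence in another, depending on where p and q point at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loop in the style of the slide: each iteration writes
 * through p and reads through q. Whether iteration i depends on an
 * earlier iteration is decided only by where p and q point at run time. */
static void update(int *p, const int *q, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        p[i] = q[i] + 1;    /* *p = ...;  ... = ... *q; */
    }
}

/* Case 1: disjoint arrays, so the iterations are independent. */
static int run_disjoint(void)
{
    int src[4] = {0, 0, 0, 0};
    int dst[4] = {0, 0, 0, 0};
    update(dst, src, 4);
    return dst[3];
}

/* Case 2: p is q shifted by one element, so iteration i reads the
 * value written by iteration i-1 (a true cross-iteration dependence). */
static int run_overlapping(void)
{
    int a[5] = {0, 0, 0, 0, 0};
    update(a + 1, a, 4);
    return a[4];
}
```

A compiler that sees only update() cannot prove which case applies, which is why automatic parallelization has to stay conservative here.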

How Thread-Level Speculation works
TLS …*q *p…   violation  Recover  …*q Exec. Time We take a sequential program and carve it into threads. Watch for violations We then execute the threads speculatively in parallel. The speculative part is that we don’t know whether these threads are actually independent. Instead, we depend on runtime support to tell us whether the threads actually were independent whenever we have violated a data dependence we simply re-execute that thread so that it is redone with the proper value, otherwise we can commit the speculative work. But even when speculation has failed, we can still reduce overall execution time by exploiting the available parallelism. If you are usually right, then it is faster to apologize when you are wrong than to always ask for permission exploit available thread-level parallelism

Memory Checkpointing Compiler Transformations
mark region of interest
backup each memory write (store)
generate buffer refresh calls
generate recovery code
remove region marking delimiters

  start_instrument();
  setjmp(buf1);
  for (…) {
    refresh_ckpt();
    backup_mem(a);
    a = …;
    backup_mem(b);
    b = …;
    …
  }
  if (error_spec()) {
    ckp_restore();
    longjmp(buf1, 1);
  }
  stop_instrument();

Note: mention that those function calls are currently organized into a runtime library.
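As a rough illustration of what such a runtime library could look like, the sketch below implements an undo log behind the call names from the slide. The bodies are assumptions, not the authors' implementation: the buffer is a fixed-size array of int locations, backup_mem takes an address, refresh_ckpt is called once per attempt rather than once per iteration, and error_spec is a hypothetical stand-in that fails the first attempt only.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

#define CKPT_MAX 64   /* assumed fixed buffer capacity */

struct ckpt_entry { int *addr; int old; };
static struct ckpt_entry ckpt_buf[CKPT_MAX];
static size_t ckpt_len;

/* Discard all logged entries and start a fresh checkpoint. */
static void refresh_ckpt(void) { ckpt_len = 0; }

/* Log the current value at addr before it is overwritten. */
static void backup_mem(int *addr)
{
    assert(ckpt_len < CKPT_MAX);
    ckpt_buf[ckpt_len].addr = addr;
    ckpt_buf[ckpt_len].old = *addr;
    ckpt_len++;
}

/* Undo the logged writes, newest first, so the oldest value wins. */
static void ckp_restore(void)
{
    while (ckpt_len > 0) {
        ckpt_len--;
        *ckpt_buf[ckpt_len].addr = ckpt_buf[ckpt_len].old;
    }
}

/* File-scope state, so values stay well defined across longjmp. */
static int a_val, b_val, attempts;

/* Hypothetical failure check: speculation fails on the first attempt. */
static int error_spec(void) { return attempts == 1; }

static int demo(void)
{
    static jmp_buf buf1;
    a_val = 10; b_val = 20; attempts = 0;

    setjmp(buf1);             /* re-entry point after a rollback */
    attempts++;
    refresh_ckpt();
    backup_mem(&a_val); a_val += 5;
    backup_mem(&b_val); b_val += 5;

    if (error_spec()) {
        ckp_restore();        /* roll memory back to the checkpoint */
        longjmp(buf1, 1);     /* re-execute the region */
    }
    return a_val * 100 + b_val;
}
```

Without the restore step the second attempt would start from the already-modified values; with it, the region is re-executed from the checkpointed state, so demo() yields the same result as a single clean pass.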

Preliminary Results: MCF in SPEC2KINT
index  fname
1      refresh_potential()
2      bea_compute_red_cost()
3      primal_bea_mpp()
4      1 + 2
5      1 + 3
6      2 + 3
7
Notes: Picked the SPEC2000 CPU INT benchmark suite (10 / 12 applications available). Remember to show the key point: performance degradation can be up to 50%, but there is large room for improvement.

Challenges and Future Work
Challenges: software overhead
Proposed Solutions: optimizations
  inlining
  optimal buffer sizing and refresh placement
  memory optimizations
Applications
  value prediction
  debugging support
  reliability enhancement
  TLS (long term)
  ...
Note: mention that the challenge of software-only checkpointing is to significantly reduce the software overhead through aggressive optimizations.

Questions and Answers
