Presentation is loading. Please wait.

Presentation is loading. Please wait.

Mohamed. M. Saad Mohamed A. Mohamedin & Prof. Binoy Ravindran VT-MENA Program Electrical & Computer Engineering Department Virginia Polytechnic Institute.

Similar presentations


Presentation on theme: "Mohamed. M. Saad Mohamed A. Mohamedin & Prof. Binoy Ravindran VT-MENA Program Electrical & Computer Engineering Department Virginia Polytechnic Institute."— Presentation transcript:

1 Mohamed. M. Saad Mohamed A. Mohamedin & Prof. Binoy Ravindran VT-MENA Program Electrical & Computer Engineering Department Virginia Polytechnic Institute and State University

2  Motivation & Objectives  Background  Transactional Memory  Jikes RVM  Program Reconstruction  Architecture  Profiler, Builder & Runtime  Future Work

3  Why Multicores?  Difficult to make single-core clock frequencies even higher  Deeply pipelined circuits ▪ heat problems ▪ speed of light problems ▪ difficult design and verification ▪ large design teams necessary ▪ server farms need expensive air- conditioning

4  No fast CPUs any more, just more cores!  Trend is using multi-core & hyper-threading

5  At 2005, Sun Niagara (8 cores with HT run 32 HWT)  At 2010, Supermicro (48-core AMD Opteron).  Now, Sun make boxes with between hardware threads (16 HWT/core, 8 cores/CPU) !! What About Software!!! Are we ready for this HW ?!

6  Many applications are designed to use few threads  Legacy systems were designed to run at a single processor  Multi-threading programming is headache for developers (race situations, concurrent access, …)  HydraVM: Java Virtual Machine Prototype based on Jikes RVM and targets utilizing large number of cores through detecting automatically possible parallel portions of code

7  Transactional Memory  Jikes RVM (Adaptive Online Architecture)

8  Atomicity: An operation (or set of operations) appears to the rest of the system to occur instantaneously Example (Money Transfer): …… synchronized { from = from - amount to = to + amount } …… Example (Money Transfer): …… account1.lock() account2.lock() from = from - amount to = to + amount account1.unlock() account2.unlock() …… account1 account2 X Y ≈

9  Drawbacks  Deadlock  Livelock  Starvation  Priority Inversion  Non-composable  Cost of managing the lock  Non-scalable on multiprocessors AB X Y

10  Simplifies parallel programming by allowing a group of load and store instructions to execute in an atomic way using additional primitives Example (Money Transfer): …… START-TRANSACTION from = from - amount to = to + amount END-TRANSACTION …… …… Commit or Rollback & Retry account1 account2 X Y account1 y account2 y account1 x account2 x

11  Each transaction has ReadSet & WriteSet  Transactions conflict if have the same variable(s) at ReadSet / WriteSet  Conflict Resolution using Contention Manager that employs different policies (Aggressive, Polite, Back-Off, Random, …..)  Aborted code undo changes (if required) and retries again

12  Transactions may be nested (multiple levels)  Inner transaction share the ReadSet/WriteSet of parent  Inner transactions conflicts with each other and with other higher level transactions  Aborting parent transaction forces abort for children  Inner transactions changes are visible to parents once commit successfully, but hidden from outside world till commit of highest level

13  Hardware Transactional Memory Modifications in processors, cache and bus protocol ex; unbounded HTM, TCC, ….  Software Transactional Memory Software runtime library or the programming language support Minimal hardware support; CAS, LL/SC ex; RSTM, DSTM, ESTM,..  Hybrid Transactional Memory Exploits HTM support to achieve hardware performance for transactions that do not exceed the HTM’s limitations, and STM otherwise ex; LogTM, HyTM, …  Distributed Transactional Memory Extends transaction primitives to distributed environment (network of multiple machines) ex; HyFlow, DecentSTM, GenSTM, …

14  Mature modular open source Java virtual machine designed for research purposes. Unlike most other JVMs it is written in Java!  Adaptive Online System

15  We view program as a set of basic building blocks  Each block is a set of instructions  Block has single entry and multiple exists  Blocks may access the same memory (variables)  It is possible to reconstruct the program from these blocks by rearranging it differently with some changes to the control instructions.  It is even possible to assign each set of blocks to different thread

16 int counter = 0; for(int i=0; i 0.3) counter++; else counter--; 0: iconst_0 1: istore_1 2: iconst_0 3: istore_2 4: goto 29 7: invokestatic #13; 10: ldc2_w #19; 13: dcmpl 14: ifle 23 17: iinc 1, 1 20: goto 26 23: iinc 1, -1 26: iinc 2, 1 29: iload_2 30: bipush 12 32: if_icmplt 7 35: return 0: iconst_0 1: istore_1 2: iconst_0 3: istore_2 4: goto 29 7: invokestatic #13; 10: ldc2_w #19; 13: dcmpl 14: ifle 23 17: iinc 1, 1 20: goto 26 23: iinc 1, -1 26: iinc 2, 1 29: iload_2 30: bipush 12 32: if_icmplt 7 35: return

17 public class Test{ public static void foo(){ int counter = 0; for(int i=0; i 0.3) counter++; else counter--; } public static void zoo(){ System.out.println("hi"); } public static void main(String[] args){ int i=6; if(i<10) foo(); else zoo(); } }

18

19  Split code into Basic Block  Inject loaded classes with additional instructions to monitor:  Program Flow (Which Basic Blocks are accessed and in what order?)  Memory accessed by each Basic Block  Which Basic Block is doing I/O ?

20 0: iconst_0 1: istore_1 2: iconst_0 3: istore_2 write J write C visit B1 4: goto 29 7: invokestatic #13; 10: ldc2_w #19; 13: dcmpl read K write K visit B2 14: ifle 23 17: iinc 1, 1 read C write C visit B3 20: goto 26 23: iinc 1, -1 read C write C visit B4 26: iinc 2, 1 read J write J visit B5 29: iload_2 30: bipush 12 visit B6 read J 32: if_icmplt 7 35: return 0: iconst_0 1: istore_1 2: iconst_0 3: istore_2 4: goto 29 7: invokestatic #13; 10: ldc2_w #19; 13: dcmpl 14: ifle 23 17: iinc 1, 1 20: goto 26 23: iinc 1, -1 26: iinc 2, 1 29: iload_2 30: bipush 12 32: if_icmplt 7 35: return Example: int C = 0; for(int J=0; J 0.3) C++; else C--;

21  Recompile the Java class bytecode into machine-code  Replace and reload class definition at memory

22  Running the profiled code  Collecting flow & memory access information and store it at the knowledge repository

23  Analyze knowledge repository information and know:  Which Blocks can be grouped together  Which groups of blocks can be parallelized

24  Program can be represented as a string (each character is a basic block).  Example: for (Integer i = 0; i < DIMx; i++) { for (Integer j = 0; j < DIMx; j++) { for (Integer k = 0; k < DIMy; k++) { C[i][j] += A[i][k] * B[k][j]; } } } abjbhcfefghcfefghijbhcfefghcfefghijk ab(jb(hcfefg) 2 hi) 2 jk

25 ab(jb(hcfefg) 2 hi) 2 k  Externalize common blocks patterns as methods  Generated methods may be nested  Reconstruct the program as producer-consumer pattern  Collector ▪ Provides Executor with suitable blocks as Tasks to execute according to flow up-to time  Executor ▪ Allocates core threads ▪ Assign tasks to threads ▪ Requests Collector for more blocks based on program flow, after all threads complete

26  Problems  Threads may conflict when access the same variables  Threads may finish out of normal order  Collector may generate invalid tasks  Lets represents each Thread as Transaction  When two transactions conflicts abort one that has newer blocks relative to normal execution  Transaction will not commit unless its preceding one in timeline is finished  Transaction timeout if not reachable

27  Collects which transactions conflicts and commit rate  We can refine the constructed program  Builder re-organize generated blocks and recompile the code again

28  Complete the implementation of HydraVM  Profiling by monitoring memory instead of generating new instructions  Automatically uses of Java NIO to handle I/O operations and generate callbacks to process it  Using thread scheduling techniques instead of TM  Formal verification of reconstructed programs matches desired semantics

29


Download ppt "Mohamed. M. Saad Mohamed A. Mohamedin & Prof. Binoy Ravindran VT-MENA Program Electrical & Computer Engineering Department Virginia Polytechnic Institute."

Similar presentations


Ads by Google