Extending Open64 with Transactional Memory features Jiaqi Zhang Tsinghua University.


1 Extending Open64 with Transactional Memory features
Jiaqi Zhang
Tsinghua University

2 Contents
– Background
– Design
– Implementation
– Optimization
– Experiment
– Conclusion

3 Transactional Memory Background
Trend to concurrent programming
Current solution:
– Lock
– Flaws:
    Association between locks and data
    Deadlock
    Not composable

4 Transactional Memory Background
    class Account{
        int balance;
        lock mylock;
        bool credit(int amount);
        bool debit(int amount);
    };
    bool credit(int amount){
        acquire(mylock);
        balance+=amount;
        release(mylock);
    }
    bool debit(int amount){
        acquire(mylock);
        balance-=amount;
        release(mylock);
    }
    transfer(Account a, Account b, int amount){ }

Body of transfer, calls only:
    a.credit(amount);
    b.debit(amount);
– inconsistent state

Body of transfer, caller acquires both locks:
    acquire(a.mylock);
    acquire(b.mylock);
    a.credit(amount);
    b.debit(amount);
    release(a.mylock);
    release(b.mylock);
– Poor abstraction of class Account
– Deadlock
– Exposed implementation details

Body of transfer, as a transaction:
    atomic{
        a.credit(amount);
        b.debit(amount);
    }

5 Transactional Memory Background
Current Implementations
– TM libraries
    DSTM
    DracoSTM
    TL2
    TinySTM
    ……..
– Function calls:
    TM_INIT()/TM_SHUTDOWN()
    TM_ATOMIC_BEGIN()/TM_ATOMIC_END()
    TM_SHARED_READ()/TM_SHARED_WRITE()
– Explicit Transaction
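With a TM library the programmer writes an explicit transaction: the region is bracketed by hand and every shared access goes through a library call. A minimal sketch of that style follows. The slide gives only the macro names, so the argument lists and the trivial single-threaded definitions below are assumptions made so the sketch compiles without any TM library; a real library would log and validate each access.

```c
#include <assert.h>

/* Hypothetical stand-ins: only the macro names come from the slide.
 * The argument lists and the trivial bodies are assumptions. */
#define TM_INIT()                 ((void)0)
#define TM_SHUTDOWN()             ((void)0)
#define TM_ATOMIC_BEGIN()         do {
#define TM_ATOMIC_END()           } while (0)
#define TM_SHARED_READ(var)       (var)
#define TM_SHARED_WRITE(var, val) ((var) = (val))

static long balance_a;
static long balance_b;

/* Explicit transaction: the programmer brackets the region and wraps
 * every shared read and write by hand. */
void transfer(long amount)
{
    TM_ATOMIC_BEGIN();
    TM_SHARED_WRITE(balance_a, TM_SHARED_READ(balance_a) - amount);
    TM_SHARED_WRITE(balance_b, TM_SHARED_READ(balance_b) + amount);
    TM_ATOMIC_END();
}

/* Runs one transfer and returns the combined balance, which a correct
 * transfer must leave unchanged. */
long run_transfer_demo(void)
{
    balance_a = 100;
    balance_b = 0;
    TM_INIT();
    transfer(30);
    TM_SHUTDOWN();
    assert(balance_a == 70 && balance_b == 30);
    return balance_a + balance_b;
}
```

Forgetting to wrap a single access silently breaks atomicity, which is the burden a compiler-based approach aims to remove.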

6 Transactional Memory Background
Current Implementations
– Compilers
    Intel C++ STM Compiler
    Tanger
    OpenTM
    GCC

7 Design
Programming Interfaces
    #pragma tm atomic [clause]
        structured block
      clauses:
        readonly
        private(var list)
        shared(var list)
    #pragma tm abort
    #pragma tm function
        function declaration
    #pragma tm waiver
        function declaration
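A short sketch of how these directives might look in user code. The clause meanings assumed here (private marks variables that need no barriers, readonly marks a transaction with no shared writes) are inferred from the clause names, not stated on the slide, and the histogram example itself is hypothetical. A compiler without this extension typically ignores the unknown pragmas (possibly with a warning), so the code then runs as ordinary sequential C.

```c
#include <assert.h>

#define NBUCKETS 4

/* Shared between threads in a real program. */
static int histogram[NBUCKETS];

/* Mark a function as callable from inside a transaction. */
#pragma tm function
void bump(unsigned bucket);

void bump(unsigned bucket)
{
    histogram[bucket]++;
}

void record(unsigned value)
{
    unsigned bucket = value % NBUCKETS;

    /* Assumption: private(bucket) tells the compiler that bucket is
     * per-thread and needs no TM barriers. */
    #pragma tm atomic private(bucket)
    {
        bump(bucket);
    }
}

int total(void)
{
    int sum = 0;

    /* Assumption: readonly declares that no shared data is written. */
    #pragma tm atomic readonly
    {
        for (int k = 0; k < NBUCKETS; k++)
            sum += histogram[k];
    }
    return sum;
}

/* Records three values and returns the number of recorded entries. */
int run_pragma_demo(void)
{
    for (int k = 0; k < NBUCKETS; k++)
        histogram[k] = 0;
    record(1);
    record(5);
    record(2);
    return total();
}
```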

8 Design
TM runtime interfaces (TL2)
Interface | Description
Thread* TxNewThread() | Allocate a new Thread structure to keep logs
TxStart(Thread* Self, jmp_buf* buf, int flags) | Start a new transaction for current thread
TxCommit(Thread* Self) | Commit the current transaction
TxLoad(Thread* Self, void* addr) | Perform synchronized load from given memory address
TxStore(Thread* Self, void* addr, intptr_t val) | Perform synchronized store to given memory address
TxStoreLocal(Thread* Self, void* addr, intptr_t val) | Perform locally logged store to given memory address
TxAbort(Thread* Self) | Abort the current transaction and re-execute
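To make the call sequence concrete, here is a toy single-threaded mock with the same function names. It is not TL2: it only buffers writes in a small log, applies them at commit, and discards them on abort before jumping back to the setjmp point. The return types, the log layout, and LOG_CAPACITY are assumptions, and TxStoreLocal is left out for brevity.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>
#include <stdlib.h>

#define LOG_CAPACITY 64   /* hypothetical fixed log size */

typedef struct Thread {
    jmp_buf  *env;                   /* where TxAbort jumps back to */
    intptr_t *addr[LOG_CAPACITY];    /* buffered write addresses */
    intptr_t  val[LOG_CAPACITY];     /* buffered write values */
    int       nwrites;
} Thread;

Thread *TxNewThread(void)
{
    Thread *t = calloc(1, sizeof *t);
    assert(t != NULL);
    return t;
}

void TxStart(Thread *Self, jmp_buf *buf, int flags)
{
    (void)flags;                     /* unused in this mock */
    Self->env = buf;
    Self->nwrites = 0;
}

/* A load sees the transaction's own latest buffered write, if any. */
intptr_t TxLoad(Thread *Self, void *addr)
{
    for (int k = Self->nwrites - 1; k >= 0; k--)
        if (Self->addr[k] == (intptr_t *)addr)
            return Self->val[k];
    return *(intptr_t *)addr;
}

/* A store is only appended to the log; memory is untouched until commit. */
void TxStore(Thread *Self, void *addr, intptr_t val)
{
    assert(Self->nwrites < LOG_CAPACITY);
    Self->addr[Self->nwrites] = addr;
    Self->val[Self->nwrites] = val;
    Self->nwrites++;
}

void TxCommit(Thread *Self)
{
    for (int k = 0; k < Self->nwrites; k++)
        *Self->addr[k] = Self->val[k];
    Self->nwrites = 0;
}

void TxAbort(Thread *Self)
{
    Self->nwrites = 0;               /* drop buffered writes */
    longjmp(*Self->env, 1);          /* re-execute from the setjmp point */
}

static intptr_t counter;

/* Aborts the first attempt, commits the second, returns the counter. */
intptr_t run_abort_demo(void)
{
    Thread *Self = TxNewThread();
    jmp_buf buf;
    volatile int attempts = 0;       /* volatile: modified across longjmp */

    counter = 0;
    setjmp(buf);
    attempts++;
    TxStart(Self, &buf, 0);
    TxStore(Self, &counter, TxLoad(Self, &counter) + 1);
    if (attempts == 1)
        TxAbort(Self);               /* first attempt is thrown away */
    TxCommit(Self);

    intptr_t result = counter;
    free(Self);
    return result;
}
```

Because the aborted attempt never reached memory, the counter ends at 1 even though the increment executed twice.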

9 Design
Wrapper functions
– To ease the process of integrating new TM libraries
Called by programmers:
    tm_init()/tm_finalize()
    tm_thread_start()/tm_thread_end()
Inserted by compiler:
    __tm_atomic_begin()/__tm_atomic_end()
    __tm_shared_read()/__tm_shared_read_float()
    __tm_shared_write()/__tm_shared_write_float()
    __tm_local_write()/__tm_local_write_float()
More wrapper functions are needed for other data types and for additional TM semantics
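A sketch of what such a wrapper layer could look like on top of a word-based runtime. The wrapper signatures are assumptions (the slide lists names only), the backend is a direct-access stand-in rather than a real TM library, and shared values are assumed to live in word-sized, word-aligned cells. The float variants illustrate why per-type wrappers are needed: the value has to be packed into and out of an intptr_t word.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) <= sizeof(intptr_t), "float must fit in a word");

/* Stand-in backend (assumption): plain memory access, one thread. */
typedef struct Thread { int unused; } Thread;
static Thread current_thread;

static intptr_t TxLoad(Thread *Self, void *addr)
{
    (void)Self;
    return *(intptr_t *)addr;
}

static void TxStore(Thread *Self, void *addr, intptr_t val)
{
    (void)Self;
    *(intptr_t *)addr = val;
}

/* Wrapper layer: swapping TM libraries means re-implementing only these.
 * The double-underscore names follow the slide; such names are reserved
 * for the implementation, which is what a compiler runtime is. */
intptr_t __tm_shared_read(void *addr)
{
    return TxLoad(&current_thread, addr);
}

void __tm_shared_write(void *addr, intptr_t val)
{
    TxStore(&current_thread, addr, val);
}

float __tm_shared_read_float(void *addr)
{
    intptr_t word = TxLoad(&current_thread, addr);
    float f;
    memcpy(&f, &word, sizeof f);     /* unpack the float bits from the word */
    return f;
}

void __tm_shared_write_float(void *addr, float val)
{
    intptr_t word = 0;
    memcpy(&word, &val, sizeof val); /* pack the float bits into a word */
    TxStore(&current_thread, addr, word);
}

/* Returns 1 if both an integer and a float survive a write/read round trip. */
int run_wrapper_demo(void)
{
    intptr_t int_cell = 0;           /* word-sized shared cells */
    intptr_t float_cell = 0;

    __tm_shared_write(&int_cell, 42);
    __tm_shared_write_float(&float_cell, 1.5f);
    return __tm_shared_read(&int_cell) == 42
        && __tm_shared_read_float(&float_cell) == 1.5f;
}
```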

10 Design Optimization – Eliminate redundant calls to runtime libraries

11 Implementation General Transformation

12 Implementation
General Transformation
– #pragma tm atomic
    setjmp();
    __tm_atomic_begin();
– simple statements
    a = b+c;
        PARM #address of c
        CALL
        LDID
        STID #tm_preg_num_0
        PARM #address of b
        CALL
        LDID
        STID #tm_preg_num_1
        LDID #tm_preg_num_0
        LDID #tm_preg_num_1
        ADD
        PARM
        PARM #address of a
        CALL
– control flow statements (IF, WHILE_DO)
    for(;i<10;i++){ }
        PARM #address of i
        CALL
        LDID
        STID #tm_preg_num_0
        WHILE_DO
            LDID #tm_preg_num_0
            INTCONST 9
            LE
        BODY
            BLOCK
            …………….
            PARM #address of i
            CALL
            LDID
            STID #tm_preg_num_0
            END_BLOCK

13 Implementation
General Transformation
Source:
    int i = 0;
    #pragma tm atomic
    {
        int j = 0;
        for(i=0;i<20;i++) {
            for(j=0;j<10;j++) {
                result++;
            }
        }
    }
After transformation:
    int i = 0;
    jmp_buf jbuf;
    _setjmp(jbuf);
    TxStart(Self, jbuf);
    TxStore(Self, &j, 0);
    for (TxStore(Self, &i, 0); TxLoad(Self, &i)<20; TxStore(Self, &i, TxLoad(Self, &i)+1)){
        for(TxStore(Self, &j, 0); TxLoad(Self, &j)<10; TxStore(Self, &j, TxLoad(Self, &j)+1)){
            TxStore(Self, &result, TxLoad(Self, &result)+1);
        }
    }
    TxCommit(Self);
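To see how many runtime calls this fully instrumented form makes, the loop nest can be run against a counting mock. In this sketch TxLoad and TxStore are hypothetical stand-ins that access memory directly and count calls; transaction start, commit, and the setjmp point are omitted, and all variables are widened to intptr_t so the word-sized mock is safe.

```c
#include <assert.h>
#include <stdint.h>

/* Counting mock (assumption): not a TM runtime, only tallies calls. */
typedef struct Thread { long loads, stores; } Thread;

Thread stats;

static intptr_t TxLoad(Thread *Self, void *addr)
{
    Self->loads++;
    return *(intptr_t *)addr;
}

static void TxStore(Thread *Self, void *addr, intptr_t val)
{
    Self->stores++;
    *(intptr_t *)addr = val;
}

static intptr_t result;

/* The instrumented loop nest: every access to i, j, and result
 * goes through a barrier. */
intptr_t run_instrumented(void)
{
    Thread *Self = &stats;
    intptr_t i = 0;
    intptr_t j;

    stats.loads = 0;
    stats.stores = 0;
    result = 0;

    TxStore(Self, &j, 0);
    for (TxStore(Self, &i, 0); TxLoad(Self, &i) < 20;
         TxStore(Self, &i, TxLoad(Self, &i) + 1)) {
        for (TxStore(Self, &j, 0); TxLoad(Self, &j) < 10;
             TxStore(Self, &j, TxLoad(Self, &j) + 1)) {
            TxStore(Self, &result, TxLoad(Self, &result) + 1);
        }
    }
    return result;
}
```

Under this mock the 200 increments of result cost 661 loads and 442 stores, most of them spent on the loop counters rather than on shared data.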

14 Implementation
Functions – clone and instrument
Declaration:
    #pragma tm function
    void calculate(){}
Generated versions:
    void calculate()
    __tm_cloned__calculate()    //instrumented
Call site in the source:
    #pragma tm atomic
    {
        calculate();
    }
Call site after transformation:
    #pragma tm atomic
    {
        __tm_cloned__calculate();
    }
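A sketch of the effect of cloning, written by hand. The function body (adding 5 to a shared total) is hypothetical since the slide shows an empty function, the barriers are counting stand-ins, and the global descriptor tx_thread is an assumption because the slide does not say how the clone obtains its Thread.

```c
#include <assert.h>
#include <stdint.h>

/* Counting mock (assumption): tallies barrier calls, accesses memory directly. */
typedef struct Thread { long barriers; } Thread;

Thread tx_thread;   /* hypothetical descriptor shared by the clone */

static intptr_t TxLoad(Thread *Self, void *addr)
{
    Self->barriers++;
    return *(intptr_t *)addr;
}

static void TxStore(Thread *Self, void *addr, intptr_t val)
{
    Self->barriers++;
    *(intptr_t *)addr = val;
}

static intptr_t total;

/* Original version: kept for callers outside any transaction. */
void calculate(void)
{
    total = total + 5;
}

/* Instrumented clone: same logic, shared accesses go through barriers. */
void __tm_cloned__calculate(void)
{
    TxStore(&tx_thread, &total, TxLoad(&tx_thread, &total) + 5);
}

/* A call site inside an atomic block is redirected to the clone. */
void caller_in_transaction(void)
{
    /* transaction start would go here */
    __tm_cloned__calculate();
    /* transaction commit would go here */
}

/* Returns the number of barrier calls made by the transactional call. */
long run_clone_demo(void)
{
    total = 0;
    tx_thread.barriers = 0;

    calculate();                     /* non-transactional: no barriers */
    assert(tx_thread.barriers == 0);

    caller_in_transaction();         /* transactional: one load, one store */
    assert(total == 10);
    return tx_thread.barriers;
}
```

Keeping both versions means code outside transactions pays no barrier cost.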

15 Implementation
Optimization
Source:
    int i = 0;
    #pragma tm atomic
    {
        int j = 0;
        for(i=0;i<20;i++) {
            for(j=0;j<10;j++) {
                result++;
            }
        }
    }
After transformation:
    int i = 0;
    jmp_buf jbuf;
    _setjmp(jbuf);
    TxStart(Self, jbuf);
    TxStore(Self, &j, 0);
    for (TxStore(Self, &i, 0); TxLoad(Self, &i)<20; TxStore(Self, &i, TxLoad(Self, &i)+1)){
        for(TxStore(Self, &j, 0); TxLoad(Self, &j)<10; TxStore(Self, &j, TxLoad(Self, &j)+1)){
            TxStore(Self, &result, TxLoad(Self, &result)+1);
        }
    }
    TxCommit(Self);
Transaction local variables: detected by the frontend

16 Implementation
Optimization
Source:
    int i = 0;
    #pragma tm atomic
    {
        int j = 0;
        for(i=0;i<20;i++) {
            for(j=0;j<10;j++) {
                result++;
            }
        }
    }
After removing barriers on the transaction local variable j:
    int i = 0;
    jmp_buf jbuf;
    _setjmp(jbuf);
    TxStart(Self, jbuf);
    j=0;
    for (TxStore(Self, &i, 0); TxLoad(Self, &i)<20; TxStore(Self, &i, TxLoad(Self, &i)+1)){
        for(j=0; j<10;j++){
            TxStore(Self, &result, TxLoad(Self, &result)+1);
        }
    }
    TxCommit(Self);
Barrier free variables: detected according to storage class

17 Implementation
Optimization
Source:
    int i = 0;
    #pragma tm atomic
    {
        int j = 0;
        for(i=0;i<20;i++) {
            for(j=0;j<10;j++) {
                result++;
            }
        }
    }
After also treating i as barrier free (reads are plain, writes are locally logged):
    int i = 0;
    jmp_buf jbuf;
    _setjmp(jbuf);
    TxStart(Self, jbuf);
    j=0;
    for (; i<20; TxStoreLocal(Self, &i, i+1)){
        for(j=0; j<10;j++){
            TxStore(Self, &result, TxLoad(Self, &result)+1);
        }
    }
    TxCommit(Self);
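The saving can be checked by running the optimized loop nest against a counting mock. As before, the three Tx functions here are hypothetical stand-ins that access memory directly and count calls; transaction start and commit are omitted and the variables are widened to intptr_t.

```c
#include <assert.h>
#include <stdint.h>

/* Counting mock (assumption): not a TM runtime, only tallies calls. */
typedef struct Thread { long loads, stores, local_stores; } Thread;

Thread stats;

static intptr_t TxLoad(Thread *Self, void *addr)
{
    Self->loads++;
    return *(intptr_t *)addr;
}

static void TxStore(Thread *Self, void *addr, intptr_t val)
{
    Self->stores++;
    *(intptr_t *)addr = val;
}

/* In a real runtime this would log the old value for undo on abort. */
static void TxStoreLocal(Thread *Self, void *addr, intptr_t val)
{
    Self->local_stores++;
    *(intptr_t *)addr = val;
}

static intptr_t result;

/* Optimized form: j is transaction local (no barriers), i is barrier
 * free (plain reads, locally logged writes), result is shared. */
intptr_t run_optimized(void)
{
    Thread *Self = &stats;
    intptr_t i = 0;
    intptr_t j = 0;

    stats = (Thread){0, 0, 0};
    result = 0;

    for (; i < 20; TxStoreLocal(Self, &i, i + 1)) {
        for (j = 0; j < 10; j++) {
            TxStore(Self, &result, TxLoad(Self, &result) + 1);
        }
    }
    return result;
}
```

Only the 200 loads and 200 stores on result remain as full barriers, plus 20 locally logged writes to i. By the same counting, the fully instrumented loop nest issues 661 loads and 442 stores.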

18 Implementation
Optimization – Optimization opportunities detection strategy
Pthread parallel task
– transaction local: declared in tm atomic scope
– barrier free: auto variables
Cloned transactional function
– transaction local: declared in the function
OpenMP parallel task
– transaction local: declared in tm atomic scope
– barrier free: declared in micro task, marked in openmp private clause
Checking readonly transactions
Limitation
– Reserved design for pointers
– Needs programmers to participate in optimization
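The detection rules above can be read as a small decision function. The sketch below is an illustration only: the enum names, the VarInfo fields, and the function itself are hypothetical and do not reflect how Open64 represents symbols. It also assumes that, for OpenMP, either condition (declared in the micro task, or listed in a private clause) is sufficient for a variable to be barrier free.

```c
#include <assert.h>

typedef enum { TASK_PTHREAD, TASK_CLONED_FUNCTION, TASK_OPENMP } TaskKind;

typedef enum {
    VAR_TRANSACTION_LOCAL,   /* no barriers at all */
    VAR_BARRIER_FREE,        /* plain reads, locally logged writes */
    VAR_SHARED               /* full TM barriers */
} VarClass;

/* Hypothetical summary of what the compiler knows about a variable. */
typedef struct {
    int declared_in_atomic_scope;
    int declared_in_cloned_function;
    int is_auto;
    int declared_in_micro_task;
    int in_omp_private_clause;
} VarInfo;

VarClass classify(TaskKind task, const VarInfo *v)
{
    switch (task) {
    case TASK_PTHREAD:
        /* Checked first: a variable declared in the atomic scope is
         * also an auto variable. */
        if (v->declared_in_atomic_scope) return VAR_TRANSACTION_LOCAL;
        if (v->is_auto)                  return VAR_BARRIER_FREE;
        return VAR_SHARED;
    case TASK_CLONED_FUNCTION:
        if (v->declared_in_cloned_function) return VAR_TRANSACTION_LOCAL;
        return VAR_SHARED;
    case TASK_OPENMP:
        if (v->declared_in_atomic_scope) return VAR_TRANSACTION_LOCAL;
        if (v->declared_in_micro_task || v->in_omp_private_clause)
            return VAR_BARRIER_FREE;
        return VAR_SHARED;
    }
    return VAR_SHARED;   /* conservative default */
}
```

Anything the rules cannot prove is treated as shared, which is the safe default for data reached through pointers.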

19 Preliminary Experiments Compare with fine-grained lock based application

20 Preliminary Experiments Compare with manually instrumented application

21 Preliminary Experiments
    #pragma tm atomic
    {
        int j;
        (*new_centers_len[index])++;
        for(j=0;j<nfeatures;j++){
            new_centers[index][j]+=feature[i][j];
        }
    }
private(feature)

22 Conclusion & Future work
An infrastructure for TM on Open64
– Replaceable TM implementation
– Optimization
More experiments on non-trivial applications are desired
– Nested transaction
– Signal processing
– Event handler
– Indirect calls
– Dealing with legacy code
– …
FastDB: 8 out of 75 critical regions contain nested transactions
FastDB: 28 out of 75 critical regions contain signal processing
PARSEC: 20 out of 55 critical regions contain signal processing

23 Thanks

