Read-Write Lock Allocation in Software Transactional Memory Amir Ghanbari Bavarsad and Ehsan Atoofian Lakehead University.


1 Read-Write Lock Allocation in Software Transactional Memory Amir Ghanbari Bavarsad and Ehsan Atoofian Lakehead University

2 Transactional Memory
- Software transactional memory (STM) exploits a global clock to validate transactional data
  - Pros: reduces validation overhead
  - Cons: contention
- Alternative: Read-Write Lock Allocation (RWLA)
  - Pros: no central clock
  - Cons: overhead if a TX aborts
- Speculative RWLA: changes the validation policy dynamically → speedup of up to 66%
[Figure: processors P1 … Pn, each with a cache ($), sharing the global clock]

3 Outline
- Background
- RWLA
- Speculative RWLA
- Conclusion

4 Counter in STM
T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END();

5 Validation in STM
Transactional data are validated using:
- Global clock
  - Shared variable
  - Timestamp for transactions
- Lock
  - Memory is mapped to a lock table
  - Each entry of the table holds a version #
[Figure: the global clock, memory, and the lock table with one version # per entry]

6 Updating Global Clock & Lock
- Increment the global clock
- Version # = global_clock
[Figure: the global clock, memory, and the lock table; counter maps to one version # entry]
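The two steps on this slide can be sketched in C11 as follows. This is a minimal illustration, not the TL2 source: the table size, the address hash in `lock_index`, and the function name `commit_write` are hypothetical, and the sketch assumes the update happens when a transaction that wrote the location commits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_TABLE_SIZE 1024u   /* hypothetical size; must be a power of two */

static atomic_ulong global_clock;                 /* shared timestamp source  */
static atomic_ulong lock_table[LOCK_TABLE_SIZE];  /* one version # per entry  */

/* Hypothetical hash: map a memory address to a lock-table slot. */
static size_t lock_index(const void *addr) {
    assert(addr != NULL);
    return (size_t)(((uintptr_t)addr >> 3) & (LOCK_TABLE_SIZE - 1u));
}

/* Assumed to run when a transaction that wrote *addr commits. */
static unsigned long commit_write(const void *addr) {
    /* atomic_fetch_add returns the old value, so add 1 to get the new clock. */
    unsigned long wv = atomic_fetch_add(&global_clock, 1ul) + 1ul;
    /* Version # = global_clock */
    atomic_store(&lock_table[lock_index(addr)], wv);
    return wv;
}
```

Every committing writer touches `global_clock`, which is the single shared word that the later slides identify as a source of contention.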

7 Validation in STM
- At TM_BEGIN(), rv (read version) is set to global_clock
T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END();
[Figure: the metadata for TX 1 holds rv, copied from the global clock]

8 Successful Read Validation
- rv >= version # → the most recent write to counter occurred before TM_BEGIN()
T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END();
[Figure: the metadata for TX 1 (rv) compared against the global clock]

9 Failed Read Validation
- rv < version # → the most recent write to counter occurred after TM_BEGIN()
T1: TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END();
[Figure: the metadata for TX 1 (rv) compared against the global clock]
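The read-validation rule from the last three slides can be written as a short C11 sketch. The type `tx_meta` and the functions `tx_begin` and `read_is_valid` are hypothetical names chosen for illustration; a real STM such as TL2 also checks a lock bit and re-reads the version, which is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_ulong global_clock;   /* shared timestamp source */

/* Per-transaction metadata (hypothetical layout). */
typedef struct {
    unsigned long rv;               /* read version, sampled at begin */
} tx_meta;

/* TM_BEGIN(): rv is set to the current value of the global clock. */
static void tx_begin(tx_meta *tx) {
    assert(tx != NULL);
    tx->rv = atomic_load(&global_clock);
}

/* version is the version # found in the lock-table entry for the location.
 * rv >= version: the last write happened before TM_BEGIN(), the read is valid.
 * rv <  version: the last write happened after TM_BEGIN(), the read fails. */
static bool read_is_valid(const tx_meta *tx, unsigned long version) {
    return tx->rv >= version;
}
```

With the clock at 5 when the transaction begins, a location whose version # is 3 or 5 validates, while a location whose version # is 6 does not.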

10 Overhead of Validation
- This method, called GV4, results in many cache coherence misses if transactions commit frequently
[Figure: processors P1 … Pn, each with a cache ($), sharing the global clock]

11 Outline
- Background
- RWLA
- Speculative RWLA
- Conclusion

12 Read-Write Lock Allocation (RWLA)
- Lock
  - Memory is mapped to a lock table
  - Each entry of the table: a lock bit and read bits
[Figure: memory mapped to the lock table; each entry has a lock bit and read bits labeled P0, P1, …, Pn-1]

13 TM_READ
TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END();

14 TM_READ
TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END();
[Flowchart: TM_READ() → lock bit is free? → Yes: set the read bit in the corresponding lock entry]

15 TM_READ
TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END();
[Flowchart: TM_READ() → lock bit is free? → Yes: set the read bit in the corresponding lock entry; No: abort]
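The TM_READ flowchart could be implemented along these lines. This is a sketch under stated assumptions, not the authors' code: the lock entry is packed into one 64-bit word, bit 63 is taken as the lock bit, the low bits are per-thread read bits, and `rwla_read` is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_BIT    (UINT64_C(1) << 63)  /* assumed position of the lock bit */
#define MAX_READERS 63u                  /* assumed: one read bit per thread */

typedef _Atomic uint64_t lock_entry;     /* assumed packing of one table entry */

/* Returns true if the read bit was set, false if the caller must abort. */
static bool rwla_read(lock_entry *e, unsigned tid) {
    assert(tid < MAX_READERS);
    uint64_t old = atomic_load(e);
    for (;;) {
        if (old & LOCK_BIT)
            return false;                        /* lock bit is not free: abort */
        uint64_t desired = old | (UINT64_C(1) << tid);
        if (atomic_compare_exchange_weak(e, &old, desired))
            return true;                         /* read bit is now set */
        /* CAS failed: old holds the current entry value; check it again. */
    }
}
```

The compare-and-swap loop keeps the check of the lock bit and the setting of the read bit atomic with respect to a concurrent writer. No global clock is consulted, which matches the "no central clock" advantage listed on slide 2.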

16 TM_WRITE
TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END();
[Flowchart: TM_WRITE → all read bits are clear? → No: abort]

17 TM_WRITE
TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END();
[Flowchart: TM_WRITE → all read bits are clear? → No: abort; Yes: acquire the lock; if the acquire fails: abort]

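18 TM_WRITE
TM_BEGIN(); local_counter = TM_READ(counter); local_counter++; TM_WRITE(counter, local_counter); TM_END();
[Flowchart: TM_WRITE → all read bits are clear? → No: abort; Yes: acquire the lock; if the acquire fails: abort]
[Figure: lock entry with read bits 00000… and lock bit values 1 and 0]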
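A matching sketch of the TM_WRITE flowchart is shown below, using the same assumed entry layout (bit 63 as the lock bit, low bits as per-thread read bits). One detail is an assumption that the slides do not state: the writer's own read bit is ignored in the "all read bits are clear?" test, because the counter example reads the location before writing it. The name `rwla_write` is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_BIT    (UINT64_C(1) << 63)  /* assumed position of the lock bit */
#define MAX_READERS 63u                  /* assumed: one read bit per thread */

typedef _Atomic uint64_t lock_entry;     /* assumed packing of one table entry */

/* Returns true if the lock was acquired, false if the caller must abort. */
static bool rwla_write(lock_entry *e, unsigned tid) {
    assert(tid < MAX_READERS);
    uint64_t own = UINT64_C(1) << tid;
    uint64_t old = atomic_load(e);

    if (old & ~own & ~LOCK_BIT)
        return false;                    /* another thread's read bit is set */
    if (old & LOCK_BIT)
        return false;                    /* lock bit is already taken */

    /* Acquire the lock; a failed CAS means the entry changed under us,
     * which corresponds to the "failed" arrow leading to abort. */
    uint64_t desired = old | LOCK_BIT;
    return atomic_compare_exchange_strong(e, &old, desired);
}
```

Because a conflicting reader or writer leads straight to an abort, this path is cheap when conflicts are rare and costly when they are frequent, which is the trade-off listed for RWLA on slide 2.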

19 Experimental Framework
- Benchmarks: STAMP v0.9.7
  - Run to completion
  - Statistics measured over 10 runs
- TL2 as the STM framework
- Two Intel Xeon E5660, 6-way CMP

20 Performance of RWLA
[Figure: performance chart; an arrow labeled "better" marks the favorable direction]

21 Speculative RWLA
- Conflicts occur frequently → select GV4
- Conflicts occur rarely → select RWLA
- How to predict conflicts?

22 Contention Predictor
- Prediction:
  - y ≥ 0 → predict commit
  - y < 0 → predict abort
- Update:
  - If the outcomes of the current TX and TX i agree/disagree → increment/decrement w_i
- x_i: global transaction history, bipolar values
- w_i: weight vector
[Figure: perceptron with inputs 1, x_1, …, x_n, weights w_0, w_1, …, w_n, and output y]
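The predictor on this slide has the shape of a perceptron, so one plausible reading is y = w_0 + Σ w_i·x_i over the last n transaction outcomes. The sketch below follows that reading. Several details are assumptions not given in the slides: the history length of 8, the encoding +1 = commit and -1 = abort, training after every transaction, treating the constant input 1 as always "commit" when updating w_0, and the absence of weight saturation. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HISTORY_LEN 8   /* hypothetical history length n */

typedef enum { POLICY_GV4, POLICY_RWLA } policy;

typedef struct {
    int w[HISTORY_LEN + 1];  /* w[0] pairs with the constant input 1 */
    int x[HISTORY_LEN];      /* bipolar history: +1 commit, -1 abort; x[0] newest */
} predictor;

static void predictor_init(predictor *p) {
    assert(p != NULL);
    for (int i = 0; i <= HISTORY_LEN; i++) p->w[i] = 0;
    for (int i = 0; i < HISTORY_LEN; i++)  p->x[i] = 1;  /* assume past commits */
}

/* y = w_0 + sum of w_i * x_i */
static int predictor_output(const predictor *p) {
    int y = p->w[0];
    for (int i = 0; i < HISTORY_LEN; i++)
        y += p->w[i + 1] * p->x[i];
    return y;
}

/* y >= 0 predicts commit, y < 0 predicts abort. */
static bool predict_commit(const predictor *p) {
    return predictor_output(p) >= 0;
}

/* Slide 21 mapping: frequent conflicts select GV4, rare conflicts select RWLA. */
static policy select_policy(const predictor *p) {
    return predict_commit(p) ? POLICY_RWLA : POLICY_GV4;
}

/* Train with the real outcome, then shift it into the history. */
static void predictor_update(predictor *p, bool committed) {
    int t = committed ? 1 : -1;
    p->w[0] += t;
    for (int i = 0; i < HISTORY_LEN; i++)
        p->w[i + 1] += t * p->x[i];   /* agree: +1, disagree: -1 */
    for (int i = HISTORY_LEN - 1; i > 0; i--)
        p->x[i] = p->x[i - 1];
    p->x[0] = t;
}
```

A transaction would call `select_policy` before it starts and `predictor_update` once its outcome is known, so the validation policy tracks the recent commit and abort pattern.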

23 Performance of Speculative RWLA
- # of threads varies between 2 and 16
- On average, the performance improvement ranges from 21% in Bayes to 47% in Labyrinth
[Figure: performance chart; an arrow labeled "better" marks the favorable direction]

24 Conclusion
- RWLA overcomes contention over the global clock
- Applications react differently to GV4 and RWLA
- Speculative RWLA changes the validation policy dynamically
- Speculative RWLA improves the performance of STMs by up to 66%

25 Thank You! Questions?

