Transactional Coherence and Consistency Presenters: Muhammad Mohsin Butt. (g201103010) Coe-502 paper presentation 2.


2 Transactional Coherence and Consistency Presenters: Muhammad Mohsin Butt. (g201103010) Coe-502 paper presentation 2

3 Outline 1. Introduction 2. Current Hardware 3. TCC in Hardware 4. TCC in Software 5. Performance Evaluation 6. Conclusion

4 Introduction Transactional Coherence and Consistency (TCC) provides a lock-free transactional model that simplifies parallel hardware and software. Transactions, defined by the programmer, are the basic unit of parallel work. Memory coherence, communication, and memory consistency are implicit in a transaction.

5 Current Hardware Provides the illusion of a single shared memory to all processors. The problem is divided into parallel tasks that work on shared data held in shared memory. Complex cache coherence protocols are required. Memory consistency models are also required to ensure program correctness. Locks are used to prevent data races and provide sequential access. The overhead of too many locks can degrade performance.

6 TCC in HARDWARE Processors execute speculative transactions in a continuous cycle. A transaction is a sequence of instructions, marked by software, that is guaranteed to execute and complete atomically. This provides an "all transactions, all the time" model, which simplifies parallel hardware and software.

7 TCC in HARDWARE While a transaction executes, it accumulates its writes as a block in a local buffer. When the transaction completes, the hardware arbitrates system-wide for permission to commit it. After acquiring permission, the node broadcasts the transaction's writes as a single packet. Sending a single packet reduces the number of inter-processor messages and arbitrations. Other processors snoop on these write packets to detect dependence violations.
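The buffer-then-broadcast sequence above can be sketched in software. This is only an illustrative model, not the hardware design from the paper: the names txn_t, txn_store, txn_commit and the sizes MAX_WRITES and MEM_WORDS are assumptions made for this example, and arbitration is reduced to the caller deciding when to commit.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WRITES 8   /* hypothetical write-buffer capacity */
#define MEM_WORDS  16  /* hypothetical size of the shared memory */

/* One buffered store: a word address and the value written to it. */
typedef struct { size_t addr; int value; } write_t;

/* The writes of one transaction, held locally until commit. */
typedef struct { write_t w[MAX_WRITES]; size_t n; } txn_t;

/* Buffer a speculative store. Returns 0 on success, -1 if the buffer is full. */
int txn_store(txn_t *t, size_t addr, int value) {
    assert(addr < MEM_WORDS);
    for (size_t i = 0; i < t->n; i++) {
        if (t->w[i].addr == addr) {   /* repeated store to the same word */
            t->w[i].value = value;
            return 0;
        }
    }
    if (t->n == MAX_WRITES) return -1;
    t->w[t->n].addr = addr;
    t->w[t->n].value = value;
    t->n++;
    return 0;
}

/* Commit: apply every buffered write to shared memory in one step,
   standing in for the single broadcast packet. Returns the number of
   words in the packet and empties the buffer. */
size_t txn_commit(txn_t *t, int mem[MEM_WORDS]) {
    size_t sent = t->n;
    for (size_t i = 0; i < t->n; i++) {
        mem[t->w[i].addr] = t->w[i].value;
    }
    t->n = 0;
    return sent;
}
```

In this sketch shared memory stays unchanged until txn_commit runs, which is the property that lets other processors treat the whole transaction as one atomic update.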

8 TCC in HARDWARE

9 TCC simplifies cache design Processors hold data in unmodified and speculatively modified forms. During snooping, a line is invalidated if the commit packet contains only the address, and updated if the packet contains both address and data. Protection against data dependences: if a processor has read from any address in the commit packet, its transaction is re-executed.

10 TCC in HARDWARE Current CMPs need features that provide speculative buffering of memory references and commit arbitration control. A mechanism is required for gathering all modified cache lines from each transaction into a single packet. Write buffer completely separate from the cache. Address buffer containing a list of tags for lines holding data to be committed.

11 TCC in HARDWARE Read bits: set on a speculative read during a transaction. The current transaction is violated and restarted if the snoop protocol sees a commit packet containing the address of a location whose read bit is set. Modified bits: set to 1 by stores during a transaction. On a violation, lines whose modified bit is 1 are invalidated.
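A rough software model of the read and modified bits may help make the violation rule concrete. Everything here is an assumption for illustration: cache_t, spec_read, spec_write, snoop_commit, rollback, and the line-granularity tracking with NUM_LINES lines are not taken from the slides, and the invalidate-or-update handling of the snooped lines themselves is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_LINES 8  /* hypothetical number of cache lines */

/* Per-line state kept by one processor during a transaction. */
typedef struct {
    bool valid[NUM_LINES];
    bool read_bit[NUM_LINES];      /* line was speculatively read */
    bool modified_bit[NUM_LINES];  /* line was speculatively written */
} cache_t;

/* A speculative load sets the read bit of the line. */
void spec_read(cache_t *c, size_t line) {
    assert(line < NUM_LINES);
    c->valid[line] = true;
    c->read_bit[line] = true;
}

/* A speculative store sets the modified bit of the line. */
void spec_write(cache_t *c, size_t line) {
    assert(line < NUM_LINES);
    c->valid[line] = true;
    c->modified_bit[line] = true;
}

/* Snoop the addresses of another processor's commit packet.
   Returns true if any of them has its read bit set (a violation). */
bool snoop_commit(const cache_t *c, const size_t *lines, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (lines[i] < NUM_LINES && c->read_bit[lines[i]]) return true;
    }
    return false;
}

/* On a violation: invalidate speculatively modified lines and clear
   both bit vectors so the transaction can restart. */
void rollback(cache_t *c) {
    for (size_t i = 0; i < NUM_LINES; i++) {
        if (c->modified_bit[i]) c->valid[i] = false;
        c->read_bit[i] = false;
        c->modified_bit[i] = false;
    }
}
```

In this model a line that was only written does not trigger a violation; only lines with the read bit set do, matching the rule stated on the slide.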

12 TCC in Software Programming with TCC is a three-step process. 1. Divide the program into transactions. 2. Specify the transaction order; the order can be relaxed if it is not required. 3. Tune performance; TCC provides feedback on where in the program violations occur frequently.

13 Loop Based Parallelization Consider a histogram calculation for 1000 integer percentages: /* input */ int *data = load_data(); int i, buckets[101] = {0}; for (i = 0; i < 1000; i++) { buckets[data[i]]++; } /* output */ print_buckets(buckets);

14 Loop Based Parallelization The loop can be parallelized by replacing for with t_for: t_for (i = 0; i < 1000; i++). Each loop iteration becomes a separate transaction. When two parallel iterations try to update the same histogram bucket, the TCC hardware causes the later transaction to violate, forcing it to re-execute. A conventional shared-memory model would require locks to protect the histogram bins. Can be further optimized using t_for_unordered().
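The violate-and-re-execute behaviour of the histogram loop can be imitated with a small sequential simulation. This is a sketch under stated assumptions, not how t_for is implemented: the function t_for_histogram, the procs parameter, and the round-based schedule (iterations run in groups of procs and commit in iteration order) are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_BUCKETS 101

/* Simulate the transactional histogram loop on `procs` hypothetical
   processors. Each iteration is one transaction that speculates on the
   bucket values seen at the start of its round. If an earlier iteration
   of the same round already committed the same bucket, the later one is
   counted as violated and re-executed against the committed value.
   Returns the number of violations. */
size_t t_for_histogram(const int *in, size_t n, size_t procs,
                       int buckets[NUM_BUCKETS]) {
    size_t violations = 0;
    assert(procs > 0);
    for (size_t base = 0; base < n; base += procs) {
        size_t end = (base + procs < n) ? base + procs : n;
        int snapshot[NUM_BUCKETS];
        bool committed[NUM_BUCKETS] = {false};
        memcpy(snapshot, buckets, sizeof snapshot);
        for (size_t i = base; i < end; i++) {
            int b = in[i];
            assert(b >= 0 && b < NUM_BUCKETS);
            int speculative = snapshot[b] + 1;  /* based on round-start state */
            if (committed[b]) {                 /* stale read: violation */
                violations++;
                speculative = buckets[b] + 1;   /* re-execute */
            }
            buckets[b] = speculative;           /* commit */
            committed[b] = true;
        }
    }
    return violations;
}
```

Whatever the number of simulated processors, the final bucket counts match the sequential loop; only the number of re-executions changes, which is the cost that the tuning steps later in the talk try to reduce.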

15 Fork Based Parallelization t_fork() forces the parent transaction to commit and creates two completely new transactions. One continues executing the remaining code; the second starts executing the function passed as a parameter. E.g. /* Initial setup */ int PC = INITIAL_PC; int opcode = i_fetch(PC); while (opcode != END_CODE) { t_fork(execute, &opcode, 1, 1, 1); increment_PC(opcode, &PC); opcode = i_fetch(PC); }

16 Explicit transaction commit ordering Provides partial ordering. Each transaction is assigned two parameters: a sequence number and a phase number. Transactions with the same sequence number commit in an order defined by the programmer. Transactions with different sequence numbers are independent. The order among transactions with the same sequence number is set by the phase number: the transaction with the lowest phase number commits first.
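The sequence and phase rule can be written as a small predicate. This is a minimal sketch, assuming a flat array of pending transactions; the type ordered_txn_t and the function may_commit are hypothetical names for this example, not part of the TCC programming interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A transaction tagged with the two ordering parameters. */
typedef struct {
    int  seq;        /* sequence number: ordering applies within one sequence */
    int  phase;      /* phase number: lower phases commit first */
    bool committed;
} ordered_txn_t;

/* A transaction may commit if no uncommitted transaction with the same
   sequence number has a lower phase number. Transactions in other
   sequences never block it. */
bool may_commit(const ordered_txn_t *txns, size_t n, size_t idx) {
    assert(idx < n);
    if (txns[idx].committed) return false;
    for (size_t i = 0; i < n; i++) {
        if (i != idx && !txns[i].committed &&
            txns[i].seq == txns[idx].seq &&
            txns[i].phase < txns[idx].phase) {
            return false;
        }
    }
    return true;
}
```

Two transactions with the same sequence and phase numbers do not block each other under this predicate, which is one way to model ordering that is only partial.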

17 Performance Evaluation

18 Maximize parallelization: create as many transactions as possible. Minimize violations: keep transactions small to reduce the amount of work lost on a violation. Minimize transaction overhead: do not make transactions too small. Avoid buffer overflow: it can result in excessive serialization.

19 Performance Evaluation Base case: simple parallelization without any optimization. Unordered: find loops that can be unordered. Reduction: find areas that can exploit reduction operations. Privatization: privatize to each transaction the variables that cause violations. Using t_commit(): break large transactions into smaller ones that still execute on the same processor; this reduces the work lost to violations and prevents buffer overflow. Loop adjustments: use the various loop adjustment optimizations provided by the compiler.
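Privatization and reduction can be illustrated on the earlier histogram. The sketch below is an assumption-laden example rather than code from the evaluation: histogram_privatized and NUM_WORKERS are hypothetical, and the workers run one after another here, standing in for transactions that would run in parallel.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUCKETS 101
#define NUM_WORKERS 4   /* hypothetical number of parallel transactions */

/* Each worker counts into its own private histogram, so no two workers
   touch the same bucket while counting. A final reduction step sums the
   private copies into the shared result. */
void histogram_privatized(const int *in, size_t n, int buckets[NUM_BUCKETS]) {
    int priv[NUM_WORKERS][NUM_BUCKETS] = {{0}};

    /* Counting phase: worker w handles elements w, w + NUM_WORKERS, ... */
    for (size_t w = 0; w < NUM_WORKERS; w++) {
        for (size_t i = w; i < n; i += NUM_WORKERS) {
            assert(in[i] >= 0 && in[i] < NUM_BUCKETS);
            priv[w][in[i]]++;
        }
    }

    /* Reduction phase: combine the private copies. */
    for (size_t b = 0; b < NUM_BUCKETS; b++) {
        buckets[b] = 0;
        for (size_t w = 0; w < NUM_WORKERS; w++) {
            buckets[b] += priv[w][b];
        }
    }
}
```

Because the counting phase writes only private data, it would cause no violations between workers; the shared buckets are written only in the short reduction phase.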

20 Performance Evaluation Privatization and t_commit improve performance. Inner loops had too many violations; using outer loop_adjust improved the result.

21 Performance Evaluation CMP performance is close to ideal TCC for a small number of processors.

22 Conclusions Bandwidth limitation is still a problem for scaling TCC to more processors. No support for nested for loops. Dynamic optimization techniques are still required to automate performance tuning on TCC.

