
1 SMT Parallel applications – one program, executed by cooperating threads running in parallel. Multiprogrammed applications – multiple independent programs, each with its own threads.

2 Shared vs. Private For SMT, a shared data cache with a private instruction cache gives the best performance [3]. Note, however, that SMT tends to share all processor resources, whereas CMP tends to duplicate them.

3 Inter-Thread Cache Interference Parallel applications suffer much less inter-thread interference in the memory system (about 3.4% performance loss) than multiprogrammed workloads do [1] –The threads of a parallel application have similar execution resource requirements, memory reference patterns, and levels of ILP

4 Inter-Thread Cache Interference: Symptoms If the PCs of different threads map to the same bank, a bank conflict occurs: only one thread can fetch from a bank per cycle, so the anticipated fetch bandwidth cannot be achieved If a cache miss for one thread evicts a line that belongs to another thread and will be used in the near future, that thread suffers an additional miss, which degrades performance
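The first symptom above can be sketched in a few lines. This is a hedged illustration, not the simulator's actual logic: the bank mapping (line size, bank count, and the `bank_of` / `has_bank_conflict` helpers) is an assumed direct indexing by line-address bits.

```python
# Assumed parameters, for illustration only.
LINE_SIZE = 64   # bytes per cache line
NUM_BANKS = 8    # number of I-cache banks

def bank_of(pc: int) -> int:
    """Map a fetch PC to a bank by its cache-line index bits."""
    return (pc // LINE_SIZE) % NUM_BANKS

def has_bank_conflict(fetch_pcs: list[int]) -> bool:
    """True if two threads fetching this cycle target the same bank,
    so only one of them can actually be serviced."""
    banks = [bank_of(pc) for pc in fetch_pcs]
    return len(banks) != len(set(banks))

# Two threads whose PCs differ by a full stride of banks land in the
# same bank and conflict; adjacent lines land in different banks.
conflict = has_bank_conflict([0x1000, 0x1000 + LINE_SIZE * NUM_BANKS])
no_conflict = has_bank_conflict([0x1000, 0x1000 + LINE_SIZE])
```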

5 Passive/Active We have logic for conflict detection (comparing the threads' PCs), but it is passive; we need an active mechanism that prevents conflicts from happening. The traditional cache distribution mechanism is not suitable for SMT; we need a "thread-aware" distribution mechanism for the L1 cache.

6 Approach In simulation, first measure the IPC of the traditional cache structure (Seongwon's simulator already has the part that counts bank conflicts) Then apply the new method that reduces inter-thread cache interference and measure the new performance, IPC' Compute the ratio of the two IPCs
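The evaluation step above reduces to a simple ratio. A minimal sketch, expressing it as a speedup IPC'/IPC (greater than 1.0 means the new method helped); all numbers are made-up placeholders, not simulation results.

```python
def ipc(committed_instructions: int, cycles: int) -> float:
    """Instructions per cycle, as reported by the simulator."""
    return committed_instructions / cycles

# Placeholder counts standing in for two simulator runs.
ipc_base = ipc(2_400_000, 1_000_000)  # traditional cache structure
ipc_new = ipc(2_700_000, 1_000_000)   # thread-aware distribution
ratio = ipc_new / ipc_base            # > 1.0 means the mechanism works
```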

7

8 If we get a result like the one above, our mechanism works From another perspective, even if throughput does not increase significantly, the mechanism also works if it clearly reduces the miss rate caused by inter-thread interference

9 How to Solve It (work in progress) Construct a mechanism that distributes different threads to different banks, which would resolve the bank conflicts between threads –Static: divide the cache banks evenly between the two threads –Dynamic: divide the cache banks among threads using two thresholds, one to increase the quota of the fast thread and one to protect the quota of the slow thread
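The dynamic scheme above can be sketched as a periodic rebalancing step. This is a speculative illustration of the idea, not the proposed design: the threshold values, the `MIN_QUOTA` floor, and the one-bank-per-step policy are all assumptions.

```python
NUM_BANKS = 8
GROW_THRESHOLD = 1.5  # assumed: fast thread must lead by this IPC factor
MIN_QUOTA = 2         # assumed: slow thread keeps at least this many banks

def rebalance(quota: list[int], ipc: list[float]) -> list[int]:
    """Shift one bank from the slower thread to the faster one when the
    IPC gap exceeds GROW_THRESHOLD, but protect the slow thread's floor."""
    fast, slow = (0, 1) if ipc[0] >= ipc[1] else (1, 0)
    if (ipc[slow] > 0
            and ipc[fast] / ipc[slow] > GROW_THRESHOLD
            and quota[slow] > MIN_QUOTA):
        quota = quota[:]  # don't mutate the caller's list
        quota[fast] += 1
        quota[slow] -= 1
    return quota

# Start from the static even split, then rebalance each interval.
quota = [NUM_BANKS // 2, NUM_BANKS // 2]
quota = rebalance(quota, ipc=[3.0, 1.0])  # fast thread 0 gains a bank
```

The two thresholds play the roles described on the slide: `GROW_THRESHOLD` lets the fast thread claim banks, while `MIN_QUOTA` keeps the slow thread from being starved.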

10 References [1] Jack L. Lo, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Rebecca L. Stamm, and Dean M. Tullsen. Converting Thread-Level Parallelism into Instruction-Level Parallelism via Simultaneous Multithreading. ACM Transactions on Computer Systems, August 1997. [2] D. M. Tullsen, S. J. Eggers, J. S. Emer, H. M. Levy, J. L. Lo, and R. L. Stamm. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor. In 23rd Annual International Symposium on Computer Architecture, May 1996. [3] D. M. Tullsen, S. J. Eggers, and H. M. Levy. Simultaneous Multithreading: Maximizing On-Chip Parallelism. In 22nd Annual International Symposium on Computer Architecture, June 1995.

