Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly
Koushik Chakraborty, Philip Wells, Gurindar Sohi


1 Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly
Koushik Chakraborty, Philip Wells, Gurindar Sohi
{kchak,pwells,sohi}@cs.wisc.edu

2 Paper Overview (Chakraborty, Wells, and Sohi, ASPLOS 2006)
• Multiprocessor Code Reuse
  - Poor resource utilization
• Computation Spreading
  - New model for assigning computation within a program on CMP cores in H/W
  - Case Study: OS and User computation
• Investigate performance characteristics

3 Talk Outline
• Motivation
• Computation Spreading (CSP)
  - Case study: OS and User computation
• Implementation
• Results
• Related Work and Summary

4 Homogeneous CMP
• Many existing systems are homogeneous
  - Sun Niagara, IBM Power 5, Intel Xeon MP
• Multithreaded server application
  - Composed of server threads
  - Typically each thread handles a client request
  - OS assigns software threads to cores
  - Entire computation from one thread executes on a single core (barring migration)

5 Code Reuse
• Many client requests are similar
  - Similar service across multiple threads
  - Same code path traversed in multiple cores
• Instruction footprint classification
  - Exclusive – single core access
  - Common – many cores access
  - Universal – all cores access
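The footprint classification on the Code Reuse slide can be illustrated with a minimal sketch. It assumes a trace of (core, instruction block) pairs and reads "Common" as "more than one core but not all of them"; the slide does not give an exact threshold, and the function name, trace format, and example data are hypothetical.

```python
from collections import defaultdict


def classify_footprint(accesses, num_cores):
    """Classify instruction blocks by how many cores fetched them.

    accesses: iterable of (core_id, block_address) pairs (hypothetical trace format).
    Assumption: "common" means more than one core but fewer than all cores.
    """
    cores_per_block = defaultdict(set)
    for core_id, block in accesses:
        cores_per_block[block].add(core_id)

    classes = {}
    for block, cores in cores_per_block.items():
        if len(cores) == 1:
            classes[block] = "exclusive"
        elif len(cores) == num_cores:
            classes[block] = "universal"
        else:
            classes[block] = "common"
    return classes


# Made-up trace for a 3-core system, for illustration only.
example_trace = [
    (0, 0x100), (1, 0x100), (2, 0x100),  # fetched by every core
    (0, 0x200),                          # fetched by one core
    (0, 0x300), (1, 0x300),              # fetched by two of three cores
]
example_classes = classify_footprint(example_trace, num_cores=3)
```

Counting the share of blocks labeled "universal" in such a result is one way to arrive at a figure of the kind quoted on the Summary slide, though the authors' actual measurement method is not described in this transcript.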

6 Multiprocessor Code Reuse
[Figure not captured in transcript]

7 Implications
• Lack of instruction stream specialization
  - Redundancy in predictive structures
  - Poor capacity utilization
  - Destructive interference
• No synergy among multiple cores
  - Lost opportunity for co-operation
  - Exploit core proximity in CMP


9 Computation Spreading (CSP)
• Computation fragment = dynamic instruction stream portion
• Collocate similar computation fragments from multiple threads
  - Enhance constructive interference
• Distribute dissimilar computation fragments from a single thread
  - Reduce destructive interference
Reassignment is the key

10 Example
[Figure: fragments of threads T1, T2, T3 mapped over time onto cores P1, P2, P3, under CANONICAL and CSP assignment]
  T1: A1 B1 C1 A1 B1 C1
  T2: B2 C2 A2 B2 C2 A2
  T3: C3 A3 B3 C3 A3 B3
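The contrast drawn on the Example slide can be sketched in a few lines. The thread fragment sequences below are taken from the slide; the mapping of fragment types A, B, C to cores P1, P2, P3 is an assumption made for illustration, since the transcript does not preserve which core hosts which fragment type.

```python
# Fragment sequences as listed on the slide (letter = fragment type, digit = thread).
threads = {
    "T1": ["A1", "B1", "C1", "A1", "B1", "C1"],
    "T2": ["B2", "C2", "A2", "B2", "C2", "A2"],
    "T3": ["C3", "A3", "B3", "C3", "A3", "B3"],
}
cores = ["P1", "P2", "P3"]

# Assumption: each fragment type gets one "home" core under CSP.
fragment_home = {"A": "P1", "B": "P2", "C": "P3"}


def canonical_schedule(threads, cores):
    """Canonical assignment: every fragment of a thread runs on that thread's core."""
    return {core: list(frags) for core, frags in zip(cores, threads.values())}


def csp_schedule(threads, fragment_home):
    """CSP assignment: each fragment runs on the home core of its type."""
    schedule = {core: [] for core in fragment_home.values()}
    steps = len(next(iter(threads.values())))
    for t in range(steps):
        for frags in threads.values():
            frag = frags[t]
            schedule[fragment_home[frag[0]]].append(frag)
    return schedule


def fragment_types_per_core(schedule):
    """Number of distinct fragment types each core executes."""
    return {core: len({f[0] for f in frags}) for core, frags in schedule.items()}


canonical = canonical_schedule(threads, cores)
csp = csp_schedule(threads, fragment_home)
```

In this toy setting each core sees all three fragment types under the canonical assignment and only one under CSP, which is the specialization effect the slide illustrates. The sketch ignores migration cost and data locality.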

11 Key Aspects
• Dynamic Specialization
  - Homogeneous multicore acquires specialization via retaining mutually exclusive predictive state
• Data Locality
  - Data dependencies between different computation fragments
  - Careful fragment selection to avoid loss of data locality

12 Selecting Fragments
• Server workload characteristics
  - Large data and instruction footprint
  - Significant OS computation
• User Computation and OS Computation
  - A natural separation
  - Exclusive instruction footprints
  - Relatively independent data footprint

13 Data Communication
[Figure: threads T1 and T2 split into T1-User, T1-OS, T2-User, and T2-OS across Core 1 and Core 2]

14 Relative Inter-core Data Communication
[Figure: results for Apache and OLTP]
OS-User communication is limited


16 Implementation
• Migrating Computation
  - Transfer state through the memory subsystem
  - ~2KB of register state in SPARC V9
  - Memory state through coherence
• Lightweight Virtual Machine Monitor
  - Migrates computation as dictated by the CSP Policy
  - Implemented in hardware/firmware

17 Implementation cont
[Figure: software stack with Threads, Virtual CPUs, and Physical Cores; Baseline shown alongside User Cores and OS Cores running User Comp and OS Comp]

18 Implementation cont
[Figure: software stack with Threads, Virtual CPUs, and Physical Cores; User Cores and OS Cores]

19 CSP Policy
• Policy dictates computation assignment
• Thread Assignment Policy (TAP)
  - Maintains affinity between VCPUs and physical cores
• Syscall Assignment Policy (SAP)
  - OS computation assigned based on system calls
• TAP and SAP use identical assignment for user computation
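The difference between the two policies can be sketched as a pair of assignment functions. Everything concrete here is an assumption for illustration: the 4/4 split of cores into user and OS cores, the modulo mapping, the `Fragment` type, and the fallback for OS work that is not a system call. The slides only state that TAP keeps VCPU-to-core affinity, that SAP assigns OS computation based on system calls, and that both treat user computation identically.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical partition of an 8-core CMP; the slides do not give the split.
USER_CORES = [0, 1, 2, 3]
OS_CORES = [4, 5, 6, 7]


@dataclass(frozen=True)
class Fragment:
    """A computation fragment issued by a virtual CPU (illustrative type)."""
    vcpu: int
    is_os: bool
    syscall: Optional[int] = None  # system call number, if the fragment is a syscall


def assign_user(frag: Fragment) -> int:
    """User computation: the same mapping is used by both policies."""
    return USER_CORES[frag.vcpu % len(USER_CORES)]


def tap(frag: Fragment) -> int:
    """Thread Assignment Policy sketch: OS work follows the issuing VCPU."""
    if not frag.is_os:
        return assign_user(frag)
    return OS_CORES[frag.vcpu % len(OS_CORES)]


def sap(frag: Fragment) -> int:
    """Syscall Assignment Policy sketch: OS work follows the system call number."""
    if not frag.is_os:
        return assign_user(frag)
    if frag.syscall is None:
        # Assumption: OS work that is not a syscall falls back to VCPU affinity.
        return OS_CORES[frag.vcpu % len(OS_CORES)]
    return OS_CORES[frag.syscall % len(OS_CORES)]


# Two VCPUs issuing the same (made-up) system call number 3.
call_a = Fragment(vcpu=0, is_os=True, syscall=3)
call_b = Fragment(vcpu=1, is_os=True, syscall=3)
```

Under this sketch `sap` sends both calls to the same OS core, while `tap` keeps them on the cores tied to their VCPUs, so SAP collocates identical OS code paths at the cost of moving OS work away from the VCPU's usual core.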


21 Simulation Methodology
• Virtutech SIMICS MAI running Solaris 9
• CMP system: 8 out-of-order processors
  - 2 wide, 8 stages, 128 entry ROB, 3GHz
• 3 level memory hierarchy
  - Private L1 and L2
  - Directory-based MOSI
  - L3: Shared, Exclusive 8MB (16w) (75 cycle load-to-use)
  - Point to point ordered interconnect (25 cycle latency)
  - Main Memory 255 cycle load to use, 40GB/s
Measure impact on predictive structures

22 L2 Instruction Reference
[Figure not captured in transcript]

23 Result Summary
• Branch predictors
  - 9-25% reduction in mis-predictions
• L2 data references
  - 0-19% reduction in load misses
  - Moderate increase in store misses
• Interconnect messages
  - Moderate reduction (after accounting for extra messages for migration)

24 Performance Potential
[Figure not captured in transcript; annotation: Migration Overhead]


26 Related Work
• Software re-design: staged execution
  - Cohort Scheduling [Larus and Parkes 01], STEPS [Ailamaki 04], SEDA [Welsh 01], LARD [Pai 98]
  - CSP: similar execution in hardware
• OS and User Interference [several]
  - Structural separation to avoid interference
  - CSP avoids interference and exploits synergy

27 Summary
• Extensive code reuse in CMPs
  - 45-66% instruction blocks universally accessed in server workloads
• Computation Spreading
  - Localize similar computation and separate dissimilar computation
  - Exploits core proximity in CMPs
• Case Study: OS and User computation
  - Demonstrate substantial performance potential

28 Thank You!

29 Backup Slides

30 L2 Data Reference
[Figure not captured in transcript]
L2 load miss comparable, slight to moderate increase in L2 store miss

31 Multiprocessor Code Reuse
[Figure not captured in transcript]

32 Performance Potential
[Figure not captured in transcript]

