Ioana Burcea. Initial Observations of the Simultaneous Multithreading Pentium 4 Processor. Nathan Tuck and Dean M. Tullsen.


1 Ioana Burcea. Initial Observations of the Simultaneous Multithreading Pentium 4 Processor. Nathan Tuck and Dean M. Tullsen

2 Agenda
• SMT as proposed in research
• Intel Hyper-Threading
• Methodology: benchmarks and experiments
• Experimental results
• Questions?

3 SMT in Research
• Up to 8 contexts (8-way SMT)
• ICOUNT.2.8 fetch policy
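The ICOUNT.2.8 policy named on this slide can be illustrated with a minimal sketch. It assumes the usual reading of the name: each cycle, fetch from up to 2 threads, up to 8 instructions in total, giving priority to the threads with the fewest instructions waiting in the front end and instruction queues. The function names, the tie-breaking rule, and the sample numbers are hypothetical and only serve the illustration.

```python
def icount_select(icounts, max_threads=2):
    """Pick the threads to fetch from this cycle.

    icounts maps thread id -> number of that thread's instructions
    currently waiting in the front-end stages and instruction queues.
    """
    # Fewest waiting instructions first; ties broken by thread id
    # (the tie-break is an assumption made for determinism).
    ranked = sorted(icounts, key=lambda tid: (icounts[tid], tid))
    return ranked[:max_threads]


def split_fetch_bandwidth(selected, available, max_instructions=8):
    """Share the per-cycle fetch bandwidth among the selected threads.

    available maps thread id -> instructions that thread could supply
    this cycle. Higher-priority threads are served first.
    """
    fetched = {}
    remaining = max_instructions
    for tid in selected:
        take = min(available[tid], remaining)
        fetched[tid] = take
        remaining -= take
    return fetched


# Made-up state for four hardware contexts.
icounts = {0: 12, 1: 3, 2: 7, 3: 20}
available = {0: 8, 1: 5, 2: 8, 3: 8}

chosen = icount_select(icounts)
fetch_plan = split_fetch_bandwidth(chosen, available)
print(chosen, fetch_plan)  # [1, 2] {1: 5, 2: 3}
```

Threads 1 and 2 have the fewest instructions in flight, so they are chosen; thread 1 supplies 5 instructions and thread 2 fills the remaining 3 slots.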

4 Intel: Hyper-Threading
• SMT in real silicon: Intel Pentium 4
  - Single-threaded vs. multithreaded mode

5 Methodology
• Pentium 4, 2.5 GHz, 512 MB DRAM
• RedHat 7.3, Linux 2.4.18smp
  - Linux treats the system as a dual-processor machine
  - It keeps a separate run queue for each virtual processor
• Benchmarks
  - SPEC CPU2000
  - NAS parallel benchmarks
  - SPLASH2 (modified input)

6 Speedup for Heterogeneous Workloads
T_SMT = total_execution_time / number_of_runs
Speedup = T_seq / T_SMT
Speedup per combination = S_bench_1 + S_bench_2
• At least 12 total jobs
• At least 3 runs for each job
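The three formulas on this slide can be worked through in a few lines. This is only a sketch of the arithmetic: the function names and all timing values are hypothetical, chosen to make the numbers easy to follow, and are not measurements from the paper.

```python
def smt_time_per_run(total_execution_time, number_of_runs):
    """T_SMT: average time per run while co-scheduled with another job."""
    return total_execution_time / number_of_runs


def relative_speedup(t_seq, t_smt):
    """Speedup of one benchmark: T_seq / T_SMT (below 1 when slowed down)."""
    return t_seq / t_smt


def combination_speedup(s_bench_1, s_bench_2):
    """Speedup per combination: sum of the two per-benchmark speedups."""
    return s_bench_1 + s_bench_2


# Hypothetical timings in seconds, not data from the paper.
t_smt_a = smt_time_per_run(total_execution_time=640.0, number_of_runs=4)
t_smt_b = smt_time_per_run(total_execution_time=800.0, number_of_runs=8)

s_a = relative_speedup(t_seq=100.0, t_smt=t_smt_a)
s_b = relative_speedup(t_seq=50.0, t_smt=t_smt_b)

print(s_a, s_b, combination_speedup(s_a, s_b))  # 0.625 0.5 1.125
```

Each benchmark runs slower than it would alone (both ratios are below 1), but because the two run at the same time, the combined figure of 1.125 indicates more total work per unit time than running them back to back.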

7 Static Partitioning of Resources
• SPECINT: 83% on average
• SPECFP: 85% on average
• eon 71%, wupwise 72%, mcf 93%, art 97%, swim 98%

8 Independent Threads

9 Parallel Multithreaded Speedup SPLASH NAS

10 Synchronization and Communication Speed
• Reading a value protected by a lock
  - 37 million times per second
  - 68 cycles per lock and read
• Updating a value protected by a lock
  - 14.6 million times per second
  - 171 cycles per lock and update
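The cycle counts on this slide follow from the operation rates and the 2.5 GHz clock given on the Methodology slide. A short check of that arithmetic (the function name is hypothetical):

```python
CLOCK_HZ = 2.5e9  # 2.5 GHz Pentium 4, from the Methodology slide


def cycles_per_operation(ops_per_second, clock_hz=CLOCK_HZ):
    """Convert a throughput in operations per second into cycles per operation."""
    return clock_hz / ops_per_second


read_cycles = cycles_per_operation(37e6)      # lock and read
update_cycles = cycles_per_operation(14.6e6)  # lock and update

print(round(read_cycles), round(update_cycles))  # 68 171
```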

11 Synchronization and Communication Speed (cont’d)
Loop:
  result = independent computation
  computation that uses result (flow dependence)
Independent computation: a loop that contains
  - a load
  - a float multiply
  - a float add

12 Synchronization and Communication Speed (cont’d)

13 Heterogeneous vs. Homogeneous Workloads
• Two copies of the same SPEC benchmark
  - Average speedup 1.11 < 1.20
• Integer vs. integer: 1.17
• Float vs. float: 1.20
• Integer vs. float: 1.21

14 Compiler Interaction Baseline?

15 Questions?
• Is resource partitioning a good approach?
• IBM’s Power5 implementation?
• Other implementations?

