Published by Aspen Ground. Modified over 3 years ago.

1
Measuring and Modeling Hyper-threaded Processor Performance
Ethan Bolker
UMass-Boston
September 17, 2003

2
Joint work with Yiping Ding, Arjun Kumar (BMC Software)
Accepted for presentation at CMG32, December 2003
Paper (with references) available on request

3
Improving Processor Performance
Speed up clock
Invent revolutionary new architecture
Replicate processors (parallel application)
Remove bottlenecks (use idle ALU)
–caches
–pipelining
–prefetch

4
Hyper-threading Technology (HTT)
Default for new Intel high end chips
One ALU
Duplicate state of computation (registers) to create two logical processors (chip size *= 1.05)
Parallel instruction preparation (decode)
ALU should see ready work more often (provided there are two active threads)

5
The path to instruction execution
[Figure. Source: Intel Technology Journal, Volume 06 Issue 01, February 14, 2002, p8]

6
How little must we understand?
Batch workload: repeated dispatch of identical compute intensive jobs
–vary number of threads
–measure throughput (jobs/second)
Treat processor as a black box
Experiment to observe behavior
Model to predict behavior

7
Batch throughput
[Figure: batch throughput plot with three annotated regions: makes sense, puzzling, makes sense]

8
Transaction processing
More interesting than batch
Random size jobs arrive at random times
M/M/1
M = “Markov”
M/*/* : arrival stream is Poisson, rate $\lambda$
*/M/* : job size exponentially distributed, mean s
*/*/1 : single processor

9
M/M/1 model evaluation
Utilization: $U = \lambda s$
U is dimensionless: jobs/sec * sec/job
U < 1, else saturation
Response time: $r = s/(1-U)$
Randomness: each job sees a (virtual) processor slowed down (by other jobs) by a factor of 1/(1-U), so accumulating s seconds of real work takes r = s/(1-U) seconds of real time
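The two M/M/1 formulas on this slide (utilization as arrival rate times mean service time, and response time r = s/(1-U)) can be sketched in a few lines. The function names are illustrative and not from the talk.

```python
def mm1_utilization(arrival_rate, mean_service_time):
    """Utilization U = arrival rate * mean service time (dimensionless)."""
    return arrival_rate * mean_service_time


def mm1_response_time(arrival_rate, mean_service_time):
    """Mean response time r = s / (1 - U) for an M/M/1 queue."""
    u = mm1_utilization(arrival_rate, mean_service_time)
    if u >= 1.0:
        # The formula only holds below saturation.
        raise ValueError("saturated: U must be < 1")
    return mean_service_time / (1.0 - u)


if __name__ == "__main__":
    # s fixed at 1 second, arrival rate varied, as in the benchmark setup.
    for rate in (0.1, 0.5, 0.9):
        print(rate, mm1_response_time(rate, 1.0))
```

With s = 1 second the response time is simply 1/(1-U), which is why it grows so sharply as U approaches 1.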

10
Benchmark
Java driver
–chooses interarrival times and service times from exponential distributions
–dispatches each job in its own thread
–records actual job CPU usage, response time
Input parameters
–job arrival rate $\lambda$
–mean job service time s
Fix s = 1 second, vary $\lambda$ (hence U), track r
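The talk's driver is written in Java and runs real threads; its source is not shown here. As a stand-in, the following is a minimal simulation sketch of the same idea: draw exponential interarrival and service times and record each job's response time. It assumes a single first-come-first-served server (the real benchmark lets threads share the processor), which gives the same mean response time for M/M/1. All names are hypothetical.

```python
import random


def simulate_mm1_fifo(arrival_rate, mean_service_time, num_jobs, seed=0):
    """Return the mean response time over num_jobs simulated jobs.

    Assumption: one FIFO server, not the thread-per-job scheduling
    used by the actual benchmark driver.
    """
    rng = random.Random(seed)
    arrival = 0.0          # arrival time of the current job
    prev_departure = 0.0   # time the previous job left the server
    total_response = 0.0
    for _ in range(num_jobs):
        # expovariate takes a rate, i.e. 1 / mean.
        arrival += rng.expovariate(arrival_rate)
        service = rng.expovariate(1.0 / mean_service_time)
        # A job starts when it has arrived and the server is free.
        start = max(arrival, prev_departure)
        prev_departure = start + service
        total_response += prev_departure - arrival
    return total_response / num_jobs


if __name__ == "__main__":
    # s fixed at 1 second, arrival rate varied, as on the slide.
    for rate in (0.1, 0.5, 0.8):
        measured = simulate_mm1_fifo(rate, 1.0, 100_000)
        print(rate, measured, 1.0 / (1.0 - rate))
```

The simulated mean should track 1/(1-U) closely because nothing here depends on a real scheduler, which is exactly what a measurement on real hardware cannot promise.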

11
Benchmark validation
[Figure: practice (measured) vs. theory (M/M/1, R = 1/(1-U))]

12
Theory vs practice
“In theory, there is no difference between theory and practice. In practice, there is no relationship between theory and practice.” (Grant Gainey)
“The gap between theory and practice in practice is much larger than the gap between theory and practice in theory.” (Jeff Case)

13
Explain/remove discrepancy
Examine, tune benchmark driver
Compute actual coefficients of variation, incorporate in corrected M/M/1 formula
Nothing helps
Postpone worry – in the meanwhile …
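The slide does not say which corrected formula was tried. One common choice, used here purely as an assumption, is Kingman's approximation for a G/G/1 queue: the waiting time of M/M/1 is scaled by the average of the squared coefficients of variation of interarrival and service times. With both coefficients equal to 1 it reduces to r = s/(1-U). Function names are illustrative.

```python
import statistics


def coefficient_of_variation(samples):
    """Standard deviation divided by mean of the measured samples."""
    return statistics.pstdev(samples) / statistics.mean(samples)


def gg1_response_time(arrival_rate, mean_service_time, cv_arrival, cv_service):
    """Approximate mean response time via Kingman's formula.

    Assumption: this is a stand-in for the unspecified corrected
    formula mentioned on the slide.
    """
    u = arrival_rate * mean_service_time
    if u >= 1.0:
        raise ValueError("saturated: U must be < 1")
    variability = (cv_arrival ** 2 + cv_service ** 2) / 2.0
    wait = (u / (1.0 - u)) * variability * mean_service_time
    return mean_service_time + wait


if __name__ == "__main__":
    # Exponential interarrival and service times both have CV = 1.
    print(gg1_response_time(0.5, 1.0, 1.0, 1.0))
    # Less variable service times shorten the predicted wait.
    print(gg1_response_time(0.5, 1.0, 1.0, 0.5))
```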

14
HTT on vs HTT off
Use this benchmark to measure the effect of hyper-threading on response time
Use throughput ($\lambda$) as the independent variable
“Utilization” is ambiguous (digression)

15
HTT on vs HTT off

16
What’s happening
Hyper-threading allows more of the application parallelism to make its way to the ALU
Can we understand this quantitatively?

17
Model HTT architecture
[Diagram: queueing network labeled $\lambda/2$, $s_1$, $s_2$]
$r = \frac{s_1}{1 - (\lambda/2)\, s_1} + \frac{s_2}{1 - \lambda s_2}$
preparatory phase service time $s_1$
execution phase service time $s_2$
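A minimal sketch of this model's response time: each of the two preparatory paths sees half the arrival rate, the single execution phase sees all of it, and the two M/M/1-style terms add. The function name is illustrative; the sample parameters are the fitted values quoted on the following slide.

```python
def htt_response_time(arrival_rate, s1, s2):
    """Response time of the 2+1 model.

    r = s1 / (1 - (rate / 2) * s1) + s2 / (1 - rate * s2)
    """
    prep_util = (arrival_rate / 2.0) * s1   # each preparatory path gets half
    exec_util = arrival_rate * s2           # execution phase gets everything
    if prep_util >= 1.0 or exec_util >= 1.0:
        raise ValueError("saturated: a phase has utilization >= 1")
    return s1 / (1.0 - prep_util) + s2 / (1.0 - exec_util)


if __name__ == "__main__":
    # s1 = 0.13 and s2 = 0.81 are the values reported in the talk.
    for rate in (0.2, 0.6, 1.0, 1.2):
        print(rate, htt_response_time(rate, 0.13, 0.81))
```

With these parameters the execution phase is the bottleneck, so the model saturates as the arrival rate approaches 1/0.81 jobs per second.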

18
Theory vs practice
$s_1 = 0.13$, $s_2 = 0.81$

19
Model parameters
To compute response time r from the model, need (virtual) service parameters $s_1$, $s_2$ ($\lambda$ is known)
Finding $s_1$, $s_2$
–eyeball measured data
–fit two data points
–maximum likelihood
–derive from first principles
$s_1 = 0.13$, $s_2 = 0.81$ make sense: roughly 15% of work is preparatory, 85% execution
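The talk lists several ways to find the two service parameters without giving a procedure. As one possible illustration, the sketch below uses a brute-force least-squares grid search, which is an assumed stand-in and not the authors' method. The data points are synthetic, generated from the model itself with the reported values, so the search should simply recover them.

```python
def htt_response_time(arrival_rate, s1, s2):
    """2+1 model response time; infinite if either phase is saturated."""
    prep_util = (arrival_rate / 2.0) * s1
    exec_util = arrival_rate * s2
    if prep_util >= 1.0 or exec_util >= 1.0:
        return float("inf")
    return s1 / (1.0 - prep_util) + s2 / (1.0 - exec_util)


def fit_service_times(points, step=0.01, max_s=1.0):
    """Grid search for (s1, s2) minimizing squared error.

    points is a list of (arrival_rate, response_time) pairs.
    """
    best = None
    n = int(round(max_s / step))
    for i in range(1, n + 1):
        s1 = i * step
        for j in range(1, n + 1):
            s2 = j * step
            err = sum((htt_response_time(rate, s1, s2) - r) ** 2
                      for rate, r in points)
            if best is None or err < best[0]:
                best = (err, s1, s2)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic data from the model, NOT measurements from the talk.
    rates = (0.2, 0.5, 0.8, 1.1)
    points = [(rate, htt_response_time(rate, 0.13, 0.81)) for rate in rates]
    print(fit_service_times(points))
```

On real measurements the minimum would not be exact, and a finer grid or a proper optimizer would be the natural next step.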

20
Benchmark validation (reprise)
Chip hardware unchanged when HTT off
Assume one path used
Tandem queue
Parameter estimation as before
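The slide gives no formula for the HTT-off case. By analogy with the 2+1 model, and assuming the single preparatory path now carries the full arrival rate, a tandem of two M/M/1 stages would have the response time sketched below. The function name is illustrative; the sample parameters are the fitted values quoted on the following slide.

```python
def tandem_response_time(arrival_rate, s1, s2):
    """Response time of a 1+1 tandem queue (assumed form).

    r = s1 / (1 - rate * s1) + s2 / (1 - rate * s2)
    """
    prep_util = arrival_rate * s1   # one preparatory path, full arrival rate
    exec_util = arrival_rate * s2
    if prep_util >= 1.0 or exec_util >= 1.0:
        raise ValueError("saturated: a stage has utilization >= 1")
    return s1 / (1.0 - prep_util) + s2 / (1.0 - exec_util)


if __name__ == "__main__":
    # s1 = 0.045 and s2 = 0.878 are the HTT-off values reported in the talk.
    for rate in (0.2, 0.5, 0.8, 1.0):
        print(rate, tandem_response_time(rate, 0.045, 0.878))
```

Setting s1 to zero collapses the tandem to the plain M/M/1 formula r = s/(1-U), which is a quick sanity check on the form.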

21
Theory vs practice
$s_1 = 0.045$, $s_2 = 0.878$

22
Future work
Do serious statistics
Does the 1+1 tandem queue model predict hyper-threading response as well as the complex 2+1 model?
Understand two-processor machine puzzle
Explore how $s_1$ and $s_2$ vary with application (e.g. fixed vs floating point)
Find ways to estimate $s_1$ and $s_2$ from first principles

23
Summary
Hyper-threading is …
Abstraction (modelling) leverages information: you can often understand a lot even when you know very little
r = s/(1-U) is worth remembering
You do need to connect theory and practice – and practice is harder than theory
Questions?
