
0 Reliability-aware Thermal Management for Hard Real-time Applications on Multi-core Processors
Vinay Hanumaiah (Electrical Engineering, Arizona State University) and Sarma Vrudhula (Computer Science Engineering, Arizona State University)

I would like to thank the session chair for his generous introduction, and good afternoon to everyone. Today I will present techniques for maximizing the reliability of multi-core processors running real-time applications through thermal management.

1 DTM and Reliability High temperature greatly degrades reliability
High peak temperature; large number of thermal cycles; a 10 to 15 °C increase reduces reliability by half. Multi-cores have large temporal and spatial thermal variations: higher gradients → higher reliability degradation, which requires invoking DTM more often. DTM allows complex objectives and granular control.

First, let us understand why dynamic thermal management (DTM) is needed to enhance the reliability of processors. Temperature has been shown to affect reliability to a great extent. The degradation comes from two effects: high peak temperatures and a large number of thermal cycles. It has been estimated that a 10 to 15 °C increase in temperature can halve reliability. Multi-core processors in particular show far more temporal and spatial thermal variation than single-core processors, which leads to higher temperature gradients and degrades reliability further. To avoid this degradation, DTM has to be invoked more often than on single-core processors. Finally, DTM is versatile enough to incorporate complex objectives; for example, a DTM technique can jointly minimize peak temperature and thermal cycles. DTM also offers many control knobs, such as speed, voltage, and migration, that can be used to improve reliability.
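To get a feel for the "halves every 10 to 15 °C" figure, here is a minimal Python sketch. It assumes the halving compounds exponentially with temperature rise, which is an extrapolation for illustration rather than something the talk states; the function name and the default step of 10 °C are hypothetical.

```python
def relative_reliability(delta_t_c: float, halving_step_c: float = 10.0) -> float:
    """Reliability relative to the baseline after a rise of delta_t_c degrees C.

    Assumption (not from the talk): reliability halves for every
    halving_step_c degrees of temperature increase, compounding exponentially.
    """
    if halving_step_c <= 0:
        raise ValueError("halving_step_c must be positive")
    return 0.5 ** (delta_t_c / halving_step_c)


# A 30 degree rise with a 15 degree halving step leaves a quarter of the baseline.
print(relative_reliability(30.0, halving_step_c=15.0))
```

Under this assumption, even a modest reduction in peak temperature yields a multiplicative gain, which is why the talk treats peak temperature as the quantity to minimize.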

2 Related Work
Effects of temperature on reliability: Coskun:Sigmetrics’07, Lu:IEEEMICRO’05. Min. peak temperature with deadline constraints: Chantem:DATE’08 (many-core, task allocation), Jayaseelan:ICCAD’08 (single-core, task sequence). Maximize throughput: Wang:ECRTS’06 (thermal, timing, single-core), Murali:CODES’07 (thermal, no deadlines, many-core).

We briefly mention the related literature on DTM for reliability. The first two works study the effects of temperature on reliability comprehensively and establish the need for DTM to address it. Many other DTM works are related to ours, but they leave the following gaps: they do not determine transient core speeds, which is a much harder problem than task allocation and sequencing, as we will see later; they do not model the dependence of leakage power on temperature; and they do not consider thermal and deadline constraints together for a many-core processor.

3 What is our Contribution?
Determine the optimal speed profile for a many-core processor: minimize peak temperature; satisfy task deadlines while considering start times; include leakage dependence on temperature.

Here is our contribution. We derive an optimal transient speed profile for a multi-core processor that minimizes the peak operating temperature while satisfying both task deadlines and start times. We use accurate power and thermal models, including the dependence of leakage on temperature.

4 Power and Thermal Model
Figures: full HotSpot model; simplified thermal model. The simplified model ignores lateral resistances, ignores die capacitances, and lumps the package; it loses less than 6% accuracy and is required for analytical analysis.

The figure on the left shows the full HotSpot RC thermal model, which uses the electro-thermal analogy: resistances model heat spreading and capacitances model heat storage. It has the granularity of a functional block and several layers (die, TIM, heat spreader, sink, and cooling) to account for differential heat conduction. The figure on the right shows the simplified thermal model used in our work. In it we ignore the lateral resistances between functional blocks, since they are four times higher than the vertical resistances and hence conduct less heat; we ignore the die capacitances, since our tasks run much longer than the die thermal time constant (on the order of a few ms), so the capacitances saturate; and we lump the package nodes. This simplified model is required in our work because it yields an analytical equation for the number of instructions completed in a given interval of time. We do not compromise the accuracy of performance or temperature prediction: we have demonstrated that the error from using the simplified model is less than 6%.
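The simplified model can be pictured as a single lumped package node with a thermal resistance and capacitance, plus a purely resistive path from each core to the package. The Python sketch below is one reading of that description, not the authors' code: the first-order equation C dT/dt = P - (T - T_amb)/R, the forward-Euler integration, and every name and parameter value (`r_pkg`, `c_pkg`, `t_amb`, `r_core`) are assumptions made for illustration.

```python
def simulate_package_temperature(power_w, r_pkg, c_pkg, t_amb, t_init, dt):
    """Forward-Euler integration of a lumped package node.

    Assumed model: c_pkg * dT/dt = P(t) - (T - t_amb) / r_pkg
    power_w: total chip power (W), one value per time step of length dt (s).
    Returns the package temperature at every step boundary, starting with t_init.
    """
    temps = [t_init]
    temp = t_init
    for power in power_w:
        d_temp_dt = (power - (temp - t_amb) / r_pkg) / c_pkg
        temp += d_temp_dt * dt
        temps.append(temp)
    return temps


def core_temperature(t_pkg, p_core, r_core):
    """Core temperature with die capacitance ignored: the core sits a fixed
    resistive drop (r_core * p_core) above the package temperature."""
    return t_pkg + r_core * p_core


# Hypothetical numbers: 100 W held for 1000 s settles near t_amb + P * r_pkg = 95.
trace = simulate_package_temperature([100.0] * 10000, r_pkg=0.5, c_pkg=100.0,
                                     t_amb=45.0, t_init=45.0, dt=0.1)
print(round(trace[-1], 3))
```

Because the die responds instantly in this picture, the only state carried from one interval to the next is the package temperature, which is what makes the slot-by-slot analysis later in the talk tractable.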

5 Problem Formulation
Objective: find the core speed profile that minimizes peak temperature. Given: n tasks with their instruction lengths and power profiles; n cores with an RC thermal model. Constraints: start times and deadlines. Assumptions: independent and non-identical threads; one thread per core; simplified thermal model.

Here is our problem formulation. Our objective is to determine the time-varying speed profile of all cores that minimizes the peak temperature. We are given n tasks that run on n cores, and we know their instruction lengths, their power profiles, and the thermal characteristics of the cores. The derived speed profile has to satisfy the start times and deadlines of the tasks. Our assumptions are consistent with GPUs, namely one thread per core and threads that are independent of each other. The other assumption is the use of the simplified thermal model described earlier.

6 Solution Outline
Step 1: find the parametric optimal speed profile [Hanumaiah:DATE’09] for a fixed maximum temperature and no deadlines. Step 2: find the parameters of Step 1 for every slot, so as to satisfy task deadlines for a given initial package temperature.

Now we present the outline of the solution to our problem formulation. In Step 1, we determine the parametric form of the optimal speed profile for a fixed maximum temperature, but with no deadline constraints. For Step 2, we divide the entire duration of the tasks into several slots based on the task start times and deadlines, as shown in this figure. Next, for every time slot, we determine the parameters of the Step 1 solution that satisfy the task deadlines for a given initial package temperature of the slot. Note that optimality in every time slot ensures optimality over the entire duration. By optimality, we mean achieving the minimum peak temperature while satisfying the boundary conditions.
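The slot construction in Step 2 can be sketched as follows. This is a minimal illustration that assumes each task is described only by a (start time, deadline) pair and that every distinct start time or deadline is a slot boundary; the function name `build_slots` is hypothetical.

```python
def build_slots(tasks):
    """Split the timeline into slots at every task start time and deadline.

    tasks: iterable of (start_time, deadline) pairs with start_time < deadline.
    Returns a list of (slot_start, slot_end) pairs in chronological order.
    """
    boundaries = set()
    for start, deadline in tasks:
        if start >= deadline:
            raise ValueError("each task needs start_time < deadline")
        boundaries.update((start, deadline))
    ordered = sorted(boundaries)
    # Consecutive boundaries delimit one slot each.
    return list(zip(ordered, ordered[1:]))


# Three tasks with staggered start times and deadlines (hypothetical values, in seconds).
print(build_slots([(0, 10), (2, 6), (4, 12)]))
# [(0, 2), (2, 4), (4, 6), (6, 10), (10, 12)]
```

Within each resulting slot the set of active tasks is fixed, so the per-slot parameters can be solved with the slot's initial package temperature as the only coupling to its neighbours.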

7 Solution Outline - contd
Step 3: for every slot, find the initial package temperature that satisfies start times, and determine the global minimum peak temperature.

Step 2 can be solved only if we know the package temperature at the beginning of each slot. So in Step 3 we determine the initial package temperatures such that the deadlines are satisfied and the global peak temperature is minimized.

8 Step1: Fixed max. temp., no deadlines
Here is the optimal speed profile for a core in Step 1, i.e. for a fixed maximum temperature and no deadlines. The optimal speed is chosen such that either the speed is at its maximum or the temperature is held at the maximum. The corresponding speed equation is given on the slide, where s_i0 is the initial speed, tau_p is the package thermal time constant, and s_i,ss is the steady-state speed. Note that apart from the steady-state speed, all terms are known a priori at the beginning of a task's execution.

9 Step 2: Fixed max. temp., with deadlines
Need for Step 2.

However, Step 1 by itself is not optimal if the tasks have deadlines, as shown in this figure. In the top figure, we execute a pair of tasks according to Step 1, with task 2 (in blue) having its deadline at t_d,2. Even though the speed policy of Step 1 achieves a shorter makespan, the deadline of task 2 is violated. The desired execution is depicted in the bottom figure. Observe that the execution speed of task 1 is lowered before the deadline t_d,2 to allow task 2 to meet its deadline, and the speed of task 1 is increased afterwards. Notice that the makespan in the bottom figure is longer. Thus the speed profile of Step 1 has to be modified in order to meet task deadlines. This is the core of Step 2.

10 Step 2: Fixed max. temp., with deadlines
Find the optimal speed profile for the critical task. Determine Tpkg over the slot. Find the total power PT for the corresponding Tpkg.

In Step 2, we first identify the critical task for a slot, i.e. the task whose deadline falls in the current slot. We determine the steady-state speed of this critical task such that it completes its execution within the deadline. Notice that this is the equation from Step 1. We then determine the corresponding package temperature at all times. This is a straightforward computation, as we know the power consumption and the temperature of any core. Once we know the package temperature, the total power consumption P_T can be determined from the differential equation for the package temperature shown on the slide.

11 Step 2: Power allocation scheme
Let tsched = unit scheduling interval.
Determine approx. dTpkg(tsched)/dt.
Find the corresponding PT(tsched).
Update PT(tsched) = PT(tsched) – Pcritical(tsched).
Sort tasks by nearest deadline.
Allocate max. power Pmax,i(tsched) to the earliest task.
Update PT(tsched) = PT(tsched) – Pmax,i(tsched).
Continue until PT(tsched) = 0.

After the speed of the critical task is determined, we need to determine the speeds of the remaining cores. For this, we define t_sched as the unit speed scheduling interval. For every scheduling interval within a slot, we determine an approximate dT_pkg/dt and the corresponding total power budget P_T for the interval. Subtracting the power consumption of the critical task gives the remaining power budget to be allocated to the remaining tasks. We then sort the tasks by deadline and allocate the maximum feasible power, constrained by the maximum temperature, to the task with the earliest deadline, since that corresponds to the maximum feasible speed of that task. We continue this process for the other cores in order of their deadlines until the power budget is completely used. Notice that this is a greedy approach, which ensures that the tasks with the earliest deadlines are satisfied first.
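The greedy allocation for one scheduling interval can be sketched in Python as below. It assumes the total budget P_T and each task's temperature-limited maximum power have already been computed elsewhere; the tuple layout and the name `allocate_power` are hypothetical, and tasks that receive nothing are recorded with zero power.

```python
def allocate_power(p_total, p_critical, tasks):
    """Greedy, earliest-deadline-first power allocation for one interval.

    p_total:    total power budget P_T for the interval (W)
    p_critical: power already committed to the critical task (W)
    tasks:      list of (name, deadline, p_max) for the non-critical tasks,
                where p_max is the largest power the task may draw without
                exceeding the maximum temperature
    Returns a dict mapping task name to allocated power.
    """
    budget = max(p_total - p_critical, 0.0)
    allocation = {}
    # Earliest deadline first: each task takes as much as it can use.
    for name, _deadline, p_max in sorted(tasks, key=lambda task: task[1]):
        grant = min(p_max, budget)
        allocation[name] = grant
        budget -= grant
    return allocation


# Hypothetical interval: 100 W budget, 30 W already used by the critical task.
print(allocate_power(100.0, 30.0, [("a", 20, 40.0), ("b", 10, 50.0), ("c", 30, 25.0)]))
# {'b': 50.0, 'a': 20.0, 'c': 0.0}
```

In the example, task b has the nearest deadline and receives its full 50 W, task a gets the remaining 20 W, and task c waits for a later interval.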

12 Step 3: Satisfy Start Times
Instructions completed in each slot are monotonic in the initial package temperature of the slots and in the maximum temperature. The problem can be solved optimally as a quasiconcave (monotonic) optimization.

In Step 3, we satisfy the last constraint, namely the start times of tasks. We observe that the number of instructions completed in each slot is a monotonic function of the initial package temperature. For example, if T_p1 is increased, the instructions completed in the future slots 2 and 3 decrease monotonically with T_p1. On the other hand, increasing T_p1 correspondingly increases the instructions executed in the previous slot 1. The converse holds when T_p1 is decreased. Similarly, increasing the maximum temperature allows more instructions to be executed, and decreasing the maximum temperature lowers the execution rate of instructions. Thus determining the initial package temperatures of the slots that minimize the global peak temperature can be solved through quasiconcave optimization.
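Monotonicity is what makes the search tractable. As a simplified stand-in for the quasiconcave optimization, the Python sketch below uses plain bisection on a single variable: it finds the lowest maximum temperature at which enough instructions complete. The callback `instructions_at` is hypothetical and would in practice wrap Steps 1 and 2; the real problem also searches over the initial package temperature of every slot, which this sketch leaves out.

```python
def min_feasible_temperature(instructions_at, required, t_low, t_high, tol=1e-3):
    """Smallest maximum temperature in [t_low, t_high] meeting the workload.

    instructions_at: function T_max -> instructions completed, assumed to be
                     non-decreasing in T_max (hypothetical callback)
    required:        number of instructions that must complete
    Returns a temperature within tol of the smallest feasible value.
    """
    if instructions_at(t_high) < required:
        raise ValueError("workload is infeasible even at t_high")
    if instructions_at(t_low) >= required:
        return t_low
    # Invariant: t_low is infeasible, t_high is feasible.
    while t_high - t_low > tol:
        mid = 0.5 * (t_low + t_high)
        if instructions_at(mid) >= required:
            t_high = mid
        else:
            t_low = mid
    return t_high


# Toy monotonic model: 2 instruction units per degree above 40 degrees C.
best = min_feasible_temperature(lambda t: 2.0 * (t - 40.0), required=120.0,
                                t_low=60.0, t_high=120.0)
print(round(best, 2))
```

Bisection needs only the ordering of feasible and infeasible temperatures, so it applies to any non-decreasing `instructions_at`, not just the linear toy model used here.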

13 Experimental Setup
Multi-core version of Alpha 21264. HotSpot: thermal model; PTScalar: power model. SPEC benchmarks. Dynamic power: 230 W; leakage power: 60 W. Scheduling interval: 10 ms.

Now we move on to the experimental results. This is our experimental setup. We constructed a multi-core version of the Alpha 21264 by replicating a single Alpha core and scaling the copies to fit the die area of the single core. We used the HotSpot thermal model and the PTScalar simulator to obtain power profiles of SPEC benchmarks. We constrained our power numbers as shown here. The scheduling interval was set to 10 ms, the die thermal time constant.

14 Trade-off: Peak Temperature vs Deadlines
Figure panels: relaxed deadlines; tight deadlines.

In our first experiment, we tested our algorithm on two task scenarios. In the first scenario, the tasks had relaxed deadlines. In the second, the task deadlines were made tighter. For example, the task gcc (in green) has its deadline at 33 s in the relaxed scenario and at 29 s in the tight scenario, and similarly for the other tasks. All other parameters were the same, namely the start times, the tasks' power and instruction profiles, and the cores' thermal characteristics. The top plots show the transient core speeds and the bottom plots show the corresponding temperatures. Notice how the algorithm adjusts the core speeds so that the tasks end exactly at their specified deadlines, and how the peak operating temperature changes accordingly, from 102 °C in the relaxed deadline scenario to 118 °C in the tight deadline scenario. Thus our algorithm finds the optimal trade-off between peak temperature and deadlines.

15 Optimal Policy vs Min. Makespan Policy
Figure panels: optimal policy, relaxed deadlines; min. makespan.

In our next experiment, we compared our deadline-optimal algorithm with the min. makespan policy, which is the same as the policy obtained in Step 1. Since the min. makespan policy tries to minimize the overall makespan at a constant maximum temperature, it is oblivious to deadlines and thus either violates them or executes at a higher temperature. Notice the 8 °C increase in peak temperature for the relaxed deadline scenario.

16 Discretization of Optimal Policy
Figure panels: discrete version (8 speeds); continuous version.

Finally, we demonstrate the practical use of our policy. The figure on the right shows the discretized version of our continuous speed policy for the tight deadline scenario. The continuous speeds are discretized by selecting the highest discrete speed that is less than the continuous speed. Due to this approximation, either a few deadlines may be violated or the peak temperature may increase slightly, as seen in the figures.
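The rounding-down rule can be sketched in Python with a binary search over the sorted speed levels. Two choices here are assumptions rather than details from the talk: a level exactly equal to the continuous speed is accepted, and a speed below every level falls back to the lowest level. The eight evenly spaced levels are hypothetical values.

```python
from bisect import bisect_right


def discretize_speed(speed, levels):
    """Highest available level that does not exceed the continuous speed.

    levels: available discrete speeds, in any order.
    Falls back to the lowest level when speed is below all of them
    (an assumption; the talk does not say what happens in that case).
    """
    ordered = sorted(levels)
    index = bisect_right(ordered, speed)  # number of levels <= speed
    return ordered[index - 1] if index > 0 else ordered[0]


# Eight hypothetical normalized speed levels: 0.125, 0.25, ..., 1.0
levels = [0.125 * k for k in range(1, 9)]
print(discretize_speed(0.8, levels))  # 0.75
```

Rounding down never draws more power than the continuous policy planned for, which is consistent with the talk's remark that the cost of discretization shows up as missed deadlines or a slightly different peak temperature.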

17 Summary
Proposed reliability-aware transient speed policy: minimizes peak temperature; satisfies task deadlines and start times; includes accurate power and thermal models. Optimal trade-off of peak temperature with deadlines. Incorporated in the Magma simulator: fast, accurate thermal-aware architectural simulator. Available as open source at

In summary, we proposed a reliability-aware optimal transient speed policy, which minimizes peak temperature, satisfies both the start times and the deadlines of tasks, and includes accurate power and thermal models. Results showed that our algorithm is capable of trading off optimally between peak temperature and task deadlines. As a final note, our techniques have been incorporated in a thermal-aware architectural simulator called MAGMA, which is available as open source at this site. We encourage everyone to make use of it. Thank you, everyone, for attending my talk!

