Presentation is loading. Please wait.

Presentation is loading. Please wait.

Computer Science and Engineering Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions.

Similar presentations


Presentation on theme: "Computer Science and Engineering Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions."— Presentation transcript:

1 Computer Science and Engineering Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions on Architecture and Code Optimization, Vol. 2, No. 4, December 2005, Pages 397-422. Presented by: Manal Houri March 23, 2006

2 Computer Science and Engineering Motivation Low-power computing has long been an important design objective for mobile, battery-operated devices. More recently, however, power consumption in high-performance microprocessors has drawn considerable attention from industry and researchers. Traditionally, power dissipation in CMOS technology has been significantly lower than other technologies. However, at current speeds and feature sizes, CMOS power consumption has increased dramatically. This makes microprocessor cooling increasingly difficult and expensive. CSE 8383 Advanced Computer Architecture

3 Computer Science and Engineering Direction As a result, over the last few years, power has become a first- priority concern to microprocessor designers/manufacturers. Industry and researchers are eyeing chip multiprocessor architectures (CMPs), which can attain higher performance by running multiple threads in parallel. By integrating multiple cores on a chip, designers hope to deliver performance growth while depending less on raw circuit speed (power that is). So far, however, very little work has been done on the power- performance issues involving parallel applications executing on multiprocessors, in general and on multicore chips, in particular. CSE 8383 Advanced Computer Architecture

4 Computer Science and Engineering Problem In a parallel run, processors synchronize and exchange data as they cooperate toward a common goal. Synchronization and communication constitute overheads that typically grow in importance as we increase the number of processors. This generally results in decreased parallel efficiency—speedup over number of processors used. In an application’s changing parallel efficiency, it is not obvious which voltage and frequency levels should be applied, in combination with the appropriate number of processors, to optimize a certain power-performance trade-off and/or meet a particular constraint. CSE 8383 Advanced Computer Architecture

5 Computer Science and Engineering Objective of this paper CSE 8383 Advanced Computer Architecture Investigate the power-performance issues of running parallel applications on a CMP: Develop an analytical model to study the effect of: the number of processors used, the parallel efficiency, and the voltage/frequency scaling applied, on the performance and power consumption delivered by a CMP. More specifically: Optimizing power consumption given a performance target, Optimizing performance given a certain power budget, Then, Provide detailed simulations of parallel applications running on a power-performance model of a CMP are conducted.

6 Computer Science and Engineering Analytical Study CSE 8383 Advanced Computer Architecture Power Equations Performance Equations Scenario I: Power Optimization Scenario II: Performance Optimization

7 Computer Science and Engineering Analytical Study CSE 8383 Advanced Computer Architecture Power Equations Performance Equations Scenario I: Power Optimization Scenario II: Performance Optimization

8 Computer Science and Engineering Analytical Study – Power Equations CSE 8383 Advanced Computer Architecture This equation establishes the relationship between the supply voltage V and maximum operating frequency f max. f max = η (V − V th ) α / V Where: f max : maximum operating frequency η, α: experimentally derived constants V: supply voltage V th : threshold voltage

9 Computer Science and Engineering Analytical Study – Power Equations CSE 8383 Advanced Computer Architecture This equation defines power consumption P as the sum of dynamic and static components. P = P D + P S = AC V 2 f + V Where: P D : dynamic component P S : static component A: gate activity factor C: total capacitance V: supply voltage f: operating frequency I leak : leakage current at nominal supply voltage V n and room temperature T std (25°C). is the abbreviation of dependency of this curve-fitted formula on supply voltage and temperature.

10 Computer Science and Engineering Analytical Study CSE 8383 Advanced Computer Architecture Power Equations Performance Equations Scenario I: Power Optimization Scenario II: Performance Optimization

11 Computer Science and Engineering Analytical Study – Performance Equations CSE 8383 Advanced Computer Architecture Execution time formula as proposed by Hennessy and Patterson: t = IC CPI f -1 IC: dynamic instruction count CPI: average number of cycles per instruction Parallel efficiency of an application running on N processors: Nominal parallel efficiency Parallel efficiency

12 Computer Science and Engineering Analytical Study CSE 8383 Advanced Computer Architecture Power Equations Performance Equations Scenario I: Power Optimization Scenario II: Performance Optimization

13 Computer Science and Engineering Analytical Study - Scenario I: Power Optimization CSE 8383 Advanced Computer Architecture Goal Find the configuration that maximizes power savings while delivering a pre-specified level of performance, in particular, deliver the performance of a sequential execution on one processor at full throttle. In other words, t 1 = t N IC CPI f -1 = IC N CPI N f N -1 Using the definition of nominal parallel efficiency, we rewrite the above as:

14 Computer Science and Engineering Analytical Study - Scenario I: Power Optimization CSE 8383 Advanced Computer Architecture Performance can be expressed for N-processor configuration as: If we define the voltage scaling ratio the above can be rewritten as:

15 Computer Science and Engineering Analytical Study - Scenario I: Power Optimization CSE 8383 Advanced Computer Architecture We look at 2 process technologies: 130 nm and 65 nm. The operating temperature of the single-core is set at 100°C. Using a 32-way CMP baseline, configurations running on different number of processor cores are studies. Assumption: unused cores are shut down.

16 Computer Science and Engineering Analytical Study - Scenario I: Power Optimization CSE 8383 Advanced Computer Architecture 130 nm technology

17 Computer Science and Engineering CSE 8383 Advanced Computer Architecture 65 nm technology Analytical Study - Scenario I: Power Optimization

18 Computer Science and Engineering Scenario I: Power Optimization – Experimental Evaluation CSE 8383 Advanced Computer Architecture

19 Computer Science and Engineering Analytical Study CSE 8383 Advanced Computer Architecture Power Equations Performance Equations Scenario I: Power Optimization Scenario II: Performance Optimization

20 Computer Science and Engineering Analytical Study - Scenario II: Performance Optimization CSE 8383 Advanced Computer Architecture Goal: Find the configuration that maximizes performance under a constrained power budget. The maximum power budget is set to that of executing on one processor at full throttle, P 1 = P N

21 Computer Science and Engineering Analytical Study - Scenario II: Performance Optimization CSE 8383 Advanced Computer Architecture The performance gain or speedup S on N processors can be expressed as: Using the definition of voltage ratio and equation of frequency, we get:equation of frequency

22 Computer Science and Engineering Analytical Study - Scenario II: Performance Optimization CSE 8383 Advanced Computer Architecture In order to compute, we use P 1 = P N : Now we can solve for S having obtained. f

23 Computer Science and Engineering Analytical Study - Scenario II: Performance Optimization CSE 8383 Advanced Computer Architecture

24 Computer Science and Engineering Scenario II: Performance Optimization – Experimental Evaluation CSE 8383 Advanced Computer Architecture

25 Computer Science and Engineering Concluding remarks CSE 8383 Advanced Computer Architecture This study shows that, under the right circumstances, parallel computing may bring significant power-performance benefits over a uniprocessor setup of similar performance. However, it also illustrates the dependency of the optimum operating point on multiple interacting factors, such as: the application’s parallel efficiency, the chip’s voltage/frequency scaling characteristics, the process technology, and restrictions in performance, power, and, sometimes, number of available processors. This study shows that these factors can interact in a nonobvious way, making power-performance optimization of on-chip parallel computation quite challenging.


Download ppt "Computer Science and Engineering Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions."

Similar presentations


Ads by Google