Introduction of Multicore Impacts


1 Introduction of Multicore Impacts
Security Lab 12/4/2008

2 Transistors and Clock Rate
So, processors got faster every 18 months (roughly 50% to 100%!). Why bother with parallel programming? Just wait a year or two… Slide Source:

3 The “Power Wall”: High Power Consumption and Heat Dissipation
Our recent rule of thumb has been that processor performance improves as the square root of the number of transistors, and cache-miss rates likewise improve as the square root of the cache size.
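The square-root rule of thumb explains why a bigger single core is a poor use of extra transistors. The following is a minimal Python sketch, assuming performance ∝ √(transistors) and reading "cache-miss rates improve as the square root of the cache size" as miss rate ∝ 1/√(cache size). The function names are illustrative, and the two-core figure assumes a perfectly parallel workload.

```python
import math

def relative_core_performance(transistor_ratio: float) -> float:
    """Single-core performance relative to a baseline core when its transistor
    budget is multiplied by transistor_ratio (assumes perf ~ sqrt(transistors))."""
    return math.sqrt(transistor_ratio)

def relative_miss_rate(cache_size_ratio: float) -> float:
    """Cache-miss rate relative to a baseline cache when its size is multiplied
    by cache_size_ratio (assumes miss rate ~ 1 / sqrt(cache size))."""
    return 1.0 / math.sqrt(cache_size_ratio)

# Option A: spend twice the transistors on one bigger core.
big_core = relative_core_performance(2.0)
# Option B: spend them on two baseline cores (ideal, perfectly parallel workload).
two_cores = 2 * relative_core_performance(1.0)
# Quadrupling the cache only halves the miss rate under this rule.
miss_rate_4x = relative_miss_rate(4.0)

print(f"one big core: {big_core:.2f}x, two cores: {two_cores:.2f}x, "
      f"4x cache miss rate: {miss_rate_4x:.2f}x")
```

Under these assumptions, doubling the transistors buys about 1.41x from one core but up to 2x from two cores. This gap is the argument the following slides build on.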

4 Technology Scaling – We’ve Hit The Wall
[Figure: relative device performance (0.2 to 20) vs. year, 1988–2012, for conventional bulk CMOS, SOI (silicon-on-insulator), high-mobility, and double-gate devices; a question mark marks the future trend.]
Complementary metal–oxide–semiconductor (CMOS) is a major class of integrated circuits. Silicon-on-insulator (SOI) technology refers to the use of a layered silicon-insulator-silicon substrate in place of conventional silicon substrates in semiconductor manufacturing, especially microelectronics, to reduce parasitic device capacitance and thereby improve performance. A double-gate transistor is a field-effect transistor (FET) with gate structures on both sides of the channel to control the conductivity of the transistor. 2019/6/4

5 Has This Ever Happened Before?
[Figure: heat flux (Watts/cm², 0 to 140) vs. year, 1950–2010, spanning the vacuum era, bipolar mainframes (IBM 360, 370, 3033, 3081, 4381, 3090, 3090S, ES9000; Fujitsu M380, M-780, VP2000; CDC Cyber 205; NTT) and CMOS processors (IBM RY4–RY7, IBM GP, Apache, Pulsar, Merced, Pentium II (DSIP), Pentium 4). Annotations: "Start of Water Cooling"; "Bipolar: high power consumption"; a question mark marks the future.] Source: Bernie Meyerson, IBM

6 Multicores Save Power
Multicores with simple cores decrease frequency and power. Example: a uniprocessor with power budget N.
Increase frequency by 20%: power increases substantially, by more than 50%, but performance increases by only 13%.
Decrease frequency by 20% (e.g., by simplifying the core): power decreases by 50%, so another simple core can now be added.
The power budget stays at N, with increased performance!
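The power figures on this slide are consistent with a common first-order model that is an assumption here, not something the slide states: dynamic power P ∝ C·V²·f, with supply voltage V scaled in proportion to frequency f, so P ∝ f³. The minimal Python sketch below checks the slide's arithmetic under that model. It does not reproduce the 13% performance figure, which depends on the workload.

```python
def relative_dynamic_power(freq_ratio: float) -> float:
    """Dynamic power relative to the baseline core. First-order model (an
    assumption): P ~ C * V^2 * f, with supply voltage V scaled in proportion
    to frequency f, so P ~ f^3."""
    return freq_ratio ** 3

def chip_power(cores: int, freq_ratio: float) -> float:
    """Total dynamic power of `cores` identical cores at the same frequency."""
    return cores * relative_dynamic_power(freq_ratio)

overclocked = relative_dynamic_power(1.2)   # about 1.73 N: more than 50% extra power
underclocked = relative_dynamic_power(0.8)  # about 0.51 N: roughly half the power
dual_core = chip_power(2, 0.8)              # about 1.02 N: back near the budget N

print(f"+20% f: {overclocked:.2f}N  -20% f: {underclocked:.2f}N  "
      f"two cores at -20% f: {dual_core:.2f}N")
```

Two cores at 80% frequency land at roughly 1.02 N, which is why the slide can add a second core while keeping the budget near N.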

7 Industry trends
Dual cores, four cores, eight cores: Intel Quad-Core; Sun’s 8-core chip, the T1 (Niagara); Cell Broadband Engine.

8 The Rise of Multicores
Around the same time, the Cell B.E. was released with 8 cores, the Nvidia Graphics Processing Unit (GPU) had 128 cores, and Intel demonstrated an 80-core research chip. Slide Source: Amarasinghe, 6.189 IAP 2007

9 Multicore Architecture

10 Multicore Architecture

11 Hierarchy of Modular Building Blocks
Systems will increasingly need to implement a hybrid execution model. New programming systems need to reduce the need for programmer awareness of the topology on which their program executes.
Grid/Cluster: high-speed network; hierarchical SMP servers with non-uniform memory access characteristics.
Rack: high-speed network; hierarchical SMP servers with NUMA characteristics.
Board: SMP interconnect; homogeneous SMP on board, with 2–128 HW contexts on board; main processor(s) with accelerator(s), with a master-slave relationship between entities.
Chip: homogeneous SMP on chip, with 2–32 HW contexts on chip and various forms of resource sharing; a heterogeneous collection of processors on chip, with heterogeneity at the data and control-flow level. [Figure: chip diagram with cores, cache, memory controller, I/O attach, and interconnect fabric.]
Core: will support multiple HW threads sharing a single cache, exhibiting SMP characteristics.
The next-generation programming system must support programming simplicity while leveraging the performance of the underlying HW topology.
SMP: symmetric multiprocessor. NUMA: Non-Uniform Memory Access.
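As a small illustration of a program that is not aware of the topology, here is a minimal Python sketch. It sizes a worker pool from the number of hardware contexts the OS reports and leaves placement on cores, chips, and boards to the runtime and OS scheduler. `hardware_contexts` and `square` are hypothetical names. The sketch also ignores NUMA placement, which a real system at the rack or board level would have to consider.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def hardware_contexts() -> int:
    # os.cpu_count() returns the number of logical CPUs (hardware threads) the
    # OS reports, or None if it is unknown; fall back to a single worker.
    return os.cpu_count() or 1

def square(x: int) -> int:
    return x * x

# The program asks only "how many hardware contexts?" and leaves placement of
# work on cores, chips, and NUMA nodes to the runtime and OS scheduler.
with ThreadPoolExecutor(max_workers=hardware_contexts()) as pool:
    results = list(pool.map(square, range(8)))

print(hardware_contexts(), "hardware contexts:", results)
```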

12 Looming “Multicore Crisis”
Slide Source: Berkeley View of Landscape

13 Looming “Multicore Crisis”

14 Architecture trends
Several processor cores on a chip, plus specialized computing engines (XML processing, cryptography, graphics).
Questions: how to interconnect a large number of processor cores; how to provide sufficient memory bandwidth; how to structure the multilevel caching subsystem; how to balance the general-purpose computing resources with specialized processing engines and all the supporting memory, caching, and interconnect structure, given a constant power budget.
Software development processes: how to program for multicore architectures; how to test and evaluate the performance of multithreaded applications.

15 Programming multiprocessor systems
Two main directions: explicit manual programming, or exploiting the combination of compiler optimization, build tool chains, and run-time subsystems.
In the HPC and embedded communities, the emphasis was more on explicit manual programming and special resources, handled by expert programmers. This resulted in numerous home-grown language directives and extensions, internal tools, and obscure run-time systems that are hardly portable to new generations of hardware.

16 Programming languages
Very few new languages were invented in the last two decades. Java: virtual machine, interpreter, JIT, garbage collection, set of libraries, etc.
Can multicore spur the development of new languages and environments for parallelism? Map-reduce, Cilk, UPC, X10, and STAPL: programmers can provide additional information related to parallelism.
Multicore provides multiple types of parallelism:
Thread-level parallelism (TLP), coarse-grain: OpenMP, the standard for shared-memory models; MPI, the standard for distributed-memory models; pthreads and Java threads, used explicitly; automatic parallelization optimizations (most of the original auto-parallelizing compilers focused on FORTRAN).
Data-level parallelism (DLP), fine-grain: auto-vectorization, auto-SIMDization.
What about asymmetric multicore architectures (like the Cell processor)? Is it possible to have a single-source compilation for multiple ISAs? (Initial attempts…) How can OpenMP be used for such programs? Streaming.
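To make coarse-grain thread-level parallelism and the map-reduce pattern concrete, here is a minimal sketch using Python's standard `concurrent.futures` rather than any of the systems named on the slide. The input is split into one chunk per worker, each chunk is counted independently (map), and the partial counts are merged (reduce). `map_count` and `parallel_word_count` are hypothetical names. In CPython the global interpreter lock means these threads show the structure, not a CPU speedup; `ProcessPoolExecutor` offers the same `map` interface for CPU-bound work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_count(chunk: list[str]) -> Counter:
    """Map step: count words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts

def parallel_word_count(lines: list[str], workers: int = 4) -> Counter:
    # Split the input into at most `workers` contiguous chunks (coarse-grain TLP).
    size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_count, chunks))
    # Reduce step: merge the per-chunk counts into one result.
    return reduce(lambda a, b: a + b, partials, Counter())

text = ["multicore impacts", "multicore power wall", "power budget"]
print(parallel_word_count(text, workers=2))
```

The chunks share no mutable state, so no locks are needed. That independence is what makes map-reduce a convenient way to express parallelism.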

17 Performance Analysis Tools
Profile-based tools (data aggregation): FDPR-Pro, Code Analyzer, Diablo.
Performance evaluation is heavily influenced by thread interaction: stalls, locks, races, memory thrashing, and polluted hardware counters.
Trace-based analysis and visualization introduces timeline views and data for dealing with communication issues, but lacks scalability: traces tend to grow fast, making them difficult to manipulate and visualize. In the HPC context, this means selecting an arbitrary subset of cores/threads and arbitrary time intervals. Tracing might also disturb the program's behavior.
Tools: HPCToolkit, TAU, Paraver, VTune, Code Analyzer, PDT, Trace Analyzer.
Lack of determinism.
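Trace-based lock analysis, and the way tracing can perturb the program it measures, can be illustrated with a small wrapper. The sketch below is hypothetical and is not the API of any tool listed above. `TracedLock` records, for every acquisition, which thread got the lock and how long it waited, and a second pass aggregates the trace per thread. Because the record is appended while the lock is held, each critical section gets slightly longer: a small instance of the probe effect.

```python
import threading
import time

class TracedLock:
    """Hypothetical wrapper around threading.Lock that records, per acquisition,
    which thread acquired the lock and how long it waited. Not a real tool's API."""

    def __init__(self):
        self._lock = threading.Lock()
        self.trace = []  # list of (thread name, seconds spent waiting)

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()
        waited = time.perf_counter() - start
        # Recording happens while the lock is held, so the list needs no extra
        # guard, but it also lengthens the critical section: a small probe effect.
        self.trace.append((threading.current_thread().name, waited))
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False

def worker(lock: TracedLock, iterations: int) -> None:
    for _ in range(iterations):
        with lock:
            time.sleep(0.001)  # simulated work inside the critical section

lock = TracedLock()
threads = [threading.Thread(target=worker, args=(lock, 5), name=f"T{i}")
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Aggregate the trace into total wait time per thread.
total_wait = {}
for name, waited in lock.trace:
    total_wait[name] = total_wait.get(name, 0.0) + waited
for name in sorted(total_wait):
    print(f"{name}: {total_wait[name] * 1000:.1f} ms waiting")
```

The wait times change from run to run. This is the lack of determinism the slide mentions, and it is why such traces are usually aggregated over many runs.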

18 Performance tools for multi-core: Cell
Visual Performance Analyzer 5.0, Cell SDK 3.0: Profile Analyzer, Code Analyzer, Pipeline Analyzer, Trace Analyzer, PDT, Lock Analyzer.
Capabilities: infrastructure for collecting profiles on several systems; infrastructure for using databases for large data sets; a set of interconnected views; Cell support; infrastructure for collecting traces on SDK 3.0 libraries; analysis of lock usage; input for Trace Analyzer.

19 Debugging and testing tools
Concurrency problems constitute about 10% of bugs. Bugs such as crashes (races) or freezes (deadlocks) stay in the application, reducing up-time. Testing is done at the load-testing stage, very late in the process.
A tool-supported methodology tries to find concurrency issues as early as possible:
ConTest, a tool-supported method for measuring contention;
making tests that are likely to exhibit bugs by changing the internal timing;
tools for pinpointing the locations of bugs, if we have a test that can cause the application to fail some of the time;
healing bugs so that their impact will not be seen.
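"Changing the internal timing" usually means noise injection: adding random yields or delays at shared-memory accesses so that rare interleavings show up during ordinary tests. ConTest itself works on Java programs. The following is only a minimal Python sketch of the same idea. The `noise` hook and `run` helper are hypothetical, and the injected `time.sleep(0)` simply yields to other threads. Without the lock, the read-modify-write races and updates are usually lost; with the lock, the count is always exact.

```python
import random
import threading
import time

def noise() -> None:
    """Hypothetical noise-injection hook in the spirit of ConTest: randomly
    yield the CPU so that thread interleavings change from run to run."""
    if random.random() < 0.5:
        time.sleep(0)  # give other threads a chance to run here

def run(increments_per_thread: int, threads: int, use_lock: bool) -> int:
    count = 0
    lock = threading.Lock()

    def work() -> None:
        nonlocal count
        for _ in range(increments_per_thread):
            if use_lock:
                with lock:
                    value = count
                    noise()
                    count = value + 1
            else:
                value = count      # read ...
                noise()            # ... injected delay widens the race window ...
                count = value + 1  # ... write: may overwrite another thread's update

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return count

expected = 4 * 1000
print("without lock:", run(1000, 4, use_lock=False), "of", expected)
print("with lock:   ", run(1000, 4, use_lock=True), "of", expected)
```

The unlocked result varies between runs and is not guaranteed to be wrong on any single run. Repeated runs with injected noise raise the odds that a test which fails "some of the time" actually fails.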

20 Software trends: Software enablement system for multicores
Various directions for providing solutions. This is an active area of research, with only some early results in the academic and industrial worlds in terms of established standards and technology; much more will evolve in the years to come.
Needs: programming models and compiler support for multicores; performance evaluation tools; testing and debugging tools.

21 Thank You

22 CMOS
Complementary metal–oxide–semiconductor (CMOS) is a major class of integrated circuits. Transistor terminals: S = source, G = gate, D = drain.

