Chapter 18 Multicore Computers


1 Chapter 18 Multicore Computers
Vy Luong

2 Multicore Computers Chip multiprocessor: combines two or more processors (cores) on a single die (a piece of silicon). Each core consists of registers, an ALU, pipeline hardware, a control unit, and an L1 cache. Some designs also place L2 and L3 cache on the die.

3 Hardware Performance Issues
Goal: increase instruction-level parallelism, so that more work is done in each clock cycle.
Superscalar: replicate execution resources, enabling parallel execution of instructions in parallel pipelines (the sketch below illustrates code that does and does not expose this kind of parallelism).
Simultaneous multithreading (SMT): duplicate the register banks so that multiple threads can share the use of the pipeline resources.
Problems: managing multiple threads, and power consumption.
After the 1980s, designers began to exploit pipelining: while one instruction is executing, another is executing in a different pipeline. As pipelines multiply, more logic is required to manage hazards, and the growing number of transistors per chip raises a power issue.
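The distinction matters to software as well as hardware. Below is a minimal illustrative sketch in C++ (not part of the original slides; the function and variable names are invented): the first loop's iterations are independent, so a superscalar core can issue several of its additions in parallel pipelines, while the second loop forms a serial dependency chain that limits instruction-level parallelism.

```cpp
#include <cstddef>

// Illustrative only: each iteration is independent of the others, so a
// superscalar core can overlap several additions in parallel pipelines.
void independent_sums(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// A serial dependency chain: every addition needs the previous result,
// so the available instruction-level parallelism is much lower.
float dependent_sum(const float* a, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += a[i];  // depends on the value produced by the previous iteration
    }
    return acc;
}
```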

4 Why Multicore? Control power density by using more of the chip area for cache memory (instead of logic transistors). Multiple cores offer a near-linear performance improvement for parallel workloads. Pollack's rule: performance increase is roughly proportional to the square root of the increase in complexity, so doubling the logic transistors in a single core increases its performance by only about 40% (worked out below).
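A back-of-the-envelope application of Pollack's rule makes the 40% figure concrete (this calculation is an illustration, not part of the original slide):

```latex
\text{performance} \propto \sqrt{\text{complexity}}
\quad\Rightarrow\quad
\frac{P_{\text{new}}}{P_{\text{old}}} = \sqrt{\frac{2C}{C}} = \sqrt{2} \approx 1.4
```

So doubling a core's logic yields roughly a 1.4x (about 40%) speedup, whereas spending the same transistors on a second core can approach a 2x speedup for workloads that parallelize well.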

5 Applications That Benefit From Multicore Systems
Servers, multithreaded native applications, multiprocess applications, Java applications, multi-instance applications, and Valve game software.
• Multithreaded native applications: characterized by having a small number of highly threaded processes. Examples of threaded applications include Lotus Domino and Siebel CRM (Customer Relationship Manager).
• Multiprocess applications: characterized by the presence of many single-threaded processes. Examples of multiprocess applications include the Oracle database, SAP, and PeopleSoft.
• Java applications: embrace threading in a fundamental way. Not only does the Java language greatly facilitate multithreaded applications, but the Java Virtual Machine is itself a multithreaded process that provides scheduling and memory management for Java applications. Java applications that can benefit directly from multicore resources include application servers such as Sun's Java Application Server, BEA's WebLogic, IBM's WebSphere, and the open-source Tomcat application server. All applications that use a Java 2 Platform, Enterprise Edition (J2EE) application server can immediately benefit from multicore technology.
• Multi-instance applications: even if an individual application does not scale to take advantage of a large number of threads, it is still possible to gain from a multicore architecture by running multiple instances of the application in parallel. If the instances require some degree of isolation, virtualization technology (for the hardware or the operating system) can be used to provide each of them with its own separate and secure environment.

6 Valve Reprogrammed its Source engine software to use multithreading to exploit the power of multicore processor chips from Intel and AMD.
Coarse threading alone gave twice the performance; Valve settled on a hybrid threading approach (combining coarse with fine-grained threading), building scene-rendering lists for multiple scenes in parallel, along with other graphics-related simulation.
Coarse threading: individual modules are assigned to individual processors. Fine-grained threading: similar or identical tasks are spread across multiple processors (see the sketch below).
The first task is to determine the areas of the world that need to be rendered. The next task is to determine what objects are in the scene as viewed from multiple angles. Then comes the processor-intensive work: the rendering module has to work out the rendering of each object from multiple points of view, such as the player's view, the view of TV monitors, and the point of view of reflections in water.
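The two threading styles can be sketched with standard C++ threads. This is only an illustration of the idea: the module and task names (run_sound, run_ai, build_render_list) are invented stand-ins, not Valve's actual code.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in module functions; in a real engine these would be whole
// subsystems such as sound, AI, physics, or rendering.
void run_sound() {}
void run_ai() {}

// Coarse threading: whole modules are assigned to their own threads/processors.
void coarse_threading() {
    std::thread sound(run_sound);
    std::thread ai(run_ai);
    sound.join();
    ai.join();
}

// Stand-in for one unit of similar work, e.g. building a scene-rendering list.
void build_render_list(std::size_t scene) { (void)scene; }

// Fine-grained threading: the same kind of task is spread across many threads.
void fine_grained_threading(std::size_t num_scenes) {
    std::vector<std::thread> workers;
    for (std::size_t s = 0; s < num_scenes; ++s)
        workers.emplace_back(build_render_list, s);
    for (auto& w : workers) w.join();
}
```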

7 Multicore Organization
Variables in a multicore organization: the number of core processors on the chip, the number of levels of cache memory, and the amount of cache memory that is shared.
Dedicated L1 cache, no on-chip cache sharing: found in earlier multicore computer chips.
Dedicated L2 cache: AMD Opteron.
Shared L2 cache: Intel Core Duo. Sharing can reduce overall miss rates: data shared by multiple cores is not replicated, the amount of shared cache allocated to each core is dynamic (threads with less locality can employ more cache), and interprocessor communication is easy to implement.
Shared L3 cache: Intel Core i7. Each core keeps rapid access to a private L2 cache, which suits threads that exhibit strong locality; an L3 combined with shared or dedicated L2 caches can provide better performance than a shared L2 alone.

8 Superscalar or SMT? Intel Core Duo: individual cores are superscalar.
Intel Core i7: implements SMT cores. Advantage: SMT scales up the number of hardware threads that the system supports. A multicore system with four cores, each supporting four simultaneous SMT threads, appears at the application level the same as a system with 16 cores (see the sketch below). SMT therefore appears to be more attractive than superscalar alone.
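From the application's point of view, those hardware threads simply appear as logical processors. A minimal, generic C++ check is shown below (not specific to the Core i7; on a four-core chip with four SMT threads per core the reported value would typically be 16):

```cpp
#include <iostream>
#include <thread>

int main() {
    // Number of concurrent hardware threads the platform exposes to software;
    // 0 means the value could not be determined.
    unsigned n = std::thread::hardware_concurrency();
    std::cout << "Logical processors visible to software: " << n << '\n';
    return 0;
}
```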

9 Intel Core Duo Introduced in 2006. Two x86 superscalar processors.
Key features: separate thermal control units for each core, an Advanced Programmable Interrupt Controller (APIC), power management logic, and a shared L2 cache.
Thermal control unit: manages heat dissipation to maximize processor performance, improves ergonomics with a cooler system, and lowers fan acoustic noise. If the temperature in a core exceeds a threshold, the thermal control unit reduces the clock rate for that core to reduce heat generation.
APIC: provides interprocessor interrupts; one core can generate an interrupt, which is accepted by the local APIC and sent to the APIC of the other core.
Power management logic: reduces power consumption by monitoring CPU activity and adjusting voltage levels and power consumption appropriately. An advanced power-gating capability turns on individual processor logic subsystems only when they are needed.
L2 cache: dynamically allocated based on current core needs.

10 Intel Core i7 Introduced in November 2008. Four x86 SMT processors.
DDR3 memory controller: brings the memory controller for DDR3 main memory onto the chip.
QuickPath Interconnect (QPI): enables high-speed communication among processor chips.

11 ARM11 MPCore Can be configured with up to four processors. Key components: distributed interrupt controller (DIC), timer, watchdog, CPU interface, and the CPUs themselves.
Distributed interrupt controller (DIC): handles interrupt detection.
Timer: generates interrupts.
Watchdog: issues warning alerts in the event of software failure.
CPU interface: handles interrupt acknowledgement, masking, and completion acknowledgement.
CPU: a single ARM11 core; the individual CPUs are referred to as MP11 CPUs.

12 Interrupt Handling The DIC supports between 0 and 255 hardware interrupt inputs and maintains a list of interrupts, showing their priority and status. The DIC satisfies two requirements: routing an interrupt request to a single CPU or to multiple CPUs, as required; and providing interprocessor communication, so that a thread on one CPU can cause activity by a thread on another CPU.
Interrupt states (a conceptual sketch follows):
Inactive: nonasserted, or completely processed by that CPU, though possibly still pending or active in other CPUs to which it is targeted.
Pending: asserted, but processing has not yet started on that CPU.
Active: processing has started on that CPU but is not yet complete.
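The lifecycle above can be modeled as a small state machine. The enum and transition functions below are only a conceptual sketch of the states described on this slide, not ARM's actual DIC interface:

```cpp
// Conceptual model of the interrupt lifecycle on one CPU; not ARM's API.
enum class IrqState { Inactive, Pending, Active };

// An interrupt is asserted, then dispatched to a handler, then completed.
IrqState on_assert(IrqState s)   { return s == IrqState::Inactive ? IrqState::Pending  : s; }
IrqState on_dispatch(IrqState s) { return s == IrqState::Pending  ? IrqState::Active   : s; }
IrqState on_complete(IrqState s) { return s == IrqState::Active   ? IrqState::Inactive : s; }
```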

13 Cache Coherency Snoop Control Unit (SCU): resolves bottlenecks related to access to shared data. The SCU introduces three types of optimization:
Direct data intervention: enables copying clean data from one CPU's L1 data cache to another CPU's L1 data cache without accessing external memory.
Duplicated tag RAMs: duplicated versions of the L1 tag RAMs are used by the SCU to check for data availability before sending coherency commands to the relevant CPUs.
Migratory lines: enables moving dirty data from one CPU to another without writing to L2 and reading the data back in from external memory.

