Multicore Processor Technology and Managing Contention for Shared Resources
Cong Zhao, Yixing Li
Feb. 19, 2008

Moore’s Law
• Transistors per integrated circuit double roughly every 2 years.
• Power requirements in relation to transistor size double every 1-2 years; density at minimum cost per transistor follows a similar trend, and so on.
• Integrated circuits double in performance every 18 months (David House).
• For a single core, the only way to improve performance is to increase the clock frequency.

Why Multi-Core?
There are many reasons:
• It is difficult to push single-core clock frequencies any higher.
• Deeply pipelined circuits bring:
– heat problems
– speed-of-light (signal propagation delay) problems
– difficult design and verification
– the need for large design teams
• Many new applications are multithreaded.

Why Multi-Core?
But the leading reason is: to continue the raw performance growth that customers have come to expect from Moore’s-law scaling without being overwhelmed by the growth in power consumption. As single-core designs were pushed to ever higher clock speeds, the required power grew at a faster rate than the frequency, leading to designs that were complex, power hungry, and unmanageable.

Why Multi-Core? (figure)

What is Multi-Core?
A single computing component with two or more independent central processing units (called “cores”).
Attributes:
• Application Class
• Power/Performance
• Processing Elements
• Memory System
• Etc.

Application Class
• There are two broad classes of processing into which an application can fall: data-processing dominated and control dominated.
• Data-Processing Dominated
– A sequence of operations on a stream of data with little or no data reuse.
– Examples: image processing, audio processing, and wireless baseband processing.
• Control Dominated
– These applications often need to keep track of large amounts of state and often have a high degree of data reuse.
– Examples: file compression/decompression, network processing.

Power/Performance
• In the past decade, power has joined performance as a first-class design constraint.
• Many applications and devices have strict performance and power requirements:
– mobile phones, laptops, servers
– “cloud” computing (warehouse-scale computers)
• These cloud computing centers now consume more energy than heavy manufacturing in the United States.

Processing Elements: Architecture and Microarchitecture
Architecture:
• The instruction set architecture (ISA) defines the hardware/software interface.
• Reduced instruction set computer (RISC)
• Complex instruction set computer (CISC)
Microarchitecture:
• The microarchitecture is the implementation of the ISA.
• In-order processing elements
• Out-of-order processing elements
• SIMD, VLIW

Memory System  In uniprocessor designs, the memory system was a rather simple component, consisting of a few levels of cache to feed the single processor with data and instructions.  In Multi-core design:  Consistency model.  Cache configuration.  Cache coherence support.  Intrachip interconnect.  All of these determine how cores communicate impacting programmability, parallel application performance, and the number of cores that the system can adequately support. 10

Memory System  Consistency Model  Strong Consistency and Weak Consistency 11 0

Memory System  Cache Configuration  Caches give processing elements a fast, high bandwidth local memory to work with.  Caches can be tagged and managed automatically by hardware or be explicitly managed local store memory.  The amount of cache re quired is very application dependent.  The first level of cache (L1)is usually rather small, fast, and private to each processing element. Subsequent levels (L2)can be larger, slower, and shared among processing elements. 12

Memory System  Intrachip interconnect.  Bus, Crossbar, Ring, and Network-on-chip (NoC)  Cache coherence  Broadcast based and Directory based. 13

Architecture of a Multicore System
Fig. 2-1. Schematic of a Multicore System with Two Memory Domains
Cores in the same domain compete for the shared resources! Problems?

Cache Contention
• Thread A requests a line that is not in the cache (a cache miss) while the cache is full.
• Some data must be evicted to free up a line.
• The evicted line might belong to thread B, or to A itself, hurting performance (see the sketch below).
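A toy simulation of a small shared cache with LRU eviction makes this visible. Everything here is invented for illustration (the capacity and both access streams): thread A streams through a large array with little reuse, Mcf-style, while thread B keeps reusing a small hot set, Povray-style.

```python
from collections import OrderedDict

class SharedLRUCache:
    """Toy fully associative shared cache with LRU replacement."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()            # line address -> owning thread
        self.misses = {"A": 0, "B": 0}
        self.evicted_from = {"A": 0, "B": 0}  # whose line got thrown out

    def access(self, thread, line):
        if line in self.lines:
            self.lines.move_to_end(line)      # hit: refresh LRU position
            return
        self.misses[thread] += 1              # miss: may force an eviction
        if len(self.lines) >= self.capacity:
            _, owner = self.lines.popitem(last=False)  # evict the LRU line
            self.evicted_from[owner] += 1     # victim can belong to A or B
        self.lines[line] = thread

cache = SharedLRUCache(capacity=8)
for i in range(100):
    cache.access("A", ("A", (2 * i) % 64))      # A streams: little reuse
    cache.access("A", ("A", (2 * i + 1) % 64))
    cache.access("B", ("B", i % 4))             # B has a 4-line hot set

print("misses:", cache.misses)
print("evictions by owner:", cache.evicted_from)
```

Running it shows that B misses on almost every access even though its working set is only four lines: A's streaming traffic keeps evicting B's hot lines before B can reuse them, which is exactly the contention described above.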

Cache Miss Frequency

Application   Temporal locality   Reuse frequency   Miss frequency
A. Mcf        Rather poor         Low               High
B. Povray     Excellent           High              → 0
C. Milc       Poor                Very low          Very high

Fig. 2-2. Example Memory-Reuse Profiles from the SPEC CPU2006 Suite

Pain Metric
• Pain(A|B) = S_A * Z_B: the performance degradation of A when A runs with B, relative to running solo.
• Pain(B|A) = S_B * Z_A
• Pain(A,B) = Pain(A|B) + Pain(B|A)
• S (sensitivity) measures how much a thread suffers whenever it shares the cache with other threads.
• Z (intensity) measures how much a thread hurts other threads.
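In code the model is just a few multiplications. The sketch below uses invented sensitivity (S) and intensity (Z) values; in the paper they are derived from memory-reuse profiles like those in Fig. 2-2. It scores every way of pairing four threads onto two shared caches and keeps the least painful one.

```python
from itertools import combinations

# Invented S (sensitivity) and Z (intensity) values, for illustration only.
threads = {
    "mcf":    {"S": 0.8, "Z": 0.9},   # high miss rate: sensitive and intense
    "milc":   {"S": 0.6, "Z": 1.0},   # high miss rate as well
    "povray": {"S": 0.1, "Z": 0.05},  # excellent locality: nearly harmless
    "gcc":    {"S": 0.4, "Z": 0.5},
}

def pain(a, b):
    """Pain(A,B) = Pain(A|B) + Pain(B|A) = S_A * Z_B + S_B * Z_A."""
    return threads[a]["S"] * threads[b]["Z"] + threads[b]["S"] * threads[a]["Z"]

# Enumerate pairings of the four threads into two cache-sharing pairs.
names = list(threads)
best = None
for pair in combinations(names, 2):
    rest = tuple(t for t in names if t not in pair)
    total = pain(*pair) + pain(*rest)
    if best is None or total < best[0]:
        best = (total, pair, rest)

print(f"best schedule: {best[1]} with {best[2]}, total pain {best[0]:.2f}")
```

With these numbers the winner separates the two high-miss-rate threads (mcf with povray, milc with gcc), matching the "keep high-miss-rate applications apart" finding on the following slides.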

Evaluation of the Pain Model

Evaluation of the Pain Model
Fig. 2-3. Performance of Estimated Best Schedules Compared with Actual Best Schedules

Evaluation of the Pain Model
Fig. 2-4. Worst-Case Performance under DIO Relative to the Default Linux Scheduler
• Average performance improvement: 11%
• High-miss-rate applications must be kept apart.

Problem of the Pain Model
• The example shows 2 memory domains with 2 cores per domain.
• What about 8 memory domains with 2 cores per domain? The number of candidate schedules grows explosively.

Future Work on the Pain Model
Number of possible schedules:

Configuration                        Schedules
2 memory domains, 2 cores/domain     3
4 memory domains, 2 cores/domain     105

• High-miss-rate applications must be kept apart.
• We can pair each of 4 high-miss-rate applications with 1 of 4 low-miss-rate applications within one domain.
• The number of schedules to consider is then reduced to 24 (see the sketch below).
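The counts on this slide follow from elementary combinatorics, assuming one thread per core: splitting 2n threads into n unordered cache-sharing pairs gives (2n-1)!! schedules, and pairing each of 4 high-miss-rate threads with one of 4 low-miss-rate threads gives 4! schedules. A quick sanity check:

```python
from math import factorial

def pairings(n_threads):
    """Ways to split n_threads (even) into unordered pairs: (n-1)!!."""
    count = 1
    for k in range(n_threads - 1, 0, -2):   # (n-1) * (n-3) * ... * 1
        count *= k
    return count

print(pairings(4))    # 2 domains x 2 cores/domain:   3 schedules
print(pairings(8))    # 4 domains x 2 cores/domain:   105 schedules
print(factorial(4))   # each high-miss thread paired with a low-miss one: 24
```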

Conclusions
• The main advantage of multicore systems is that raw performance increases can come from growing the number of cores rather than the clock frequency, which translates into slower growth in power consumption. However, this approach represents a significant gamble, because the science of parallel programming has not advanced nearly as fast as our ability to build parallel hardware.

Thank you