Carnegie Mellon 15-213/18-243: Introduction to Computer Systems
Instructors: Anthony Rowe and Gregory Kesden
27th (and last) Lecture, 28 April 2011: Multi-Core Architectures

Today
- Final Exam
- Multi-core
- Parallelism on multi-core

Final Exam (Tuesday May 3rd, 1-4pm)
- Lecture 1: PH 100 & PH 125C
- Lecture 2: MI Mellon Auditorium
- Everything from lecture, the book, and the labs is fair game
- Closed-book, closed-notes; we will provide the Exam 1 and Exam 2 note sheets
- Exam Review Session: WEH 7500, Sunday 6-8pm
- Questions to

Final Topics (from Exam 1)
- Assembly code
  - Assembly-to-C translation
  - Stack management
- Structure alignment
- Ints, floats
- Cache

Final Topics (from Exam 2)
- Process control
- Signals
- File I/O
- Memory layout
- Virtual memory
- Dynamic memory

Final Topics (New)
- Concurrency and thread safety
  - Reader/writer locks, mutexes, semaphores
  - Starvation, deadlock
  - What makes a function thread-safe vs. thread-unsafe?
- Networking
  - Understand the notion of sockets and ports

Today
- Multi-core
- Parallelism on multi-core

Why Multi-Core?
Traditionally, single-core performance was improved by increasing the clock frequency and by building more deeply pipelined circuits. This leads to:
- Heat problems
- Speed-of-light (signal propagation) problems
- Difficult design and verification, requiring large design teams
- Big fans and heat sinks; expensive air conditioning on server farms
Increasing the clock frequency is no longer the way forward.

Single Core Computer
[diagram: a CPU chip (ALU, register file, bus interface) connected by the system bus and I/O bridge to main memory (memory bus) and to the I/O bus, which serves the USB controller (mouse, keyboard), graphics adapter (monitor), disk controller (disk), network adapter (network), and expansion slots]

Single Core CPU Chip
[diagram: the single core (ALU, register file) and a bus interface on the CPU chip, attached to the system bus]

Multi-Core Architecture
- A somewhat recent trend in computer architecture
- Replicate many cores on a single die
[diagram: a multi-core chip with Core 1, Core 2, ..., Core n, each with its own ALU and register file, sharing one bus interface]

Within each core, threads are time-sliced (just like on a uniprocessor).

Interaction With the Operating System
- The OS perceives each core as a separate processor
- The OS scheduler maps threads/processes to different cores
- Most major OSes support multi-core today: Mac OS X, Linux, Windows, ...

Today
- Multi-core
- Parallelism on multi-core

Instruction-Level Parallelism
- Parallelism at the machine-instruction level
- Achieved in the processor with:
  - Pipelining
  - Re-ordered instructions
  - Splitting into micro-instructions
  - Aggressive branch prediction
  - Speculative execution
- ILP enabled rapid increases in processor performance, but has since plateaued

Thread-Level Parallelism
- Parallelism on a coarser scale
- A server can serve each client in a separate thread (web server, database server)
- A computer game can do AI, graphics, physics, and UI in four different threads
- Single-core superscalar processors cannot fully exploit TLP: thread instructions are interleaved with other threads only at a coarse level
- Multi-core architectures are the next step in processor evolution: explicitly exploiting TLP

Simultaneous Multithreading (SMT)
- A complementary technique to multi-core
- Addresses the stalled-pipeline problem:
  - The pipeline is stalled waiting for the result of a long operation (e.g. floating-point)
  - ... or waiting for data to arrive from memory (long latency)
  - Meanwhile, other execution units are idle

SMT
- Permits multiple independent threads to execute SIMULTANEOUSLY on the SAME core, weaving together multiple "threads"
- Example: while one thread waits for a floating-point operation to complete, another thread can use the integer units


SMT is not a "true" parallel processor
- It enables better threading throughput (e.g. up to 30%)
- The OS and applications perceive each simultaneous thread as a separate "virtual processor"
- The chip has only a single copy of each resource
- Compare to multi-core: each core has its own copy of resources

Multi-core: threads run on separate cores
[diagram: two complete pipelines side by side (BTB and I-TLB, decoder, trace cache, rename/alloc, uop queues, schedulers, integer and floating-point units, L1 D-cache and D-TLB, uCode ROM, L2 cache and control, bus); Thread 1 runs on one core, Thread 2 on the other]

Multi-core: threads run on separate cores
[diagram: the same two pipelines, now running Thread 3 and Thread 4]

Combining Multi-Core and SMT
- Cores can be SMT-enabled (or not)
- The different combinations:
  - Single-core, non-SMT: a standard uniprocessor
  - Single-core, with SMT
  - Multi-core, non-SMT
  - Multi-core, with SMT: our fish machines
- The number of SMT threads is determined by the hardware design: 2, 4, or sometimes 8 simultaneous threads
- Intel calls them "hyper-threads"


SMT/Multi-Core and the Memory Hierarchy
- SMT is a sharing of pipeline resources, thus all caches are shared between the SMT threads
- Multi-core chips:
  - L1 caches are private (i.e. each core has its own L1)
  - The L2 cache is private in some architectures, shared in others
  - Main memory is always shared
- Example: the fish machines
  - Dual-core Intel Xeon processors
  - Each core is hyper-threaded
  - Private L1 caches, shared L2 cache
[diagram: two hyper-threaded cores, each with a private L1 cache, sharing an L2 cache and main memory]

Designs with Private L2 Caches
[diagram: two designs. In the first, each core has private L1 and L2 caches in front of main memory (examples: AMD Opteron, AMD Athlon, Intel Pentium D). In the second, the private L2s sit in front of a shared L3 cache (example: Intel Itanium 2).]
- The quad Core 2 Duo shares an L2 within each pair of cores

Private vs. Shared Cache
- Advantages of a private cache:
  - Closer to the core, so faster access
  - No contention for cache access: no waiting while another core accesses it
- Advantages of a shared cache:
  - Threads on different cores can share the same cached data
  - More cache space is available when a single (or a few) high-performance threads run
- Cache coherence problem:
  - The same memory value can be stored in multiple private caches
  - The hardware must keep the data consistent across the caches
  - Many solutions exist, e.g. an invalidation protocol with bus snooping

Summary
- Multi-core architectures
- Simultaneous multithreading
- Exam Review Session: WEH 7500, Sunday 6-8pm
- Questions to
- Next time: there is no next time ☹