Structure of Computer Systems


Structure of Computer Systems Course 6 Multi-core systems

Multithreading and multi-processing

Exploiting different forms of parallelism:
- data level parallelism (DLP) – the same operation on a set of data – SIMD architectures, multiple ALUs
- instruction level parallelism (ILP) – instruction phases executed in parallel – pipeline architectures
- thread level parallelism (TLP) – instruction sequences/streams executed in parallel – hyper-threading, multiprocessor architectures (multi-core, GRID, cloud, parallel computers)

Thread level parallelism execution issues:
- synchronization between threads
- data consistency
- concurrent access to shared resources
- communication between threads
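The synchronization and shared-resource issues above can be sketched in Python (the worker function and counts are illustrative): four threads increment one shared counter, and a lock serializes the updates so none are lost.

```python
import threading

counter = 0                      # shared resource
lock = threading.Lock()          # synchronization primitive

def worker(iterations: int) -> None:
    """Increment the shared counter; the lock keeps the data consistent."""
    global counter
    for _ in range(iterations):
        with lock:               # concurrent accesses are serialized here
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 with the lock; without it, updates could be lost
```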

Multiprocessing

Amdahl's law – the limit of the performance increase:

  S = ts / tp = 1 / ((1 - q) + q / n)

where:
- S – speedup of the parallel execution
- ts – time for sequential execution
- tp – time for parallel execution
- q – fraction of the program which can be executed in parallel
- n – number of nodes/threads

Examples:
- q = 50%, n -> ∞ => S = 2
- q = 75%, n -> ∞ => S = 4
- q = 95%, n -> ∞ => S = 20
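The three examples above can be checked numerically; a very large n stands in for n -> ∞ in this sketch:

```python
def amdahl_speedup(q: float, n: float) -> float:
    """Amdahl's law: S = ts / tp = 1 / ((1 - q) + q / n)."""
    return 1.0 / ((1.0 - q) + q / n)

# Reproduce the slide's examples with n large enough to approximate infinity
for q in (0.50, 0.75, 0.95):
    print(q, round(amdahl_speedup(q, 10**9), 3))
# -> 0.5 2.0 / 0.75 4.0 / 0.95 20.0
```

The limit follows directly: as n -> ∞ the q/n term vanishes and S -> 1/(1 - q), so even a 95% parallel program cannot be sped up more than 20x.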

Hyper-threading

Hyper-threading – parallel execution of instruction streams on a single CPU.
Idea: when a thread is stalled because of a hazard, another thread can be executed.
Solution:
- two threads are executed in parallel on the same pipelined CPU
- after every stage, two buffers (registers) store the partial results of the two threads
Speedup – approximately 30%.
The operating system will detect 2 logical CPUs!
[Figure: pipeline stages (IF, ID, Ex, M, Wb) occupied by Thread 1 and Thread 2, single-threaded vs. hyper-threaded execution]

Multiprocessors

Parallel execution of instruction streams on multiple CPUs.
Implementations:
- multi-core architectures – multiple CPUs in a single integrated circuit (IC)
- parallel computers – multiple CPUs on different ICs, but in the same computer infrastructure
- distributed computing facilities – multiple CPUs on different computers, connected through a network:
  - networks of PCs
  - GRID architectures – distributed computing resources for virtual organizations (VOs), mainly for batch processing
  - cloud architectures – computing resources (execution and storage) offered as a service and rented dynamically
- combinations of all of the above: multi-cores in parallel computers, building distributed computing facilities

Multi-core processors

Why multi-core?
- it is difficult to push single-core clock frequencies any higher; in the last 4-5 years clock frequency growth has saturated at 2.5-3 GHz
- power consumption and dissipation problems (higher frequency means more power)
- pipeline architectures (instruction level parallelism) reached their efficiency limits (around 20 pipeline stages)
- designing a very complex CPU (with multiple optimization schemes involved) requires the coordination of very large design teams
- many new applications are multithreaded (e.g. servers that handle multiple concurrent requests, agent systems, gaming, simulation, etc.)

Multi-core processors

Issues (design choices):
- same or different functionalities for the CPUs (homogeneous vs. heterogeneous cores):
  - symmetric cores (SMP – symmetric multi-core processor) – every core has the same structure and functionality
  - asymmetric cores (ASMP) – there are coordination cores and (simpler) specialized cores
- the relation with the memory:
  - uniform memory access – UMA
  - non-uniform memory access – NUMA
- connection between cores:
  - common bus – parallel or network-based (see network-on-chip)
  - crossbar – multiple connections controlled with a switch
- memory hierarchy (cache) – shared memory zones

Multi-core processors – architectural solutions

[Figure: two block diagrams. Left: symmetric multi-core with private L1 caches and shared L2 cache and memory. Right: symmetric multi-core with partially shared L2 and L3 caches, connected through a crossbar switch to two memory modules.]

Multi-core processors – architectural solutions (cont.)

[Figure: two block diagrams. Left: two processors with two cores each (one core 2x SMT) and a shared memory module, with private L1/L2 caches, a local store and I/O. Right: heterogeneous multi-core with local and shared caches, the cores connected through a ring network and switches to shared L2 caches and memory.]

Multi-core processors

Shared cache – high speed memory used by a number of cores (CPUs).
Advantages:
- efficient allocation of the existing memory space
- one core may pre-fetch data for another core
- sharing of common data
- no cache coherence problems
- fewer accesses to the external memory
Drawbacks:
- conflicts between cores when allocating space in the cache; one core may replace another core's data
- more complex control circuit and longer latency because of the switching
- one core may lock out the other core's accesses

Multi-core processors

Cache coherence of private caches – how to keep the data consistent across caches?
Solutions:
- write-through – every write is also made in the memory – not very efficient
- snooping and invalidation – each core snoops the bus and invalidates its cache line if a write from another core affects its cached content (e.g. the Pentium Pro's P6 bus – snooping phase)
[Figure: four cores with private caches over a shared memory; a write by one core makes the copies cached by the other cores inconsistent]
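A toy model of snooping with invalidation (all class and method names here are illustrative, not a real API): each core keeps a private cache, and a write broadcast on the shared bus invalidates stale copies in the other caches, so the next read re-fetches the fresh value.

```python
class Bus:
    """Shared bus that every core snoops on."""
    def __init__(self):
        self.cores = []
        self.memory = {}                    # backing main memory

    def broadcast_write(self, writer, addr, value):
        self.memory[addr] = value
        for core in self.cores:
            if core is not writer and addr in core.cache:
                del core.cache[addr]        # snoop hit: invalidate the stale line

class Core:
    def __init__(self, bus):
        self.cache = {}                     # private cache: addr -> value
        self.bus = bus
        bus.cores.append(self)

    def read(self, addr):
        if addr not in self.cache:          # miss: fetch from memory
            self.cache[addr] = self.bus.memory.get(addr, 0)
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value
        self.bus.broadcast_write(self, addr, value)

bus = Bus()
c1, c2 = Core(bus), Core(bus)
c1.read(0x10)          # c1 caches the line (value 0)
c2.write(0x10, 42)     # c2's write invalidates c1's copy
print(c1.read(0x10))   # 42 - c1 misses and re-fetches instead of using stale data
```

Without the invalidation step, c1's second read would hit its stale cached 0: exactly the inconsistency pictured on the slide.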

Multi-core processors

Symmetric vs. asymmetric cores
Symmetric architecture:
- all cores are the same
- cores can perform any task; they are interchangeable
Advantages: easy to build (simple replication); easy to program, compile and execute multithreaded programs.
Examples: Intel and AMD dual- and quad-core, Core2; SUN UltraSPARC T1 (Niagara) – 8 cores.

Multi-core processors

Symmetric vs. asymmetric cores (cont.)
Asymmetric (heterogeneous) architecture:
- cores have different functionalities:
  - 1-2 master cores and many (simpler) slave cores
  - 1 main core and multiple specialized cores (graphics, floating point, multimedia)
- compilation should take into consideration which functionalities can be performed by each core
Advantages: many more simple cores can be integrated.
Example: the IBM Cell processor – used in the PlayStation 3.

Multi-core processors

Asymmetric (heterogeneous) architecture – the IBM Cell:
- 9 cores:
  - 1 PPE – Power Processor Element – coordination and data transfer
  - 8 SPEs – Synergistic Processing Elements – specialized mathematical units
- applications: supercomputers, the PlayStation 3, home cinema, video cards

Multi-core processors

Advantages of multi-core processors:
- signals between the CPUs travel shorter distances, so they degrade less; these higher quality signals allow more data to be sent in a given time period, since individual signals can be shorter and do not need to be repeated as often
- cache coherency circuitry can operate at a much higher clock rate than is possible if the signals have to travel off-chip
- a dual-core processor uses slightly less power than two coupled single-core processors

Multi-core processors

Disadvantages of multi-core processors:
- the ability of multi-core processors to increase application performance depends on the use of multiple threads within applications; most current video games will run faster on a 3 GHz single-core processor than on a 2 GHz dual-core processor (of the same core architecture)
- two processing cores sharing the same system bus and memory bandwidth limit the real-world performance advantage; if a single core is close to being memory-bandwidth limited, going dual-core might give only a 30% to 70% improvement; if memory bandwidth is not a problem, a 90% improvement can be expected

Multi-core processors

Thread affinity – specifies whether a thread may be executed on any core or only on specific cores:
- soft affinity – controlled by the operating system: an interrupted thread should continue on the same core
- hard affinity – flags associated with a thread that indicate on which core(s) it may be executed
Useful for real-time and control applications – to reduce the load on a core on which critical threads are executed.
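On Linux, hard affinity can be set from user space; a minimal sketch using Python's os.sched_setaffinity, which is only available on Linux (hence the guard), pinning the current process to core 0:

```python
import os

def pin_to_cpu(cpu: int):
    """Pin the calling process to one CPU (hard affinity); return the new mask."""
    if not hasattr(os, "sched_setaffinity"):   # not available on all platforms
        return None
    os.sched_setaffinity(0, {cpu})             # 0 = the current process
    return os.sched_getaffinity(0)

mask = pin_to_cpu(0)
print(mask)  # {0} on Linux, None elsewhere
```

The same mechanism is exposed to C programs through sched_setaffinity(2); soft affinity needs no call at all, since the scheduler applies it by default.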