Distributed Processors

Distributed Processors A scalar is a single number, as opposed to a vector or matrix of numbers; e.g. "scalar multiplication" refers to multiplying one number by another, in contrast to "matrix multiplication". A vector is a one-dimensional array of numbers. A vector processor contains an arithmetic unit that is capable of performing simultaneous computations on the elements of an array or table. It can operate on entire vectors with one instruction, e.g. consider the following add instruction: c = a + b; In both scalar and vector machines this means "add the contents of a to the contents of b and put the sum in c". However, in a scalar machine the operands are single numbers, whereas in a vector processor the operands are entire vectors.
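
A minimal sketch of the scalar-versus-vector contrast, using x86 AVX intrinsics to stand in for a vector instruction (the function names, array size, and 8-wide vectors are illustrative assumptions, not from the slides):

#include <immintrin.h>   /* AVX intrinsics */

#define N 1024

/* Scalar style: one addition executed per instruction. */
void add_scalar(const float *a, const float *b, float *c) {
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];                      /* one element at a time */
}

/* Vector style: a single instruction adds 8 floats at once. */
void add_vector(const float *a, const float *b, float *c) {
    for (int i = 0; i < N; i += 8) {
        __m256 va = _mm256_loadu_ps(&a[i]);
        __m256 vb = _mm256_loadu_ps(&b[i]);
        _mm256_storeu_ps(&c[i], _mm256_add_ps(va, vb));   /* c = a + b on a whole vector */
    }
}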

These four categories come from Flynn's taxonomy, which classifies machines by their instruction and data streams.

SISD machines: These are conventional systems that contain one CPU and hence can accommodate one instruction stream that is executed serially. Newer computers may have more than one CPU, but each of these executes instruction streams that are unrelated. Therefore, such systems should still be regarded as (a collection of) SISD machines acting on different data spaces.

SIMD machines: Such systems often have a large number of processing units, ranging from 1,024 to 16,384, that may execute the same instruction on different data. So, a single instruction manipulates many data items in parallel. Most SIMD systems are so-called vector processors.

MISD machines: Theoretically, in this type of machine multiple instructions should act on a single stream of data. To my knowledge, no practical machine in this class has been constructed.

MIMD machines: These machines execute several instruction streams in parallel on different data. The difference from the multi-processor SISD machines above is that the instructions and data are related, because they represent different parts of the same task.

Shared memory systems - have multiple CPUs, all of which share the same address space.
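
A minimal sketch (not from the slides) of what a single shared address space means in practice: two POSIX threads update the same variable with no explicit data movement; the variable name, thread count, and iteration count are illustrative:

#include <pthread.h>
#include <stdio.h>

long counter = 0;                        /* lives in the one shared address space */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);       /* both CPUs see the same memory, so we must synchronise */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  /* 200000: both threads touched the same data */
    return 0;
}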

Distributed memory systems - to achieve better scalability, the memory can be distributed among multiple nodes connected by an interconnect. The user must be aware of the location of the data in the local memories and will have to move or distribute these data explicitly when needed. If the system allows all CPUs to access the memory at all nodes through a hardware-based mechanism, as if the memory were local, it is called a distributed shared memory architecture. Such systems are also called non-uniform memory access (NUMA) architectures.
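
A minimal sketch of the explicit data movement this implies, using MPI as one common message-passing interface (the buffer size, ranks, and tag are illustrative assumptions, not from the slides):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[4] = {0};
    if (rank == 0) {
        /* Node 0 owns the data and must ship it explicitly to node 1. */
        double data[4] = {1.0, 2.0, 3.0, 4.0};
        MPI_Send(data, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Node 1 cannot simply read node 0's memory; it must receive a copy. */
        MPI_Recv(buf, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %.1f %.1f %.1f %.1f\n", buf[0], buf[1], buf[2], buf[3]);
    }

    MPI_Finalize();
    return 0;
}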

Multi-processing Granularity - In parallel computing, granularity means the amount of computation relative to the amount of communication. Fine-grained parallelism means individual tasks are relatively small in terms of code size and execution time; data are transferred among processors frequently, in amounts of one or a few memory words. Coarse-grained parallelism is the opposite: data are communicated infrequently, after larger amounts of computation. The finer the granularity, the greater the potential for parallelism and hence speed-up, but also the greater the overheads of synchronization and communication. To attain the best parallel performance, the best balance between load and communication overhead needs to be found. If the granularity is too fine, performance can suffer from the increased communication overhead; on the other hand, if the granularity is too coarse, performance can suffer from load imbalance. A system with two CPUs cannot usefully trade off executing alternate instructions within a single program, so multiprocessor systems are really only effective in a multitasking system. This kind of parallelism is known as coarse-grained parallelism.
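
A minimal sketch of how granularity can be tuned in practice, using OpenMP's schedule clause purely as an illustration (the loop, chunk sizes, and function name are assumptions, not from the slides):

#include <omp.h>

#define N 100000

void scale(double *x) {
    /* Fine-grained: each thread claims one iteration at a time, maximising
       parallelism but also scheduling/communication overhead. */
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < N; i++)
        x[i] *= 2.0;

    /* Coarse-grained: each thread gets one large contiguous chunk, minimising
       overhead at the risk of load imbalance. */
    #pragma omp parallel for schedule(static, N / 4)
    for (int i = 0; i < N; i++)
        x[i] *= 2.0;
}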

Multi-processor architectures A Symmetric Multi-Processor system (SMP) can be defined as a standalone computer system with the following characteristics:
There are two or more similar processors of comparable capability.
These processors share the same main memory and I/O facilities and are interconnected by a bus or other internal connection scheme, such that memory access time is approximately the same for each processor.
All processors share access to I/O devices, either through the same channels or through different channels that provide paths to the same device.
All processors can perform the same functions (hence the term symmetric).
The system is controlled by an integrated operating system that provides interaction between processors and their programs at the job, task, file, and data element levels.

Symmetric Multi-Processing System A symmetric multi-processor (SMP) system contains multiple processors with common access to multiple memory modules that form a single address space. The access time to any byte of memory is the same from every processor. Such a system is said to have Uniform Memory Access (UMA).

Asymmetric Multi-Processing (AMP) Whereas a symmetric multiprocessor (SMP) treats all of the processing elements in the system identically, an AMP system assigns certain tasks only to certain processors. In particular, only one processor may be responsible for fielding all of the interrupts in the system, or perhaps even for performing all of the I/O in the system. Graphics cards, physics cards and cryptographic accelerators, which are subordinate to a CPU in modern computers, can be considered a form of asymmetric multiprocessing. AMP has some advantages, though. It is the only approach that works when two separate OSs are in place. Also, resources can be dedicated to critical tasks, resulting in more deterministic performance. And it often has higher performance than SMP, because the cores spend less time handshaking with each other; however, to my knowledge it is never employed in PCs.

Non-Uniform Memory Access (NUMA) In a NUMA system the processors and memory modules are divided into partitions. Each partition is called a node; each node contains multiple processors and memory modules, and all nodes are connected by a high-speed interconnection network. The processors in each node share all the memory modules in the node and have the same access time to each byte of that memory. So, each node is actually an SMP. http://www.systems.ethz.ch/education/past-courses/hs09/aos/lectures/
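
A minimal sketch (an assumption, not from the slides) of how software can keep data on its local node under Linux, using the libnuma API; the allocation size and node number are illustrative:

#include <numa.h>     /* link with -lnuma */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return EXIT_FAILURE;
    }

    size_t size = 1 << 20;                  /* 1 MiB, illustrative */
    /* Place the buffer on node 0: processors in node 0 get local (fast)
       access, processors in other nodes pay the remote-access penalty. */
    double *buf = numa_alloc_onnode(size, 0);
    if (buf == NULL)
        return EXIT_FAILURE;

    buf[0] = 42.0;                          /* touch the memory */
    printf("highest node id: %d\n", numa_max_node());

    numa_free(buf, size);
    return 0;
}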

Multi-processor architectures Hyper-threading is a technology that enables a single physical processor to function as two virtual processors. A processor capable of hyper-threading has two sets of registers and allows two hardware tasks to share the execution units (e.g. ALU and FPU) and other resources (e.g. cache and memory) of the processor. Each hardware task corresponds to a virtual processor. While one virtual processor waits, the other virtual processor takes over the resources and runs. This way, the resources in the physical processor are kept busy more of the time. Imagine a water fountain. When somebody gets to the fountain, he fills up his water bottle. If he forgets it, he runs off to get it, but nobody can use the fountain until he gets back. Now imagine that the fountain has two lines. When somebody from line A fills up his water bottle, the next person from line B fills up his, and vice versa. If somebody forgets his water bottle, his line stops; however, the people from the other line can continue to use the fountain until he gets back. That is how a hyper-threaded CPU works: if one process stalls, there is another that can be substituted while the first gets everything in order.
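
A small sketch (an assumption, not part of the slides) showing that the operating system sees each hardware task as a separate logical processor; on a hyper-threaded machine the logical count is typically twice the physical core count:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Number of logical processors currently online; with hyper-threading
       each physical core contributes two of these "virtual processors". */
    long logical = sysconf(_SC_NPROCESSORS_ONLN);
    printf("logical processors visible to the OS: %ld\n", logical);
    return 0;
}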

Hyperthreading The way this is accomplished is by duplicating the registers.

Single Threaded Symmetrical Multiprocessor

Super Threading Hyper Threading

End