Distributed Processors


1 Distributed Processors
A Scalar - is a single number, as opposed to a vector or matrix of numbers, e.g. "scalar multiplication" refers to multiplying one number by another, in contrast to "matrix multiplication".
A Vector - is a one-dimensional array of numbers.
A vector processor contains an arithmetic unit that is capable of performing simultaneous computations on the elements of an array or table. It can operate on an entire vector with one instruction, e.g. consider the following add instruction: c = a + b; In both scalar and vector machines this means "add the contents of a to the contents of b and put the sum in c". However, in a scalar machine the operands are single numbers, whereas in a vector processor the operands are vectors (see the sketch below).
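To make the contrast concrete, here is a minimal C sketch, assuming an x86 CPU with SSE support (the array values are made up for illustration). The scalar loop issues one add instruction per pair of elements; the SSE intrinsic _mm_add_ps issues a single instruction that adds four elements at once.

#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 CPU with SSE */

int main(void) {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};    /* made-up sample data */
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4];

    /* Scalar style: one add per pair of numbers. */
    for (int i = 0; i < 4; i++)
        c[i] = a[i] + b[i];

    /* Vector style: one instruction (addps) adds all four elements. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);
    _mm_storeu_ps(c, vc);

    printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
    return 0;
}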

2 SISD machines: These are conventional systems that contain one CPU and hence accommodate one instruction stream, which is executed serially. Newer computers may have more than one CPU, but if each of these executes an unrelated instruction stream, such a system should still be regarded as (a couple of) SISD machines acting on different data spaces.
SIMD machines: Such systems often have a large number of processing units, ranging from 1,024 to 16,384, that may execute the same instruction on different data. So, a single instruction manipulates many data items in parallel. Most SIMD systems are so-called vector processors.
MISD machines: Theoretically, in this type of machine multiple instructions act on a single stream of data. To my knowledge no practical machine in this class has been constructed.
MIMD machines: These machines execute several instruction streams in parallel on different data. The difference from the multi-processor SISD machines is that here the instructions and data are related, because they represent different parts of the same task (see the sketch after this list).
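As an illustration of the MIMD case, here is a minimal POSIX-threads sketch in C (the particular split of work, summing one half of an array while scaling the other, is a made-up example). The two threads run different instruction streams on different data, yet both contribute to the same overall task.

#include <pthread.h>
#include <stdio.h>

static double data[8] = {1, 2, 3, 4, 5, 6, 7, 8};  /* made-up data */
static double sum = 0.0;

/* Instruction stream 1: sum the first half of the array. */
static void *sum_half(void *arg) {
    (void)arg;
    for (int i = 0; i < 4; i++)
        sum += data[i];
    return NULL;
}

/* Instruction stream 2: scale the second half of the array. */
static void *scale_half(void *arg) {
    (void)arg;
    for (int i = 4; i < 8; i++)
        data[i] *= 2.0;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_half, NULL);   /* different streams... */
    pthread_create(&t2, NULL, scale_half, NULL); /* ...same task */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("sum of first half = %g\n", sum);
    return 0;
}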

3 Shared memory systems - have multiple CPUs all of which share the same address space.

4 Distributed memory systems - to achieve better scalability, the memory can be distributed among multiple nodes, which are connected by an interconnect. The user must be aware of the location of the data in the local memories and has to move or distribute these data explicitly when needed (see the sketch below). If the system allows all CPUs to access the memory at all nodes through a hardware-based mechanism, as if the memory were local, it is called a distributed shared memory architecture. Such systems are also called non-uniform memory access (NUMA) architectures.
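A minimal sketch of this explicit data movement using MPI, the standard message-passing API (the value 3.14 is made up; run with e.g. mpirun -np 2). Rank 1 cannot read rank 0's local memory, so the data must travel as a message over the interconnect.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    double local = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        local = 3.14;  /* exists only in rank 0's local memory */
        MPI_Send(&local, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Rank 1 has its own memory; the value arrives as a message. */
        MPI_Recv(&local, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %g\n", local);
    }

    MPI_Finalize();
    return 0;
}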

5 Multi-processing Granularity - in parallel computing, granularity means the amount of computation in relation to the amount of communication. Fine-grained parallelism means individual tasks are relatively small in terms of code size and execution time; data are transferred among processors frequently, in amounts of one or a few memory words. Coarse-grained parallelism is the opposite: data are communicated infrequently, after larger amounts of computation. The finer the granularity, the greater the potential for parallelism and hence speed-up, but also the greater the overhead of synchronization and communication. To attain the best parallel performance, the best balance between load and communication overhead must be found: if the granularity is too fine, performance suffers from the increased communication overhead; if the granularity is too coarse, performance suffers from load imbalance (see the sketch below). A system with two CPUs cannot speed up a single program by alternating its instructions between the two processors. Therefore, multiprocessor systems are really only effective in a multitasking system, where whole tasks are distributed among the processors. This kind of parallelism is known as coarse-grained parallelism.
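A minimal OpenMP sketch of the granularity trade-off (the loop body and array size are made-up placeholders). The chunk size in the schedule clause sets how much work a thread takes on between visits to the scheduler, i.e. how fine or coarse the grain is.

#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double x[N];  /* made-up workload; zero-initialized */
    double sum = 0.0;

    /* Fine-grained: threads go back to the scheduler after every
       single iteration -- maximum flexibility, maximum overhead. */
    #pragma omp parallel for schedule(dynamic, 1) reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i] * x[i];

    /* Coarse-grained: each thread gets one large contiguous block --
       minimal scheduling overhead, but load imbalance goes uncorrected. */
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i] * x[i];

    printf("%g\n", sum);
    return 0;
}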

6 Multi-processor architectures
A Symmetric Multi-Processor system (SMP) can be defined as a standalone computer system with the following characteristics:
- There are two or more similar processors of comparable capability.
- These processors share the same main memory and I/O facilities and are interconnected by a bus or other internal connection scheme, such that memory access time is approximately the same for each processor.
- All processors share access to I/O devices, either through the same channels or through different channels that provide paths to the same device.
- All processors can perform the same functions (hence the term symmetric).
- The system is controlled by an integrated operating system that provides interaction between processors and their programs at the job, task, file, and data element levels.

7 Symmetric Multi-Processing System
A symmetrical multi-processor (SMP) system contains multiple processors with common access to multiple memory modules that form a single address space. The access time to every byte of memory is the same from every processor. Such a system is said to have Uniform Memory Access (UMA).

8 Asymmetric Multi-Processing (AMP)
Whereas a symmetric multiprocessor (SMP) treats all of the processing elements in the system identically, an AMP system assigns certain tasks only to certain processors. In particular, only one processor may be responsible for fielding all of the interrupts in the system, or perhaps even for performing all of the I/O in the system. Graphics cards, physics cards and cryptographic accelerators, which are subordinate to a CPU in modern computers, can be considered a form of asymmetric multiprocessing. AMP has some advantages, though. It is the only approach that works when two separate OSs are in place. Also, resources can be dedicated to critical tasks, resulting in more deterministic performance. And it often has higher performance than SMP, because the cores spend less time handshaking with each other; to my knowledge, however, it is never employed in PCs.

9 Non-Uniform Memory Access (NUMA)
In a NUMA system the processors and memory modules are divided into partitions. Each partition is called a node; each node contains multiple processors and memory modules, and all nodes are connected by a high-speed interconnect network. The processors in each node share all the memory modules in that node and have the same access time to each byte of that memory. So, each node is actually an SMP.
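A minimal sketch of why placement matters on a NUMA machine, assuming Linux with the libnuma library installed (compile with -lnuma; the 1 MiB size is arbitrary). Memory can be allocated on the node the calling thread is running on, where access is fast, or pinned to a specific node, which may be remote and slower to reach.

#include <numa.h>   /* Linux libnuma */
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        printf("no NUMA support on this machine\n");
        return 0;
    }
    printf("NUMA nodes: %d\n", numa_max_node() + 1);

    /* Local allocation: memory on the node this thread runs on. */
    void *fast = numa_alloc_local(1 << 20);

    /* Pinned allocation: memory forced onto node 0, which may be
       remote for the calling thread and hence slower to access. */
    void *maybe_remote = numa_alloc_onnode(1 << 20, 0);

    numa_free(fast, 1 << 20);
    numa_free(maybe_remote, 1 << 20);
    return 0;
}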

10 Multi-processor architectures
Hyper-threading is a technology that enables a single physical processor to function as two virtual processors. A processor capable of hyper-threading has two sets of registers and allows the two hardware tasks to share the execution units (e.g. ALU and FPU) and other resources (e.g. cache and memory) of the processor. Each hardware task corresponds to a virtual processor. While one virtual processor waits, the other virtual processor takes over the resources and runs. This way, the resources in the physical processor are kept busy more of the time.
Imagine a water fountain. When somebody gets to the fountain, he fills up his water bottle. If he forgot the bottle, he runs off to get it, but nobody can use the fountain until he gets back. Now imagine that the fountain has two lines. When somebody from line A fills up his water bottle, the next person from line B fills up his, and vice versa. If somebody forgets his water bottle, his line stops; however, the people in the other line can continue to use the fountain until he gets back. That's how a hyper-threaded CPU works: if one process stalls, there's another that can be substituted while the first gets everything in order.
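A minimal POSIX sketch of what hyper-threading looks like from software (assuming a Unix-like OS with glibc). The operating system counts logical processors, one per duplicated register set, so a hyper-threaded four-core CPU reports eight.

#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Each hyper-thread appears to the OS as one logical processor. */
    long logical = sysconf(_SC_NPROCESSORS_ONLN);
    printf("logical processors visible to the OS: %ld\n", logical);
    return 0;
}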

11 Hyperthreading
The way this is accomplished is by duplicating the registers.

12 Single Threaded Symmetrical Multiprocessor

13 Super Threading Hyper Threading

14 End

