Multi-Processing in High Performance Computer Architecture:

What is Multiprocessing?
Enables several programs to run concurrently.
Coordinated processing of programs by more than one processor.
Use of two or more CPUs within a single computer system.
Ability of a system to support more than one processor and to allocate tasks between them.

Idealism (Target for Processor Performance):

Memory Hierarchies:

Memory in Modern Processor (L1 Cache?):

Flynn’s Taxonomy for Parallel Machines:
How many instruction and data streams?

Class  Instruction streams  Data streams  Meaning                              Example
SISD   1                    1             Single Instruction, Single Data      Uni-processor
SIMD   1                    >1            Single Instruction, Multiple Data    Vector/MMX
MISD   >1                   1             Multiple Instruction, Single Data    Streaming processor (camera)
MIMD   >1                   >1            Multiple Instruction, Multiple Data  Multi-processor (multi-core)

Why Not Uni-Processors:
Making a processor wider lets it run parallel programs efficiently, but not programs that have dependencies.
On a uni-processor, the instructions a = b + c; d = e + f can run in parallel because neither result depends on the other calculation.
The instructions b = e + f; a = b + c might not run in parallel because of a data dependency: the second instruction needs the value of b produced by the first.
There are several more complex kinds of data dependency.
Parallel part: fast. One-at-a-time part (stalls): slow.
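
The dependency check described above can be sketched in a few lines. This is only an illustration, not how any particular processor implements it: the tuple encoding of an instruction and the function name `can_issue_together` are assumptions made for this example.

```python
# Each instruction is encoded as (destination, source1, source2),
# e.g. a = b + c becomes ("a", "b", "c"). This encoding is hypothetical.

def can_issue_together(first, second):
    """Return True if `second` has no dependency on `first`."""
    dst1, *srcs1 = first
    dst2, *srcs2 = second
    raw = dst1 in srcs2   # second reads what first writes (true data dependency)
    waw = dst1 == dst2    # both write the same name
    war = dst2 in srcs1   # second overwrites a name that first still reads
    return not (raw or waw or war)

independent = [("a", "b", "c"), ("d", "e", "f")]  # a = b + c; d = e + f
dependent = [("b", "e", "f"), ("a", "b", "c")]    # b = e + f; a = b + c

print(can_issue_together(*independent))  # True
print(can_issue_together(*dependent))    # False
```

The second pair fails the check because a = b + c reads the b written by b = e + f, so a wide processor has to run the two one after the other.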

Why Multi-Processors?
The uni-processor is already 4-wide, and there are diminishing returns from getting wider.
Higher frequency requires higher voltage, and dynamic power is P = ½CV^2f, so power grows roughly with the cube of frequency.
But Moore's Law continues: 2x transistors every 2 years, hence 2x cores every 2 years, hence 2x performance every 2 years (assuming we use all cores).
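
A small worked example of the power formula may help. It is a sketch under a stated assumption: that voltage has to rise in proportion to frequency. The baseline values and the 20% figure are made up for illustration.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Dynamic switching power, P = 1/2 * C * V^2 * f."""
    return 0.5 * capacitance * voltage ** 2 * frequency

# Hypothetical baseline values, chosen only to make the ratio easy to read.
C, V, F = 1.0, 1.0, 1.0
base = dynamic_power(C, V, F)

# Assumption: a 20% frequency increase needs a proportional 20% voltage increase.
scale = 1.2
faster = dynamic_power(C, V * scale, F * scale)

power_ratio = faster / base
print(round(power_ratio, 3))  # 1.728, i.e. scale ** 3
```

Under that assumption, a 20% faster clock costs about 73% more power, which is why adding cores is more attractive than pushing frequency.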

Multi-Processor needs Parallel Programs (Years to develop):
1. Sequential (single-threaded) code is a lot easier to develop.
2. Debugging parallel code is much more difficult.
3. Performance scaling is much harder to achieve.

Types of Multiprocessors, Uniform Memory Access (UMA):
Centralized shared memory: the distance from memory to each core is approximately the same.
Replicate cores and caches to build a Symmetric Multi-Processor (SMP).

Issues in Centralized Main Memory:
Memory size: a large memory is a slow memory.
Memory bandwidth: cache misses from all cores are queued serially as multiple requests to main memory, causing serious lag.
Works well for smaller machines, up to 16 cores.
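
The bandwidth problem can be pictured with a toy model. It is deliberately oversimplified (no memory banking or pipelining), and the function name and the 100 ns latency are assumptions for illustration only.

```python
def serialized_wait_times(num_cores, memory_latency_ns):
    """Completion time of each core's miss when memory serves one request at a time.

    Toy model: all cores miss at the same instant and requests are served
    strictly in order, so the k-th request finishes after k full latencies.
    """
    return [(k + 1) * memory_latency_ns for k in range(num_cores)]

# Hypothetical number: 100 ns per main-memory access.
for cores in (2, 16, 64):
    waits = serialized_wait_times(cores, 100)
    print(cores, "cores: last miss served after", waits[-1], "ns")
```

In this model the worst-case wait grows linearly with the number of cores, which matches the observation that a centralized memory only works well for smaller machines.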

Distributed (Multicomputer) Memory System (NUMA):

Distributed memory: each core has its own local memory and cache, forming a complete single-core system.
Each such system has a network interface card connected to an interconnection network.
A cache miss goes directly to the local processor's memory; another processor's memory is accessed only through message passing over the network.
To communicate, data is sent EXPLICITLY to the other core.
The programmer is forced to be aware of communication between cores and to try to minimize it.
Think of it as a set of machines communicating over a network.
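
The explicit-send style can be sketched as follows. This is an illustration only: Python threads stand in for separate nodes, and a `queue.Queue` stands in for the interconnection network. The names `node0`, `node1` and `link` are hypothetical.

```python
import queue
import threading

def node1(local_data, link_to_node0):
    # Node 1 only touches its own local data, then sends one explicit message.
    link_to_node0.put(sum(local_data))

def node0(local_data, link_from_node1, result):
    # Node 0 sums its local data, then blocks until node 1's message arrives.
    partial = sum(local_data)
    result.append(partial + link_from_node1.get())

link = queue.Queue()  # stands in for the interconnection network
result = []           # used only to hand the final answer back to this script

t0 = threading.Thread(target=node0, args=([1, 2, 3], link, result))
t1 = threading.Thread(target=node1, args=([4, 5, 6], link))
t0.start()
t1.start()
t0.join()
t1.join()

print(result[0])  # 21
```

Only one number crosses the link instead of the whole list, which is the kind of communication minimizing the programmer is expected to do by hand.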

Shared Memory vs Message Passing (Hardware vs Software):

Performance Metrics (Message Passing vs Shared Memory):
                         Message Passing        Shared Memory
Communication            Programmer             Automatic
Data distribution        Manual                 Automatic
Hardware support         Simple                 Extensive
Programming correctness  Difficult (deadlocks)  Less difficult
Programming performance  Difficult              Very difficult
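
For contrast with message passing, here is a minimal sketch of the shared-memory style, again using Python threads as stand-ins for cores. The names `shared`, `lock` and `add_many` are assumptions for this example. Communication is automatic, because every thread simply reads and writes the same variable, but the programmer still has to synchronize access.

```python
import threading

shared = {"counter": 0}   # one location visible to every thread
lock = threading.Lock()   # protects the read-modify-write below

def add_many(n):
    for _ in range(n):
        with lock:        # without the lock, concurrent updates could be lost
            shared["counter"] += 1

threads = [threading.Thread(target=add_many, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["counter"])  # 4000
```

No data is sent anywhere explicitly, but every update contends for the same lock and the same memory location, which hints at why performance is listed as very difficult for shared memory.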

Multithreading as Shared Memory Hardware:

Analyzing Multithreading Performance:

Summary: Multithreaded Categories
