Introduction to Parallel Processing
Debbie Hui
CS 147 – Prof. Sin-Min Lee
7/11/2001

Parallel Processing
- Parallelism in Uniprocessor Systems
- Organization of Multiprocessor Systems

Parallelism in Uniprocessor Systems
A computer achieves parallelism when it performs two or more unrelated tasks simultaneously.

Uniprocessor Systems
A uniprocessor system may incorporate parallelism using:
- an instruction pipeline
- a fixed or reconfigurable arithmetic pipeline
- I/O processors
- vector arithmetic units
- multiport memory

Uniprocessor Systems
Instruction pipeline:
- Overlaps the fetching, decoding, and execution of instructions
- Once the pipeline is full, ideally allows the CPU to complete one instruction per clock cycle
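A minimal Python sketch of why overlapping the stages helps. It assumes a three-stage fetch/decode/execute pipeline in which every stage takes exactly one clock cycle and no stalls or hazards occur; the function names are illustrative, not taken from any real simulator.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """The first instruction needs num_stages cycles to fill the pipeline;
    after that, one instruction completes every cycle (no stalls assumed)."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def schedule(num_instructions: int, stages=("fetch", "decode", "execute")):
    """For each clock cycle, map each busy stage to the instruction it holds."""
    table = []
    for cycle in range(pipelined_cycles(num_instructions, len(stages))):
        row = {}
        for s, name in enumerate(stages):
            instr = cycle - s  # instruction i occupies stage s during cycle i + s
            if 0 <= instr < num_instructions:
                row[name] = instr
        table.append(row)
    return table


if __name__ == "__main__":
    print(unpipelined_cycles(100, 3), pipelined_cycles(100, 3))
    for cycle, row in enumerate(schedule(4)):
        print(cycle, row)
```

For 100 instructions the sketch counts 300 cycles without pipelining and 102 with it, so throughput approaches one instruction per cycle as the instruction count grows.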

Uniprocessor Systems
Reconfigurable Arithmetic Pipeline:
- Better suited for general-purpose computing than a fixed arithmetic pipeline
- Each stage has a multiplexer at its input
- The control unit of the CPU sets the multiplexer select signals to configure the pipeline
- Problem: Although arithmetic pipelines can perform many iterations of the same operation in parallel, they cannot perform different operations simultaneously.

Uniprocessor Systems
Vectored Arithmetic Unit:
- Provides a solution to the reconfigurable arithmetic pipeline problem
- Purpose: to perform different arithmetic operations in parallel

Uniprocessor Systems
Vectored Arithmetic Unit (cont.):
- Contains multiple functional units
  - Some perform addition, others subtraction, etc.
- Input and output switches are needed to route the proper data to their proper destinations
  - Switches are set by the control unit
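The routing described above can be sketched in Python as a single cycle in which several functional units each perform a different operation at the same time. The unit names, the register file, and the `Route` record standing in for the input/output switch settings are all hypothetical; the sketch only shows that every unit reads its operands before any result is written back.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical functional units; a real vectored arithmetic unit may have more.
UNITS: Dict[str, Callable[[int, int], int]] = {
    "adder": operator.add,
    "subtractor": operator.sub,
    "multiplier": operator.mul,
}


@dataclass
class Route:
    """Switch settings the control unit chooses for one functional unit."""
    unit: str   # which functional unit performs the operation
    src_a: str  # input switch: register supplying operand A
    src_b: str  # input switch: register supplying operand B
    dest: str   # output switch: register that receives the result


def vector_unit_cycle(registers: Dict[str, int], routes: List[Route]) -> Dict[str, int]:
    """Let every routed functional unit operate in the same cycle.

    All operands are read from the register values at the start of the cycle,
    so the outcome does not depend on the order of `routes`.
    """
    units = [r.unit for r in routes]
    if len(set(units)) != len(units):
        raise ValueError("a functional unit can perform only one operation per cycle")
    dests = [r.dest for r in routes]
    if len(set(dests)) != len(dests):
        raise ValueError("two results are routed to the same destination")
    results = {r.dest: UNITS[r.unit](registers[r.src_a], registers[r.src_b]) for r in routes}
    updated = dict(registers)
    updated.update(results)  # write back all results together
    return updated
```

With r0 = 6 and r1 = 3, routing the adder, subtractor, and multiplier to r2, r3, and r4 yields 9, 3, and 18 in one call, which is exactly the "different operations in parallel" that a reconfigurable pipeline cannot do.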

Uniprocessor Systems
Vectored Arithmetic Unit (cont.):
How do we get all that data to the vector arithmetic unit?
By transferring several data values simultaneously using:
- Multiple buses
- Very wide data buses

Uniprocessor Systems
Improve performance:
- Allow multiple, simultaneous memory accesses
  - Requires multiple address, data, and control buses (one set for each simultaneous memory access)
  - The memory chip has to be able to handle multiple transfers simultaneously

Uniprocessor Systems
Multiport Memory:
- Has two sets of address, data, and control pins to allow simultaneous data transfers to occur
- The CPU and a DMA controller can transfer data concurrently
- A system with more than one CPU could handle simultaneous requests from two different processors

Uniprocessor Systems
Multiport Memory (cont.):
Can:
- Handle two simultaneous requests to read data from the same memory location
Cannot:
- Process two simultaneous requests to write data to the same memory location
- Process simultaneous requests to read from and write to the same memory location
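The can/cannot rules above reduce to one test: two requests in the same cycle conflict only if they use the same address and at least one is a write. This is a minimal Python sketch of a two-port memory, assuming a hypothetical `Request` format; raising an error on a conflict is a simplification, since real hardware might instead stall one of the ports.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Request:
    port: int                    # which port issues the request (0 or 1)
    kind: str                    # "read" or "write"
    address: int
    value: Optional[int] = None  # data to store; used only for writes


def conflicts(a: Request, b: Request) -> bool:
    """Simultaneous requests conflict only when they use the same address
    and at least one of them is a write."""
    return a.address == b.address and "write" in (a.kind, b.kind)


class DualPortMemory:
    def __init__(self, size: int):
        self.cells = [0] * size

    def cycle(self, a: Request, b: Request) -> Dict[int, Optional[int]]:
        """Service one request on each port in the same cycle.

        Returns {port: value read}, with None for writes. A conflict raises
        an error before any cell is modified.
        """
        if {a.port, b.port} != {0, 1}:
            raise ValueError("expected exactly one request on each of ports 0 and 1")
        for req in (a, b):
            if req.kind not in ("read", "write"):
                raise ValueError(f"unknown request kind {req.kind!r}")
        if conflicts(a, b):
            raise RuntimeError(f"{a.kind}/{b.kind} conflict at address {a.address}")
        results: Dict[int, Optional[int]] = {}
        for req in (a, b):
            if req.kind == "read":
                results[req.port] = self.cells[req.address]
            else:
                self.cells[req.address] = req.value
                results[req.port] = None
        return results
```

Two reads of the same address succeed together, while a write paired with any access to the same address is rejected.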

Organization of Multiprocessor Systems
Three different ways to organize/classify systems:
- Flynn’s Classification
- System Topologies
- MIMD System Architectures

Multiprocessor Systems: Flynn’s Classification
Flynn’s Classification:
- Based on the flow of instructions and data processing
- A computer is classified by:
  - whether it processes a single instruction at a time or multiple instructions simultaneously
  - whether it operates on one or multiple data sets

Multiprocessor Systems: Flynn’s Classification
Four Categories of Flynn’s Classification:
- SISD: Single instruction single data
- SIMD: Single instruction multiple data
- MISD: Multiple instruction single data **
- MIMD: Multiple instruction multiple data
** The MISD classification is not practical to implement. In fact, no significant MISD computers have ever been built. It is included only for completeness.
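Because Flynn's scheme asks exactly two yes/no questions, the four categories fall out of a two-way lookup. A minimal Python sketch, assuming a machine can be summarized by how many instruction streams and data streams it processes at once; the function name is hypothetical.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Classify a machine by the number of instruction and data streams it
    processes at once, following Flynn's two questions."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("every computer has at least one instruction and one data stream")
    instr = "S" if instruction_streams == 1 else "M"  # single or multiple instruction
    data = "S" if data_streams == 1 else "M"          # single or multiple data
    return f"{instr}I{data}D"
```

For example, an array processor whose control unit issues one instruction to 64 processors has one instruction stream and 64 data streams, so `flynn_class(1, 64)` returns `"SIMD"`.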

Multiprocessor Systems: Flynn’s Classification
Single instruction single data (SISD):
- Consists of a single CPU executing individual instructions on individual data values

Multiprocessor Systems: Flynn’s Classification
Single instruction multiple data (SIMD):
- Executes a single instruction on multiple data values simultaneously using many processors
- Since only one instruction is processed at any given time, it is not necessary for each processor to fetch and decode the instruction
- This task is handled by a single control unit that sends the control signals to each processor
- Example: Array processor
[Diagram: a control unit and main memory, plus processor/memory pairs linked by a communications network]
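The division of labor described above, one control unit that fetches and decodes and many processors that only execute, can be sketched in Python as follows. The opcodes, the instruction tuple format, and the class names are hypothetical, and the inner loop runs sequentially where real SIMD hardware would act in a single cycle.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical opcodes understood by every processing element.
OPS: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda a, b: a + b,
    "MUL": lambda a, b: a * b,
}

Instruction = Tuple[str, str, str, str]  # (opcode, destination, source 1, source 2)


class ProcessingElement:
    """A processor with its own local memory but no fetch/decode logic."""

    def __init__(self, memory: Dict[str, int]):
        self.memory = memory

    def execute(self, op: Callable[[int, int], int], dst: str, src1: str, src2: str) -> None:
        self.memory[dst] = op(self.memory[src1], self.memory[src2])


class ControlUnit:
    """Fetches and decodes each instruction once, then broadcasts it."""

    def __init__(self, elements: List[ProcessingElement]):
        self.elements = elements

    def run(self, program: List[Instruction]) -> None:
        for opcode, dst, src1, src2 in program:
            op = OPS[opcode]  # decoded once by the control unit
            # In hardware every element executes in the same cycle; this loop
            # is sequential, which is equivalent here because each element
            # only touches its own local memory.
            for pe in self.elements:
                pe.execute(op, dst, src1, src2)
```

Four elements holding a = 0, 1, 2, 3 and b = 10 that run `[("ADD", "c", "a", "b"), ("MUL", "c", "c", "a")]` end with c = 0, 11, 24, 39: the same two instructions, applied to four different data values.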

Multiprocessor Systems: Flynn’s Classification
Multiple instruction multiple data (MIMD):
- Executes different instructions simultaneously
- Each processor must include its own control unit
- The processors can be assigned to parts of the same task or to completely separate tasks
- Examples: Multiprocessors, multicomputers

Multiprocessor Systems: System Topologies
System Topologies:
- The topology of a multiprocessor system refers to the pattern of connections between its processors
- Quantified by standard metrics:
  - Diameter: The maximum distance (number of links on the shortest path) between any two processors in the computer system
  - Bandwidth: The capacity of a communications link multiplied by the number of such links in the system (best case)
  - Bisection bandwidth: The total bandwidth of the links connecting the two halves of the processors, when the processors are split into halves so that the number of links between the halves is minimized (worst case)
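The diameter and bandwidth definitions above can be computed directly for a concrete topology. This Python sketch assumes a topology is given as an adjacency list (a hypothetical `Graph` alias) and builds rings and meshes as examples; diameter is found by breadth-first search from every processor. Bisection bandwidth is deliberately left out, because finding the minimal split is a much harder search problem in general.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # processor -> directly connected processors


def ring(n: int) -> Graph:
    return {p: sorted({(p - 1) % n, (p + 1) % n} - {p}) for p in range(n)}


def mesh(side: int, wraparound: bool = False) -> Graph:
    """side x side mesh; processor (r, c) is numbered r * side + c."""
    g: Graph = {}
    for r in range(side):
        for c in range(side):
            nbrs = set()
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if wraparound:
                    rr, cc = rr % side, cc % side
                elif not (0 <= rr < side and 0 <= cc < side):
                    continue  # no link past the edge without wraparound
                nbrs.add(rr * side + cc)
            g[r * side + c] = sorted(nbrs - {r * side + c})
    return g


def hops_from(g: Graph, src: int) -> Dict[int, int]:
    """Breadth-first search: fewest links from src to every reachable processor."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(g: Graph) -> int:
    """Maximum over all pairs of the shortest-path distance (assumes g is connected)."""
    return max(max(hops_from(g, p).values()) for p in g)


def bandwidth(g: Graph, link_bw: float) -> float:
    """Link capacity times the number of links; each link appears twice in g."""
    num_links = sum(len(nbrs) for nbrs in g.values()) // 2
    return num_links * link_bw
```

For an 8-processor ring the sketch reports a diameter of 4 and 8 links; for a 4 x 4 mesh it reports a diameter of 6 with 24 links, or 4 with 32 links once wraparound connections are added.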

Multiprocessor Systems: System Topologies
Six Categories of System Topologies:
- Shared bus
- Ring
- Tree
- Mesh
- Hypercube
- Completely connected

Multiprocessor Systems: System Topologies
Shared bus:
- The simplest topology
- Processors communicate with each other exclusively via this bus
- Can handle only one data transmission at a time
- Can be easily expanded by connecting additional processors to the shared bus, along with the necessary bus arbitration circuitry
[Diagram: processor (P) and memory (M) pairs and a global memory, all attached to the shared bus]

Multiprocessor Systems: System Topologies
Ring:
- Uses direct dedicated connections between processors
- Allows all communication links to be active simultaneously
- A piece of data may have to travel through several processors to reach its final destination
- All processors must have two communication links
[Diagram: six processors (P) connected in a ring]

Multiprocessor Systems: System Topologies
Tree topology:
- Uses direct connections between processors
- Each processor has at most three connections (one to its parent and two to its children)
- Its primary advantage is its relatively low diameter
- Example: DADO computer
[Diagram: processors (P) connected as a tree]

Multiprocessor Systems: System Topologies
Mesh topology:
- Every processor connects to the processors above, below, left, and right
- Left-to-right and top-to-bottom wraparound connections may or may not be present
[Diagram: nine processors (P) arranged in a 3 × 3 mesh]

Multiprocessor Systems: System Topologies
Hypercube:
- Multidimensional mesh
- Has n processors (n a power of 2), each with lg n connections
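One common way to see why each of the n processors has lg n connections is to label them 0 to n-1 in binary and connect two processors exactly when their labels differ in one bit. The slide does not specify a labeling, so that scheme is an assumption of this Python sketch, and the function names are hypothetical.

```python
def hypercube_neighbors(p: int, n: int) -> list:
    """Neighbors of processor p in an n-processor hypercube (n a power of two).

    Processors are labeled 0..n-1; two are connected exactly when their
    labels differ in one bit, so each has lg n neighbors."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    if not 0 <= p < n:
        raise ValueError("processor label out of range")
    dims = n.bit_length() - 1  # lg n
    return [p ^ (1 << bit) for bit in range(dims)]


def hypercube_hops(a: int, b: int) -> int:
    """Shortest path length: one hop per differing bit (Hamming distance)."""
    return bin(a ^ b).count("1")
```

In an 8-processor hypercube, processor 5 (binary 101) connects to 4, 7, and 1, and the farthest pair, such as 0 and 7, is 3 = lg 8 hops apart, which matches the hypercube diameter of lg n.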

Multiprocessor Systems: System Topologies
Completely Connected:
- Every processor has n-1 connections, one to each of the other processors
- The complexity of each processor increases as the system grows, since every added processor requires another connection at each existing processor
- Offers maximum communication capabilities

Multiprocessor Systems: System Topologies
Topology                Diameter     Bandwidth           Bisection bandwidth
Shared bus              1            l                   l
Ring                    ⌊n/2⌋        n * l               2 * l
Tree                    2⌊lg n⌋      (n - 1) * l         1 * l
Mesh *                  2(√n - 1)    (2n - 2√n) * l      √n * l
Mesh **                 2⌊√n/2⌋      2n * l              2√n * l
Hypercube               lg n         (n/2) * lg n * l    (n/2) * l
Completely connected    1            (n/2)(n - 1) * l    ⌈n/2⌉ * ⌊n/2⌋ * l
* Without wraparound
** With wraparound
l = bandwidth of a single link (for the shared bus, the bandwidth of the bus)
n = number of processors
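To compare the topologies for a specific system size, the table's formulas can be evaluated directly. This Python sketch restricts n to values that make every row well defined (a perfect square for the meshes and a power of two for the tree and hypercube, so n = 4, 16, 64, ...) and takes the wraparound-mesh diameter as 2⌊√n/2⌋; the function name and result layout are hypothetical.

```python
import math
from typing import Dict, Tuple


def topology_metrics(n: int, l: float = 1.0) -> Dict[str, Tuple[float, float, float]]:
    """(diameter, bandwidth, bisection bandwidth) for each topology in the table.

    n is the number of processors and l the bandwidth of one link. To keep every
    row defined, n must be a perfect square (meshes) and a power of two
    (tree, hypercube), i.e. n = 4, 16, 64, ...
    """
    if n < 4:
        raise ValueError("n must be at least 4")
    root = math.isqrt(n)
    if root * root != n or n & (n - 1):
        raise ValueError("n must be a power of four")
    lg = n.bit_length() - 1  # lg n, exact because n is a power of two
    return {
        "shared bus": (1, l, l),
        "ring": (n // 2, n * l, 2 * l),
        "tree": (2 * lg, (n - 1) * l, 1 * l),
        "mesh, no wraparound": (2 * (root - 1), (2 * n - 2 * root) * l, root * l),
        "mesh, wraparound": (2 * (root // 2), 2 * n * l, 2 * root * l),
        "hypercube": (lg, (n // 2) * lg * l, (n // 2) * l),
        "completely connected": (1, n * (n - 1) // 2 * l, ((n + 1) // 2) * (n // 2) * l),
    }
```

For n = 16 and l = 1, the hypercube and the wraparound mesh both come out at diameter 4, bandwidth 32, and bisection bandwidth 8. The completely connected system reaches bandwidth 120 at the cost of 15 connections per processor.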

Multiprocessor Systems: MIMD System Architecture
MIMD System Architecture:
- The architecture of an MIMD system refers to its connections with respect to system memory
  - Multiprocessors
  - Multicomputers

Multiprocessor Systems: MIMD System Architecture
Symmetric multiprocessor (SMP):
- A computer system that has two or more processors with comparable capabilities
- Four different types:
  - Uniform memory access (UMA)
  - Nonuniform memory access (NUMA)
  - Cache coherent NUMA (CC-NUMA)
  - Cache only memory access (COMA)

Multiprocessor Systems: MIMD System Architecture
Uniform memory access (UMA):
- Gives all CPUs equal (uniform) access to all shared memory locations
- Each processor may have its own cache memory, not directly accessible by the other processors
[Diagram: Processor 1, Processor 2, ..., Processor n connected through a communications mechanism to a shared memory]

Multiprocessor Systems: MIMD System Architecture
Nonuniform memory access (NUMA):
- Does not allow uniform access to all shared memory locations
- It still allows all processors to access all shared memory locations; however, each processor can access the memory module closest to it faster than other modules
[Diagram: Processor 1, Processor 2, ..., Processor n and Memory 1, Memory 2, ..., Memory n connected through a communications mechanism]
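The effect of nonuniform access can be expressed as a weighted average of local and remote access times. This Python sketch assumes that memory module i is the one closest to processor i and uses made-up latency values; real local and remote times depend entirely on the machine.

```python
def access_time(processor: int, module: int, t_local: float, t_remote: float) -> float:
    """Assumes memory module i is the one closest to processor i."""
    return t_local if processor == module else t_remote


def average_access_time(local_fraction: float, t_local: float, t_remote: float) -> float:
    """Expected access time when local_fraction of accesses hit the local module."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * t_local + (1.0 - local_fraction) * t_remote
```

With hypothetical times of 100 units for local and 300 for remote accesses, a program that keeps 75% of its accesses local averages 150 units. If `t_local` equals `t_remote`, the formula collapses to the single uniform access time of a UMA system.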

Multiprocessor Systems: MIMD System Architecture
Cache Coherent NUMA (CC-NUMA):
- Similar to NUMA except each processor includes cache memory
- The cache can buffer data from memory modules that are not local to the processor, which can reduce the access time of the memory transfers
- Creates a problem (cache coherence) when two or more caches hold the same piece of data and one of them modifies it
- A solution to this problem is Cache only memory access (COMA)
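The problem of two caches holding the same data can be reproduced in a few lines of Python. To keep the sketch small, it assumes a single shared memory, write-through caches, and an optional write-invalidate step. Write-invalidate is one common coherence technique, shown here for contrast; it is not presented as the mechanism any particular CC-NUMA machine uses, and real protocols are considerably more elaborate.

```python
from typing import Dict, List


class SharedMemorySystem:
    """Shared memory plus a private write-through cache per processor."""

    def __init__(self, num_processors: int, invalidate_on_write: bool):
        self.memory: Dict[int, int] = {}
        self.caches: List[Dict[int, int]] = [{} for _ in range(num_processors)]
        self.invalidate_on_write = invalidate_on_write

    def read(self, p: int, addr: int) -> int:
        cache = self.caches[p]
        if addr not in cache:  # cache miss: copy the value from memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, p: int, addr: int, value: int) -> None:
        self.caches[p][addr] = value
        self.memory[addr] = value  # write-through keeps memory up to date
        if self.invalidate_on_write:
            # Write-invalidate: discard every other cache's copy so its
            # next read misses and fetches the new value from memory.
            for q, cache in enumerate(self.caches):
                if q != p:
                    cache.pop(addr, None)
```

Without invalidation, after processor 0 writes 2 to a location that both processors had cached as 1, processor 1 still reads the stale value 1 even though memory holds 2. With `invalidate_on_write=True`, the same sequence makes processor 1 read 2.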

Multiprocessor Systems: MIMD System Architecture
Cache Only Memory Access (COMA):
- Each processor’s local memory is treated as a cache
- When the processor requests data that is not in its cache (local memory), the system loads that data into local memory as part of the memory operation
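The miss behavior described above can be sketched in Python as follows. The slide does not say how the system locates data or whether copies are replicated, so this sketch makes two simplifying assumptions: a hypothetical `holder` directory records which node has each item, and a single copy migrates into the requester's local memory on a miss. The access times are illustrative.

```python
from typing import Dict, List, Tuple


class ComaSystem:
    """Each node's local memory acts as a cache (hypothetical, simplified)."""

    def __init__(self, num_nodes: int, t_local: int, t_remote: int):
        self.local: List[Dict[int, int]] = [{} for _ in range(num_nodes)]
        self.holder: Dict[int, int] = {}  # address -> node whose local memory has it
        self.t_local = t_local
        self.t_remote = t_remote

    def place(self, node: int, addr: int, value: int) -> None:
        """Initial placement of a data item in some node's local memory."""
        self.local[node][addr] = value
        self.holder[addr] = node

    def read(self, node: int, addr: int) -> Tuple[int, int]:
        """Return (value, access time). On a miss, the item is moved into the
        requesting node's local memory as part of the same memory operation."""
        owner = self.holder[addr]
        if owner == node:
            return self.local[node][addr], self.t_local
        value = self.local[owner].pop(addr)  # single copy migrates (no replication)
        self.local[node][addr] = value
        self.holder[addr] = node
        return value, self.t_remote
```

The first read of a remote item pays the remote cost, and every later read by the same node is local because the data now lives in that node's memory.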

Multiprocessor Systems: MIMD System Architecture
Multicomputer:
- An MIMD machine in which the processors are not all under the control of one operating system
- Each processor or group of processors is under the control of a different operating system, or a different instantiation of the same operating system
- Two different types:
  - Network or cluster of workstations (NOW or COW)
  - Massively parallel processor (MPP)

Multiprocessor Systems: MIMD System Architecture
Network of workstations (NOW) or cluster of workstations (COW):
- More than a group of workstations on a local area network (LAN)
- Has a master scheduler, which matches tasks and processors together
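The slides do not say what policy the master scheduler uses. This Python sketch assumes one simple greedy rule, often called longest-processing-time-first: sort tasks by estimated cost, then give each task to the workstation with the least work assigned so far. The task costs and workstation names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def schedule_tasks(tasks: Dict[str, float], workstations: List[str]) -> Dict[str, List[str]]:
    """Greedy master scheduler: longest task first, to the least-loaded workstation."""
    if not workstations:
        raise ValueError("need at least one workstation")
    # Heap entries are (current load, index for stable tie-breaking, name).
    heap: List[Tuple[float, int, str]] = [(0.0, i, w) for i, w in enumerate(workstations)]
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {w: [] for w in workstations}
    for name, cost in sorted(tasks.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx, ws = heapq.heappop(heap)  # least-loaded workstation
        assignment[ws].append(name)
        heapq.heappush(heap, (load + cost, idx, ws))
    return assignment
```

With task costs a = 5, b = 4, c = 3, d = 3, e = 1 and two workstations, the sketch sends a and d to the first and b, c, and e to the second, so each ends up with 8 units of work.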

Multiprocessor Systems: MIMD System Architecture
Massively Parallel Processor (MPP):
- Consists of many self-contained nodes, each having a processor, memory, and hardware for implementing internal communications
- The processors communicate with each other by passing messages over the interconnection network
- Example: IBM’s Blue Gene computer

Thank you! Any Questions???
