An Overview of MIMD Architectures

4/15/2019 \course\eleg652-04F\Topic1b.ppt

Generic MIMD Architecture
A generic modern multiprocessor:
- Node: processor(s), memory system, plus a communication assist (network interface and communication controller)
- Scalable network

Classification
Shared-memory model vs. distributed-memory model

Distributed Memory MIMD Machines (Multicomputers, MPPs, Clusters, etc.)
- Message-passing programming models
- Interconnection networks
- Generations/history:
  - COSMIC CUBE, iPSC/1, iPSC/2: software routing
  - Mesh-connected (hardware routing): Intel Paragon
  - CM-5, IBM SP
  - Clusters

Concept of Message-Passing
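[Figure: process P executes "Send X, Q, t", naming address X in its local address space, the receiving process Q, and tag t. Process Q executes "Receive Y, P, t", naming address Y in its local address space, the sending process P, and tag t. The two operations match.]
- Send specifies the buffer to be transmitted and the receiving process
- Recv specifies the sending process and the application storage to receive into
- A memory-to-memory copy, but processes must be named
- In the simplest form, the send/recv match achieves a pairwise synchronization event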
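
The send/receive matching above can be sketched with Python threads standing in for processes P and Q. This is a toy model under stated assumptions, not any real message-passing library. The class `MessageSystem` and its `send`/`recv` methods are hypothetical names. Matching is on (source, tag). Sends are buffered at the destination until a matching receive arrives, and a receive blocks until its match exists, which gives the pairwise synchronization.

```python
import threading
from collections import deque


class MessageSystem:
    """Hypothetical toy message layer: one private mailbox per process."""

    def __init__(self, names):
        self._cv = threading.Condition()
        # Pending (source, tag, data) messages, one queue per receiving process.
        self._mailboxes = {name: deque() for name in names}

    def send(self, src, dest, buf, tag):
        """Copy buf into dest's mailbox (memory-to-memory copy); does not block."""
        with self._cv:
            self._mailboxes[dest].append((src, tag, list(buf)))
            self._cv.notify_all()

    def recv(self, me, src, tag):
        """Block until a message from src with this tag arrives, then return it."""
        with self._cv:
            while True:
                box = self._mailboxes[me]
                for i, (s, t, data) in enumerate(box):
                    if s == src and t == tag:  # the send/recv match
                        del box[i]
                        return data
                self._cv.wait()


def demo():
    ms = MessageSystem(["P", "Q"])
    result = {}
    X = [1, 2, 3]  # buffer in P's private address space

    def q_body():
        result["Y"] = ms.recv("Q", src="P", tag=7)  # Receive Y, P, t

    q = threading.Thread(target=q_body)
    q.start()
    ms.send("P", "Q", X, tag=7)  # Send X, Q, t
    X[0] = 99  # later writes by P do not affect Q's copy
    q.join()
    return result["Y"]


if __name__ == "__main__":
    print(demo())
```

Because `send` copies the buffer, P's later write to `X` is invisible to Q. A receiver that names a different tag simply keeps waiting.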

Evolution of Message-Passing Machines
- Early machines: a FIFO on each link
  - Hardware close to the programming model
  - Enabling non-blocking operations: messages buffered by the system at the destination until the recv
- Diminishing role of topology
  - Store-and-forward routing: topology important
  - Introduction of pipelined routing made it less so
  - Cost is in the node-network interface
  - Simplifies programming
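
Why pipelined routing reduced the importance of topology can be shown with a simple latency model. This is a textbook-style approximation that ignores contention and header size. The assumptions are a message of n bytes crossing h hops over links of bandwidth b, with a per-hop routing delay Δ. Store-and-forward gives T = h(n/b + Δ), and pipelined (cut-through) routing gives T = n/b + hΔ. The function names and numbers are hypothetical.

```python
def store_and_forward_latency(hops, n_bytes, bw, router_delay):
    """Each switch receives the whole message before forwarding it."""
    return hops * (n_bytes / bw + router_delay)


def pipelined_latency(hops, n_bytes, bw, router_delay):
    """Cut-through: only the header pays the per-hop delay; the body streams behind it."""
    return n_bytes / bw + hops * router_delay


if __name__ == "__main__":
    # Hypothetical numbers: 1000-byte message, 100 bytes/ns links, 2 ns per hop.
    for h in (1, 5, 10):
        print(h, store_and_forward_latency(h, 1000, 100, 2),
              pipelined_latency(h, 1000, 100, 2))
```

Under store-and-forward, the full transfer time is multiplied by the hop count. With pipelined routing, distance adds only the small per-hop delay, so node placement matters much less.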

Example: IBM SP-2
- Made out of essentially complete RS/6000 workstations
- Network interface integrated on the I/O bus

Example: Intel Paragon

The MANNA Multiprocessor Testbed
[Figure: crossbar hierarchies. Nodes are grouped into clusters connected through crossbars. Each node: i860XP (CP), network interface, I/O, and 32 MByte memory.]

Shared-Memory Multiprocessors
- Uniform-memory-access model (UMA)
- Non-uniform-memory-access model (NUMA)
  - Without caches (BBN, Cedar, Sequent)
  - COMA (Kendall Square KSR-1, DDM)
  - CC-NUMA (DASH)
- Symmetric vs. asymmetric MPs
  - Symmetric MPs (SMPs)
  - Asymmetric MPs (some master, some slave processors)

Shared Address Space Model (e.g. pthreads)
- Process: a virtual address space plus one or more threads of control
- Portions of the address spaces of processes are shared
  - Writes to a shared address are visible to other threads
- Natural extension of the uniprocessor model: conventional memory operations for communication; special atomic operations for synchronization
[Figure: virtual address spaces of processes P1 ... Pn communicating via shared addresses. The shared portion of each address space maps to common physical addresses in the machine physical address space, while each process also keeps a private portion. Processes communicate with ordinary loads and stores.]
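
The shared-address-space model can be sketched with Python's `threading` module as a stand-in for pthreads. The assumption here is that threads of one process play the role of the sharing processes. Ordinary assignments and reads communicate values, and `threading.Lock` and `threading.Event` stand in for the special synchronization operations. The helper names are hypothetical.

```python
import threading

# Shared portion of the address space: every thread sees the same objects.
shared = {"counter": 0, "data": None}
lock = threading.Lock()  # stands in for an atomic synchronization operation


def increment(iterations):
    for _ in range(iterations):
        with lock:  # make the read-modify-write of the counter atomic
            shared["counter"] += 1


def run_counter(n_threads=4, iterations=10_000):
    shared["counter"] = 0
    threads = [threading.Thread(target=increment, args=(iterations,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]


def store_then_load():
    ready = threading.Event()  # synchronization flag
    seen = []

    def producer():
        shared["data"] = 42  # an ordinary store communicates the value
        ready.set()

    def consumer():
        ready.wait()  # wait until the producer signals
        seen.append(shared["data"])  # an ordinary load reads the shared value

    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen[0]


if __name__ == "__main__":
    print(run_counter(), store_then_load())
```

Communication needs no explicit send or receive. Synchronization is still required so that updates are atomic and a reader knows when data is ready.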

Shared Address Space Architectures
- Any processor can directly reference any memory location (communication is implicit)
- Convenient:
  - Location transparency
  - Programming model similar to time-sharing on uniprocessors
- Popularly known as shared-memory machines or the shared-memory model
  - Ambiguous: memory may be physically distributed among processors

Shared-Memory Parallel Computers (late 1990s to early 2000s)
- SMPs (Intel Quad, SUN SMPs)
- Supercomputers: Cray T3E, Convex 2000, SGI Origin/Onyx, Tera computers

Example: Intel Pentium Pro Quad
- All coherence and multiprocessing glue is in the processor module
- Highly integrated, targeted at high volume

Example: SUN Enterprise
- 16 cards of either type: processors + memory, or I/O
- All memory is accessed over the bus, so the machine is symmetric
- Higher-bandwidth, higher-latency bus

Scaling Up
[Figure: "dance hall" and distributed-memory organizations]
- Problem is the interconnect: cost (crossbar) or bandwidth (bus)
- Dance hall: bandwidth still scalable, but at lower cost than a crossbar
- Distributed memory, or non-uniform memory access (NUMA)
- Caching shared (particularly nonlocal) data?
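
A back-of-the-envelope model shows why caching nonlocal data matters on a NUMA machine. The average access time is a weighted mix of local and remote latency. The function name and latencies below are hypothetical placeholders, not figures for any machine on these slides.

```python
def numa_avg_latency(local_fraction, t_local, t_remote):
    """Average memory access time when `local_fraction` of references are local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be in [0, 1]")
    return local_fraction * t_local + (1.0 - local_fraction) * t_remote


if __name__ == "__main__":
    # Hypothetical latencies: 100 ns local, 400 ns remote.
    for f in (0.5, 0.75, 1.0):
        print(f, numa_avg_latency(f, 100, 400))
```

A dance-hall machine corresponds to t_local = t_remote, where latency is uniform but uniformly large. In a distributed-memory NUMA machine, every reference that is kept local, or cached, pulls the average down.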

Example: Cray T3E
- Scales up to 1024 processors, 480 MB/s links
- Memory controller generates communication requests for nonlocal references
- No hardware mechanism for coherence (SGI Origin etc. provide this)
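
The idea of a memory controller that turns nonlocal references into network requests can be sketched as follows. The global-address layout is hypothetical: high bits select the node and the low 32 bits an offset. It is not the T3E's actual format, which used its own addressing and E-register mechanism. No remote data is cached, mirroring the absence of hardware coherence.

```python
from dataclasses import dataclass, field

OFFSET_BITS = 32  # hypothetical: low 32 bits select a word within a node's memory


@dataclass
class Node:
    node_id: int
    memory: dict = field(default_factory=dict)  # local memory: offset -> value
    outbox: list = field(default_factory=list)  # requests handed to the network

    def load(self, global_addr):
        node = global_addr >> OFFSET_BITS
        offset = global_addr & ((1 << OFFSET_BITS) - 1)
        if node == self.node_id:
            # Local reference: an ordinary memory access.
            return self.memory.get(offset, 0)
        # Nonlocal reference: issue a read request instead (nothing is cached).
        self.outbox.append(("read_request", node, offset))
        return None  # the value would arrive later in a read response (not modeled)


if __name__ == "__main__":
    n = Node(3)
    n.memory[8] = 5
    print(n.load((3 << OFFSET_BITS) | 8), n.load((7 << OFFSET_BITS) | 16), n.outbox)
```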

Multithreaded Shared-Memory MIMD
All instruction streams “time-share” one instruction processing unit in a pipelined fashion.
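
Pipelined time-sharing can be illustrated with a small simulation under stated assumptions. The pipeline has a hypothetical depth D, and a stream's next instruction cannot issue until its previous one has left the pipeline (no bypassing). Streams issue round-robin, one instruction per cycle. With at least D ready streams, the pipeline is busy every cycle; with fewer, bubbles appear. The function names are hypothetical.

```python
def interleave(n_streams, depth, cycles):
    """Issue at most one instruction per cycle, round-robin over ready streams."""
    ready_at = [0] * n_streams  # earliest cycle each stream may issue again
    trace = []                  # (cycle, stream) pairs; stream None means a bubble
    nxt = 0                     # round-robin pointer
    for cycle in range(cycles):
        for k in range(n_streams):
            s = (nxt + k) % n_streams
            if ready_at[s] <= cycle:
                trace.append((cycle, s))
                ready_at[s] = cycle + depth  # must drain before issuing again
                nxt = (s + 1) % n_streams
                break
        else:
            trace.append((cycle, None))  # no stream ready: pipeline bubble
    return trace


def utilization(n_streams, depth, cycles):
    trace = interleave(n_streams, depth, cycles)
    return sum(1 for _, s in trace if s is not None) / cycles


if __name__ == "__main__":
    for n in (1, 4, 8):
        print(n, utilization(n, depth=8, cycles=16))
```

A single stream keeps the unit busy only 1/D of the time. Enough interleaved streams hide the pipeline latency entirely.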

The Denelcor HEP
[Figure: process execution modules PEM 1 ... PEM 16 connected through a packet-switch network to data memory modules DMM 1 ... DMM 128. Inside a PEM are stages labeled PSW, INC, IF, DF, EX, and ST.]

Denelcor HEP: Many Instruction Streams, Single P-Unit
- 16 PEMs; DMM: 64 bits/DMM
- Packet-switching network
- I-stream creation is under program control
- 50 I-streams
- Programmability: SISAL, Fortran

Tera MTA (1990): A Shared-Memory LIW Multiprocessor
- 128 fine-grained threads, with 32 registers each, tolerate functional-unit (FU), synchronization, and memory latency
- Explicit-dependence lookahead increases single-thread concurrency
- Synchronization uses full/empty bits
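
Full/empty-bit synchronization can be modeled in software with a condition variable. On the MTA, the bit is kept in hardware per memory word. This sketch only models the semantics: a synchronized write waits until the word is empty, stores, and marks it full. A synchronized read waits until full, loads, and marks it empty. The class and method names are hypothetical.

```python
import threading


class FullEmptyWord:
    """Software model of a memory word tagged with a full/empty bit."""

    def __init__(self):
        self._cv = threading.Condition()
        self._full = False
        self._value = None

    def write_ef(self, value):
        """Wait until empty, store the value, mark full."""
        with self._cv:
            while self._full:
                self._cv.wait()
            self._value = value
            self._full = True
            self._cv.notify_all()

    def read_fe(self):
        """Wait until full, load the value, mark empty."""
        with self._cv:
            while not self._full:
                self._cv.wait()
            self._full = False
            self._cv.notify_all()
            return self._value


def pipeline_demo(n=5):
    cell = FullEmptyWord()
    received = []

    def consumer():
        for _ in range(n):
            received.append(cell.read_fe())

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(n):
        cell.write_ef(i)  # blocks until the consumer has emptied the word
    t.join()
    return received


if __name__ == "__main__":
    print(pipeline_demo())
```

Because every value must be consumed before the next can be written, a single word acts as a one-slot producer/consumer buffer without a separate lock variable.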

CM-5: Scalable Massively Parallel Supercomputer for the 1990s
- 10^12 floating-point operations per second (1 teraflops)
- 64,000 powerful RISC microprocessors working together
- Scalable: performance grows transparently
- Universal: supports a vast variety of application domains
- Highly reliable: sustained performance for large jobs requiring weeks or months to run
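
Dividing the target rate by the processor count shows what each processor must sustain. This assumes the work divides perfectly evenly and ignores communication overhead:

```latex
\frac{10^{12}\ \text{FLOP/s}}{64\,000\ \text{processors}}
  = 1.5625 \times 10^{7}\ \text{FLOP/s}
  \approx 15.6\ \text{MFLOPS per processor}
```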

Future Trend of MIMD Computers
- Program execution models: beyond the SPMD model
- Hybrid architectures: provide both shared memory and message passing
- Efficient mechanisms for latency AND bandwidth management (the so-called “memory wall” problem)

Shared Memory Architecture Examples (2000 to now)
- Sun's Wildfire Architecture (Hennessy & Patterson, Section 6.11, p. 622)
- Intel Xeon Multithreaded Architecture
- SGI Onyx-3000
- IBM p690
- Others

SUN FIRE 15K
[Figure: expander boards, each holding processors (p) and shared memory, plus I/O boards.]
- 4 CPUs per board: 900 MHz UltraSPARC with 32 KB I-cache and 64 KB D-cache
- 32 GB memory per board
- Crossbar switch: 43 GB/s bandwidth

Intel Xeon MP based server
[Figure: Xeon processors, memory control hub, memory, I/O, and PCI-X bridge.]
- 1.8 GHz Xeon with 512 KB L2 cache
- 4 processors share a common bus of 6.4 GB/s bandwidth
- Memory shares a common bus of 4.3 GB/s bandwidth
- Memory is accessed through a memory control hub
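
The shared-bus figures imply a per-processor bandwidth ceiling. This assumes all four processors are active at once, share the buses evenly, and ignores cache hits and protocol overhead:

```latex
\frac{6.4\ \text{GB/s}}{4} = 1.6\ \text{GB/s per processor (processor bus)},
\qquad
\frac{4.3\ \text{GB/s}}{4} = 1.075\ \text{GB/s per processor (memory bus)}
```

Under these assumptions the 4.3 GB/s memory bus, not the processor bus, is the tighter limit when all processors stream from memory.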

IBM p690
[Figure: a POWER4 chip with two 1 GHz CPUs, each with I- and D-caches, a shared L2 cache, an L3 controller and L3 cache, a distributed switch, the processor local bus, the I/O bus, and memory.]
- Each POWER4 chip has two 1 GHz processor cores, a shared 1.5 MB L2, direct access to a 32 MB L3 per chip, and chip-to-chip communication logic
- Each SMP building block has 4 POWER4 chips
- The base p690 has up to 4 SMP building blocks
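
Multiplying out the slide's figures gives the size of a fully populated base system. This assumes all 4 building blocks are installed:

```latex
\text{chips} = 4 \times 4 = 16, \qquad
\text{cores} = 2 \times 16 = 32, \qquad
\text{L2} = 16 \times 1.5\ \text{MB} = 24\ \text{MB}, \qquad
\text{L3} = 16 \times 32\ \text{MB} = 512\ \text{MB}
```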

SGI Onyx 3800
[Figure: R-Bricks connecting nodes of processors (P) with caches ($) and shared memory.]
- Each node is called a C-Brick, with 2 to 4 processors at 600 MHz
- An R-Brick is an 8-by-8 crossbar switch with 3.2 GB/s bandwidth: 4 ports for C-Bricks and 4 for other R-Bricks
- Each C-Brick has up to 8 GB of local memory that can be accessed by all processors by way of the NUMAlink interconnect

Recent High-End MIMD Parallel Architecture Projects
- ASCI Projects (USA): ASCI Blue, ASCI Red, ASCI Blue Mountain
- HTMT Project (USA)
- The Earth Simulator (Japan)
- HPCS architectures (USA)
