Anshul Kumar, CSE IITD Other Architectures & Examples Multithreaded architectures Dataflow architectures Multiprocessor examples 1 st May, 2006.


1 Other Architectures & Examples
Anshul Kumar, CSE IITD
- Multithreaded architectures
- Dataflow architectures
- Multiprocessor examples
1st May, 2006

2 Context switching
Delays and poor resource utilization due to:
- data/control hazards
- cache misses
- waiting for some event
Solution:
- context switch to another thread
Context switch mechanism:
- operating system: slow
- hardware: fast
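The benefit of switching to another thread on a cache miss can be illustrated with a toy cycle count. This is only a sketch under stated assumptions, not a model of any real machine: each thread is a list of operations, "C" (compute) takes one cycle, "M" takes one cycle to issue and then misses in the cache, the miss latency `MISS_LATENCY` is a made-up value, and a hardware context switch is assumed to cost zero cycles.

```python
MISS_LATENCY = 4  # hypothetical stall, in cycles, after a cache miss


def run_without_switching(threads):
    """Run threads back to back; the processor idles during every miss."""
    cycles = 0
    for ops in threads:
        for op in ops:
            cycles += 1 + (MISS_LATENCY if op == "M" else 0)
    return cycles


def run_with_switching(threads):
    """Switch to another ready thread (at zero cost) whenever a miss occurs."""
    contexts = [{"ready_at": 0, "ops": list(ops)} for ops in threads if ops]
    cycle = 0
    finish = 0
    while contexts:
        # Find the first context whose outstanding miss (if any) has resolved.
        idx = next(
            (i for i, c in enumerate(contexts) if c["ready_at"] <= cycle), None
        )
        if idx is None:
            # Every thread is waiting on memory: idle until the earliest wakes.
            cycle = min(c["ready_at"] for c in contexts)
            continue
        ctx = contexts.pop(idx)
        op = ctx["ops"].pop(0)
        cycle += 1
        done_at = cycle
        if op == "M":
            done_at = cycle + MISS_LATENCY
            ctx["ready_at"] = done_at
        if ctx["ops"]:
            if op == "M":
                contexts.append(ctx)      # blocked: go to the back of the line
            else:
                contexts.insert(0, ctx)   # still runnable: keep executing it
        else:
            finish = max(finish, done_at)
    return max(finish, cycle)


threads = [["C", "M", "C"], ["C", "M", "C"]]
print(run_without_switching(threads), run_with_switching(threads))
```

With two threads the miss latency of one is partly hidden behind the work of the other; with a single thread both functions give the same count, since there is nothing to switch to.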

3 Multithreaded architecture
Hardware context switching
- Models: control flow or hybrid (control flow, data flow)
- Granularity: fine grain or coarse grain
- Memory organization: shared? distributed? cache coherent?
- No. of threads: small, medium, large

4 ILP and Multithreading
[Figure comparing ILP, coarse MT, fine MT, and SMT. Source: Hennessy and Patterson]

5 Chip level multithreading
Executing instructions from multiple threads within one processor chip at the same time.
- Multithreading: interleaved issue of multiple instructions from different threads
- Simultaneous multithreading (SMT): issue multiple instructions from multiple threads in one cycle
- Chip-level multiprocessing (CMP or multicore): integrate two or more superscalar processors into one chip, each executing one thread independently
- Any combination of multithreading/SMT/CMP
Source: Wikipedia

6 Historical Examples
Machine              Granularity  Procs              Threads/proc        Memory               Year
HEP (from Denelcor)  fine         max 16             8 active, 64 max    shared, centralized  1978
Tera                 fine         max 256            128                 distributed shared   1990
Alewife (MIT)        coarse       max 512 (Sparcle)  1 active, 3 loaded  CC                   1990

7 Modern examples
- Pentium 4: Hyperthreading
- MIPS MT
- IBM Power 5: dual core, 2 threads each
- UltraSPARC T1: 8 cores with 4 threads each, fine-grained multithreading

8 HEP
[Block diagram. Labeled components: PSW queue, increment control, program memory, operand fetch, registers, matching unit, function units FU1, FU2, ..., FUn, SFU (scheduler function unit), link to/from data memory. Annotations: control loop, 8-stage pipeline.]

9 Control Flow & Data Flow models
Control flow (von Neumann)
- control flows through a sequence of instructions; branches can alter the flow
- instructions get data from or put data in memory
- explicit parallelism through control operators: fork/join
Data flow
- instructions are triggered by availability of data
- data flows from instruction to instruction
- explicit parallelism

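10 Dataflow Model
[Dataflow graph: inputs A, B and the constant 1 feed a "-" node and a "+" node, producing A-B and B+1; both results feed a "*" node that yields R=(A-B)*(B+1).]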
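The data-driven firing rule behind the graph for R=(A-B)*(B+1) can be sketched in a few lines. This is an illustrative toy, not the notation of any particular dataflow machine: the node names, the `GRAPH` table, and the token format `(node, port, value)` are all assumptions made for the example. A node fires as soon as both of its operand slots hold a token, and its result is forwarded as a new token to its destination.

```python
import operator

# Hypothetical encoding: node -> (operation, list of (destination, port)).
GRAPH = {
    "sub": (operator.sub, [("mul", 0)]),
    "add": (operator.add, [("mul", 1)]),
    "mul": (operator.mul, [("R", 0)]),
}


def run_dataflow(a, b):
    """Evaluate R=(A-B)*(B+1) by firing nodes when their operands arrive."""
    slots = {name: {} for name in GRAPH}   # operand slots per node
    # Initial tokens: A and B go to "-", B and the constant 1 go to "+".
    tokens = [("sub", 0, a), ("sub", 1, b), ("add", 0, b), ("add", 1, 1)]
    fired = []
    result = None
    while tokens:
        node, port, value = tokens.pop(0)
        if node == "R":                    # output arc of the graph
            result = value
            continue
        slots[node][port] = value
        if len(slots[node]) == 2:          # both operands present: fire
            op, destinations = GRAPH[node]
            out = op(slots[node][0], slots[node][1])
            fired.append(node)
            tokens.extend((dest, p, out) for dest, p in destinations)
    return result, fired


print(run_dataflow(5, 2))
```

No program counter appears anywhere: the order in which "sub" and "add" fire depends only on token arrival, and the two could fire in parallel on separate function units.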

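11 Dataflow Program
[Figure: the graph for R=(A-B)*(B+1) encoded as instructions, each naming its destination instruction/operand slot. Recoverable entries:
L1: Compute B, destinations L2/2 and L3/1
L2: -, destination L4/1
L3: + 1, destination L4/2
L4: *, destination L6/1]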

12 Static Dataflow Architecture
[Block diagram of one processing element. Labeled components: activity store, fetch unit, instruction queue, function units FU1, FU2, ..., FUn, update unit, links to/from other PEs.]

13 Tagged-token dataflow architecture
[Block diagram of one processing element. Labeled components: token queue, matching unit, matching store, fetch unit, instruction/data memory, function units FU1, FU2, ..., FUn, form token unit, links to/from other PEs.]
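The role of the matching unit and matching store can be sketched as follows. This is a simplified illustration under assumptions that are not on the slide: every instruction takes exactly two operands, a tag is the pair `(context, instruction)`, a token is the tuple `(context, instruction, port, value)`, and the two tokens for one tag always arrive on different ports (0 and 1). The first token of a pair waits in the matching store; when its partner arrives, the instruction becomes enabled.

```python
def match_tokens(tokens):
    """Pair operand tokens that carry the same (context, instruction) tag.

    Returns the list of enabled instructions, in the order they became
    enabled, plus whatever is still waiting in the matching store.
    """
    matching_store = {}   # tag -> (port, value) of the token that came first
    enabled = []
    for context, instr, port, value in tokens:
        tag = (context, instr)
        partner = matching_store.pop(tag, None)
        if partner is None:
            matching_store[tag] = (port, value)      # wait for the partner
        else:
            operands = dict([partner, (port, value)])
            enabled.append((context, instr, operands[0], operands[1]))
    return enabled, matching_store


# Tokens from three contexts (e.g. loop iterations) arrive interleaved.
stream = [
    (0, "sub", 0, 5),
    (1, "sub", 0, 7),
    (1, "sub", 1, 4),
    (0, "sub", 1, 2),
    (2, "sub", 0, 9),
]
print(match_tokens(stream))
```

Because the tag includes the context, several activations of the same instruction can be in flight at once without their operands being mixed up.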

14 UMA Examples
- Earlier approach: large number of processors (e.g. Denelcor HEP, NYU Ultracomputer)
- Now realized: good only for a small number of processors (e.g. Encore Multimax, 1980s; SGI Power Challenge, 1990s)

15 SGI Power Challenge
- 18 MIPS R8000 processors
- 16 GB RAM, 8-way interleaved
- 4 Power channel-2 I/O buses, each 320 MB/s
- Power path-2: split-transaction shared bus (256-bit data, 40-bit address)
- Snoopy cache coherence protocol

16 NUMA Examples
- BBN TC2000
- IBM RP3
- Hector
- Cray T3D

17 Hector
Hierarchical structure
[Figure: a global ring connects local rings; each local ring connects stations; a station holds processor modules (P+C+M) and an I/O module.]

18 Hector
[Figure: detail of a station. Processor modules, an I/O module, and a station controller share a station bus; the station attaches to a local ring, and the local rings attach to the global ring.]

19 Cray T3D
- Alpha 21064 processors
- Cray Y-MP host
- Up to 128 GB memory
- 4x4x4 3D torus, configurable up to 8x8x8
- 2 PEs in each node
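The wrap-around links of a 3D torus can be made concrete with a short sketch. It uses only the figures on the slide (a 4x4x4 torus with 2 PEs per node); the coordinate scheme and the function name `torus_neighbors` are assumptions for illustration and say nothing about how the T3D actually numbers its nodes.

```python
def torus_neighbors(node, dims=(4, 4, 4)):
    """Return the six neighbors of a node in a 3D torus.

    Each axis wraps around, so a node at coordinate 0 is linked to the
    node at coordinate size-1 on the same axis.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size
            neighbors.append(tuple(coord))
    return neighbors


dims = (4, 4, 4)
nodes = dims[0] * dims[1] * dims[2]
print(nodes, "nodes,", nodes * 2, "PEs")   # 2 PEs in each node
print(torus_neighbors((0, 0, 0), dims))
```

Every node has the same number of links regardless of its position, which is the property that distinguishes a torus from a plain mesh, where edge and corner nodes have fewer neighbors.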

20 CC-NUMA examples
Machine              Nodes                           Mem             Cache               Net
Wisconsin Multicube  single proc                     per col bus     snoopy              bus grid
Aquarius Multi-multi single proc                     per node        snoopy + directory  bus grid
Stanford Dash        cluster: 4 R3000 + FPU on bus   per cluster     snoopy + directory  pair of meshes
Stanford Flash       single proc: T5 + magic chip    per node        directory           2D mesh
Convex Exemplar      hyper node: 8 PA-RISC           per hyper node  SCI                 X bar (hyper node), multi rings
Magic chip: memory + I/O + network controller

21 COMA examples
DDM (Data Diffusion Machine)
- single bus (split transaction)
- can be made hierarchical
KSR 1
- hierarchical rings
- distributed directory is a matrix: rows for pages, columns for caches

22 Distributed Memory Architecture Examples
Machine            Comp. proc  Comm. proc  Vec. proc  Switch       Topology
nCUBE2             custom      -           -          custom       hypercube
iPSC2              i386        yes         yes        -            hypercube
Intel Paragon      i860        i860        -          custom       2D mesh
Genesis            i870        i870        -          custom       2-level X bar
Manna              i860        i860        -          16x16 X bar  hierarch.
Parsytec           P.PC601     T805        -          C004         3D mesh
Transtech Paramid  i860        T805        -          C004         variable
IBM SP2            Power2      i860        -          custom       fat tree
Meiko CS-2         SPARC       custom      Fujitsu    custom       fat tree
Parsys SN9800      T900        T900        -          C104         hierarch. sw

23 References
D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.

