Presented by: Quinn Gaumer, CPS 221. 16,384 Processing Nodes (32 MHz), 30 m x 30 m, Teraflop, 1992.


1 Presented by: Quinn Gaumer CPS 221

2
• 16,384 Processing Nodes (32 MHz)
• 30 m x 30 m
• Teraflop
• 1992

3
• With 16,384 processors the interconnect plays a large role
• 3 Types of Networks
  ◦ Data
  ◦ Control
  ◦ Diagnostic

4
• Easily Attainable High Performance
• Scaling
• Data Parallel Programming
• High Reliability and Availability
• Space/Time Shared
• Fast Time to Market
• Modular

5
Include
• Control Processor
• Processing Nodes
• Slices of Data and Control Networks
  ◦ Privileged vs. Non-Privileged
• Program Isolation
• Time Sharing

6
• Provide Simple View of Network to Processors
• Sharing and Fault Tolerance
• Decouple Network/Processor by Providing Contract
  ◦ Software -> ISA -> Hardware

7
• “The data network promises to eventually accept and deliver all messages injected into the network by the processors as long as the processors promise to eventually eject all messages from the network when they are delivered to the processors.”

8
• Collection of Memory Mapped FIFOs
  ◦ Outgoing/Incoming
• Restricted Operations
  ◦ Implemented with protected pages
• Physical/Relative (Virtual) Address
  ◦ Programs use only relative addresses
• Network Independent of User
  ◦ Delivery guaranteed by network, not processing node
  ◦ Requires network diagnostics

9
• Fat Tree Structure
  ◦ The closer to the root, the thicker the tree
  ◦ Ensures no bottlenecks at root
• User Partitions and I/O are Sub-trees
  ◦ Guarantees network independence
  ◦ Messages in partition stay within partition
• Many Optimal Node to Node Paths
  ◦ Choose randomly among open links
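
The routing idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the machine's actual router logic: the names `lca_height` and `route`, the tree arity of 4, the two parent links per switch, and the `blocked` set are all hypothetical. A message climbs only as far as the lowest subtree containing both endpoints, picking a random open parent link at each level, then descends along the single path fixed by the destination address.

```python
import random

def lca_height(src, dst, arity=4):
    """Number of levels to climb before src and dst share a subtree."""
    height = 0
    while src != dst:
        src //= arity
        dst //= arity
        height += 1
    return height

def route(src, dst, arity=4, up_links=2, blocked=frozenset(), rng=random):
    """Return (up, down): random parent-link choices, then child indices.

    `blocked` is a hypothetical set of (level, link) pairs that are busy
    or mapped out; rng.choice raises IndexError if a level has no open link.
    """
    height = lca_height(src, dst, arity)
    up = []
    for level in range(height):
        open_links = [l for l in range(up_links) if (level, l) not in blocked]
        up.append(rng.choice(open_links))  # any open link is equally good
    # Going down, each base-`arity` digit of dst selects exactly one child.
    down = [(dst // arity**level) % arity for level in reversed(range(height))]
    return up, down
```

Because a message never climbs above the common subtree of its endpoints, traffic between two nodes of the same partition stays inside that partition's sub-tree in this model.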

10
• Data can be only 1-5 Words
• Wormhole Routing
• CRC Checking done at every Link
  ◦ Additional !CRC sent when error first found
• Primary Errors allow Diagnostic Network to Determine location
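
One way to read the per-link CRC check is sketched below. It rests on assumptions: `!CRC` is interpreted here as the bitwise complement of the checksum, `zlib.crc32` stands in for whatever polynomial the hardware used, and `classify` and `forward` are hypothetical names. The point is only to show how the first link that sees a bad checksum (the primary error) can mark the message so that later links report a secondary error instead.

```python
import zlib

MASK = 0xFFFFFFFF

def classify(payload: bytes, crc: int) -> str:
    """Classify a received message at one link."""
    good = zlib.crc32(payload) & MASK
    if crc == good:
        return "ok"
    if crc == good ^ MASK:
        return "secondary"  # an upstream link already flagged this message
    return "primary"        # this link is the first to see the corruption

def forward(payload: bytes, crc: int):
    """Check a message and pass it on, marking it on a primary error."""
    status = classify(payload, crc)
    if status == "primary":
        # Assumed meaning of "!CRC": send the complement of the checksum
        # of the data as received, so downstream links see "secondary".
        crc = (zlib.crc32(payload) & MASK) ^ MASK
    return status, payload, crc
```

In this model only one link reports a primary error per corrupted message, which is what would let a diagnostic network narrow the fault to a single location.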

11
• Message Counters at every Link
• Kirchhoff’s Law to Determine Missing Messages
• What to do with a Bad Chip or Link?
  ◦ Route Messages Away from Failure
  ◦ Map Out Nearby Processors
  ◦ Which is better?
• Both.
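
The Kirchhoff's-law check amounts to a conservation test: once the network has drained, the messages counted into a switch should equal the messages counted out of it. A minimal sketch, assuming a hypothetical `counters` mapping from a switch name to its incoming and outgoing link counts:

```python
def lost_at_nodes(counters):
    """Return {node: deficit} for every node where inflow != outflow.

    counters maps node -> (incoming link counts, outgoing link counts),
    sampled after the network has drained (an assumption of this sketch).
    """
    lost = {}
    for node, (incoming, outgoing) in counters.items():
        deficit = sum(incoming) - sum(outgoing)
        if deficit != 0:
            lost[node] = deficit
    return lost
```

For example, a switch that counted 12 messages in but only 10 out would be reported with a deficit of 2, pointing diagnostics at that chip or its links.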

12
• Solution: Virtual Channels
  ◦ One channel each for request and response
  ◦ 4 channels per chip (Incoming and Outgoing)
• Deadlock still possible!
  ◦ User sends but never attempts to receive messages
  ◦ Higher level languages to implement communication protocol

13
• Objectives
  ◦ Clear all messages for new user
  ◦ Allow all messages in transit to eventually finish
• “All Fall Down” Method
  ◦ Evenly misroute all messages in transit to nodes
  ◦ Message saved at node
  ◦ Resent when swapped in

14
• Control Processor broadcasts program
  ◦ Not individual instructions (as in SIMD)
• Each Processor runs program on its data set
• Inter-Processor Communication
  ◦ Hardware Barriers allow processes to communicate without shared semaphores

15
• Program smaller than an instruction stream
  ◦ Easier to deliver
• Local fetch allows commodity processors
  ◦ Fast new RISC processors, less R&D
• Control system useful for other problems
• Execution of generic MIMD code
  ◦ Message passing

16
• Broadcasting
  ◦ User/Supervisor
  ◦ Interrupt
  ◦ Utility
• Combining
  ◦ Reduction
  ◦ Forward/Backward Scan
  ◦ Router Done
• Global Operations
  ◦ Synchronous/Asynchronous OR
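
The combining operations can be illustrated in software. The sketch below is a sequential model only, with one value per processing node; the function names are hypothetical, and the scans are written as inclusive scans, which is an assumption about the convention rather than something the slide states.

```python
from functools import reduce
from itertools import accumulate
import operator

def reduction(values, op=operator.add):
    """Combine one value per node into a single result."""
    return reduce(op, values)

def forward_scan(values, op=operator.add):
    """Node i receives the combination of values 0..i (inclusive)."""
    return list(accumulate(values, op))

def backward_scan(values, op=operator.add):
    """Node i receives the combination of values i..n-1 (inclusive)."""
    return list(accumulate(reversed(values), op))[::-1]
```

With `values = [3, 1, 4, 1]` and addition, the reduction is 9, the forward scan is `[3, 4, 8, 9]`, and the backward scan is `[9, 6, 5, 1]`. Passing `max` or `operator.or_` as `op` models other combining functions.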

17
• Binary Tree
• Four Types of Packets
  ◦ Single Source: Broadcasting
  ◦ Multiple Source: Combining
  ◦ Idle: Filler
  ◦ Abstain: Allow control node to skip waiting
• Collisions on Network
  ◦ Multiple/Multiple: Buffering based on arrival time
  ◦ Multiple/Single: Single Source Packets Prioritized
  ◦ Single/Single: Error

18
• Control Processor for each Partition
  ◦ Executes scalar code while processing nodes execute parallel code
• Connect any Control Processor to any Partition
  ◦ Problems can occur in control networks too
  ◦ Diagnostics may show part of control network must be mapped out

19
• Binary Network
  ◦ Pods (physical subsystems) are leaves
• JTAG
  ◦ Designed for Multichip…but serial
• Do JTAG for each Pod
• Combine Responses with OR/AND

