
1 The Cosmic Cube
Charles L. Seitz
Presented by: Jason D. Robey, 2 APR 03

2 Agenda
Introduction
Message Passing
Process Oriented
Concurrency Paradigm
Hardware Description
Software Considerations
Measurements
Future Work
Summary

3 Introduction
How do we get a large number of processors to work together on the same problem in a scalable way?
Test bed developed at Caltech for what they hoped would become a VLSI implementation
Programmer controls data sharing, not cache coherency mechanisms
Techniques for certain problems give close to linear speed-up

4 Message Passing
Communication and synchronization primitives seen by the programmer:
– Barrier
– Blocking sends and receives
– Broadcasts and node-to-node message passing
– Explicit sharing of data through sending messages (see the sketch after this slide)
– Programmer decides when updates are necessary
Hardware structure is a memory/processor node:
– Separate consideration for memory vs. inter-process communication
– Optimize each
– Memory is closer to where it will be used
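
To make the explicit-sharing style concrete, here is a minimal sketch using present-day MPI as an analogy; the Cosmic Cube pre-dates MPI and its own kernel calls differ. Data moves only when the owning process sends it, and the blocking receive doubles as synchronization, with no shared memory or cache coherence involved.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;   /* owner updates the data... */
        /* ...and decides when to publish it to process 1 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocking receive: process 1 waits until the update arrives. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}

Run with two processes (e.g. mpirun -np 2), process 1 only ever sees the value that process 0 chose to send it.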

5 Message Passing
Hyper-cube communications
– Scales well: O(n lg n) cost, O(lg n) worst-case message delivery
– Simple routing: a node address is a discrete, 2-valued n-tuple, and the process address gives the routing instructions (see the routing sketch after this slide)
– Clustering: can use "spheres" of nodes for separate problems
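
The "routing instructions in the address" point can be shown with a small C sketch of e-cube-style routing: XOR-ing the current node address with the destination leaves one bit per dimension still to be crossed, so a message never needs more than lg n hops. This is a generic illustration, not the Cosmic Cube's actual routing hardware.

#include <stdio.h>

/* Next hop in a binary hyper-cube: flip the lowest-order bit in which
 * the current node and the destination still differ. */
int next_hop(int current, int dest)
{
    int diff = current ^ dest;        /* dimensions still to be crossed */
    if (diff == 0)
        return current;               /* already at the destination     */
    int dim = 0;
    while (!(diff & (1 << dim)))      /* find the lowest differing bit  */
        dim++;
    return current ^ (1 << dim);      /* cross that dimension           */
}

int main(void)
{
    int node = 5, dest = 49;          /* example route in a 6-cube (64 nodes) */
    while (node != dest) {
        int next = next_hop(node, dest);
        printf("%d -> %d\n", node, next);
        node = next;
    }
    return 0;
}

The example route is 5 -> 1 -> 17 -> 49: three hops, one for each address bit in which 5 and 49 differ.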

6 Process Oriented
Abstraction from direct hardware targeting
Processes mapped to nodes
– Multiple processes interleaved on a single node
– Unique addresses
– Unique message channels
– Programmer not concerned w/ the actual number of nodes and node addresses
Kernel required on each node
– Provides routing services
– Provides process management services
– Requires processing time

7 Process Oriented
Caltech disallows processes switching nodes
– Prevents effective run-time load balancing; balancing becomes a programmer responsibility
– Allows the node ID to be included w/ the process ID, so the hyper-cube routing simplifications can be exploited (a bit-packing sketch follows this slide)
Issue: interleaving may be bad in certain cases
– Context switch required for message passing
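
One way to picture the node-ID-in-process-ID arrangement is simple bit-packing. The field widths below are assumptions for a 64-node machine, not the Cosmic Cube's actual encoding.

#include <stdio.h>

/* Hypothetical process-ID layout: upper bits name the node (6 bits for
 * a 6-cube), lower bits name the process on that node. */
#define NODE_BITS  6
#define PROC_BITS 10

unsigned make_pid(unsigned node, unsigned local)
{
    return (node << PROC_BITS) | (local & ((1u << PROC_BITS) - 1));
}

unsigned pid_node(unsigned pid)  { return pid >> PROC_BITS; }
unsigned pid_local(unsigned pid) { return pid & ((1u << PROC_BITS) - 1); }

int main(void)
{
    /* Because the node ID is recoverable from the PID, the kernel can
     * hand it straight to the hyper-cube routing step. */
    unsigned pid = make_pid(49, 3);
    printf("pid=%u node=%u local=%u\n", pid, pid_node(pid), pid_local(pid));
    return 0;
}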

8 Concurrency Paradigm
Programmer must explicitly deal with concurrency
Different from other approaches, where the compiler or hardware is expected to find the parallelism
Requires a restructuring of single-processor ideas
– Bubble sort becomes a linear solution (see the sorting sketch after this slide)
– Many solutions need to be redesigned altogether
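
As one example of that restructuring, the sketch below is odd-even transposition sort, the message-passing relative of bubble sort: with one element per process it finishes in as many phases as there are processes. It is written against MPI purely as an illustration, not against the Cosmic Cube kernel.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, mine, theirs, phase;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    srand(rank * 7919 + 17);
    mine = rand() % 100;                    /* one element per process */

    for (phase = 0; phase < size; phase++) {
        /* Pair neighbours (0,1),(2,3),... on even phases and
         * (1,2),(3,4),... on odd phases. */
        int partner = ((phase + rank) % 2 == 0) ? rank + 1 : rank - 1;
        if (partner < 0 || partner >= size)
            continue;                       /* no partner this phase */

        MPI_Sendrecv(&mine, 1, MPI_INT, partner, 0,
                     &theirs, 1, MPI_INT, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        if (rank < partner)                 /* keep the smaller value on the left */
            mine = (theirs < mine) ? theirs : mine;
        else                                /* keep the larger value on the right */
            mine = (theirs > mine) ? theirs : mine;
    }

    printf("rank %d holds %d\n", rank, mine);   /* values are now sorted by rank */
    MPI_Finalize();
    return 0;
}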

9 Concurrency Paradigm
Techniques
– Exploit outer-loop unrolling
– Sparse/predictable messaging
Good for science and engineering problems
– Regular loops
– Predictable flow
– SIMD: the same operation on a whole lot of data

10 Hardware Description
64-node hyper-cube
– 5 ft., 700 watts, $80,000
– Linear projection
– Simulation results led to the hyper-cube choice
– Allowed for slow network links compared to CPU speed
Node
– 8086 processor w/ 8087 coprocessor: needed good floating-point operations; clock slowed from 8 MHz to 5 MHz for the 8087
– 128K RAM: spend money on other things
– 8K ROM for initialization and POSTs

11 Hardware Description
Developed prototype as a test bed and resource raiser
– 1981-1982 for the first prototype 2-cube
– Summer of 1983 to reach the 6-cube
First year: 560,000 node-hours
– 2 hard errors
– 1 soft error every several days

12 Software Considerations
Development and testing done on traditional machines
Initialization had to deal with node checks in addition to RAM checks
Extensions to C had to be developed to facilitate the machine’s use by other researchers
Kernel must be developed
– Deal with message-passing constructs
– Must manage requests from the intermediate host (IH)
– probe: allows a process access to the message layer
– spy: allows the IH to examine and modify kernel execution data

13 Measurements
Speedup = T(1)/T(N)
Efficiency = Speedup / N
– 1 is good (linear speed-up)
– <= 1/N is bad (no faster than a single node)
Only really useful for measuring the scalability of an algorithm on problems requiring many more processes than there are nodes available
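
Worked example with made-up numbers: if a run takes T(1) = 640 s on one node and T(64) = 12.5 s on all 64 nodes, the speedup is 640 / 12.5 = 51.2 and the efficiency is 51.2 / 64 = 0.8, i.e. 80% of the ideal linear speed-up.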

14 Measurements
What affects efficiency? (Overhead)
– Load-balancing problems
– Message start-up latency (big messages vs. small messages)
– Hop latency
– Processor time used in message-routing functions
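
These overheads can be folded into a rough per-message cost model (illustrative only, not figures from the paper): time ≈ start-up latency + hops × per-hop latency + bytes / link bandwidth, plus the CPU cycles the kernel spends forwarding traffic for other nodes. The fixed start-up term is why many small messages cost far more than one large message of the same total size.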

15 Measurements
Performance
– Some applications achieved a maximum of 3 MIPS in floating-point operations
– Many other applications reached optimal speed-up compared to a VAX-11/780, with overheads of 0.025-0.5
– Low message frequency?

16 Future Work
Move routing functions to a network device
Experiment with a hybrid shared-memory approach
Allow for dynamic load balancing
Experiment with more programmer control of process-to-node assignments
Try different problem areas to expand the message protocol
Make the interface more programmer friendly

17 Summary
New programming paradigm required
Offers many advantages on the scientific and engineering problem set
May be interesting to apply to other domains
Achieved what appears to be excellent scalability
Good success in a limited domain

18 Questions? Comments? Snide Remarks?

