
1 The Cosmic Cube
Charles L. Seitz
Presented by: Jason D. Robey, 2 APR 03

2 Agenda
Introduction
Message Passing
Process Oriented
Concurrency Paradigm
Hardware Description
Software Considerations
Measurements
Future Work
Summary

3 Introduction
How do we get a large number of processors to work together on the same problem in a scalable way?
Test bed developed at Caltech for what they hoped would become a VLSI implementation
Programmer controls data sharing, not cache coherency mechanisms
Techniques for certain problems give close to linear speed-up

4 Message Passing
Communication and synchronization primitives seen by the programmer:
– Barrier
– Blocking sends and receives
– Broadcasts and node-to-node message passing
– Explicit sharing of data through sending messages (see the sketch after this slide)
– Programmer decides when updates are necessary
Hardware structure is a memory/processor node:
– Separate consideration for memory vs. inter-process communication
– Optimize each
– Memory is closer to where it will be used
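
To make the explicit-sharing style concrete, here is a minimal sketch using present-day MPI as an analogy; the Cosmic Cube pre-dates MPI and its own kernel calls differ. Data moves only when the owning process sends it, and the blocking receive doubles as synchronization, with no shared memory or cache coherence involved.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;   /* owner updates the data... */
        /* ...and decides when to publish it to process 1 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocking receive: process 1 waits until the update arrives. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}

Run with two processes (e.g. mpirun -np 2), process 1 only ever sees the value that process 0 chose to send it.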

5 Message Passing
Hyper-cube communications
– Scales well: O(n lg n) cost, O(lg n) worst-case message delivery
– Simple routing: a node address is a discrete, 2-valued n-tuple, and the process address gives the routing instructions (see the routing sketch after this slide)
– Clustering: can use "spheres" of nodes for separate problems
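
The "routing instructions in the address" point can be shown with a small C sketch of e-cube-style routing: XOR-ing the current node address with the destination leaves one bit per dimension still to be crossed, so a message never needs more than lg n hops. This is a generic illustration, not the Cosmic Cube's actual routing hardware.

#include <stdio.h>

/* Next hop in a binary hyper-cube: flip the lowest-order bit in which
 * the current node and the destination still differ. */
int next_hop(int current, int dest)
{
    int diff = current ^ dest;        /* dimensions still to be crossed */
    if (diff == 0)
        return current;               /* already at the destination     */
    int dim = 0;
    while (!(diff & (1 << dim)))      /* find the lowest differing bit  */
        dim++;
    return current ^ (1 << dim);      /* cross that dimension           */
}

int main(void)
{
    int node = 5, dest = 49;          /* example route in a 6-cube (64 nodes) */
    while (node != dest) {
        int next = next_hop(node, dest);
        printf("%d -> %d\n", node, next);
        node = next;
    }
    return 0;
}

The example route is 5 -> 1 -> 17 -> 49: three hops, one for each address bit in which 5 and 49 differ.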

6 Process Oriented
Abstraction from direct hardware targeting
Processes mapped to nodes
– Multiple processes interleaved on a single node
– Unique addresses
– Unique message channels
– Programmer not concerned w/ the actual number of nodes and node addresses
Kernel required on each node
– Provides routing services
– Provides process management services
– Requires processing time

7 Process Oriented
Caltech disallows processes switching nodes
– Prevents effective run-time load balancing; balancing becomes a programmer responsibility
– Allows the node ID to be included w/ the process ID, so the hyper-cube routing simplifications can be exploited (a bit-packing sketch follows this slide)
Issue: interleaving may be bad in certain cases
– Context switch required for message passing
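
One way to picture the node-ID-in-process-ID arrangement is simple bit-packing. The field widths below are assumptions for a 64-node machine, not the Cosmic Cube's actual encoding.

#include <stdio.h>

/* Hypothetical process-ID layout: upper bits name the node (6 bits for
 * a 6-cube), lower bits name the process on that node. */
#define NODE_BITS  6
#define PROC_BITS 10

unsigned make_pid(unsigned node, unsigned local)
{
    return (node << PROC_BITS) | (local & ((1u << PROC_BITS) - 1));
}

unsigned pid_node(unsigned pid)  { return pid >> PROC_BITS; }
unsigned pid_local(unsigned pid) { return pid & ((1u << PROC_BITS) - 1); }

int main(void)
{
    /* Because the node ID is recoverable from the PID, the kernel can
     * hand it straight to the hyper-cube routing step. */
    unsigned pid = make_pid(49, 3);
    printf("pid=%u node=%u local=%u\n", pid, pid_node(pid), pid_local(pid));
    return 0;
}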

8 Concurrency Paradigm
Programmer must explicitly deal with concurrency
Different from other approaches, where the compiler or hardware is expected to find the parallelism
Requires a restructuring of single-processor ideas
– Bubble sort becomes a linear solution (see the sorting sketch after this slide)
– Many solutions need to be redesigned altogether
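
As one example of that restructuring, the sketch below is odd-even transposition sort, the message-passing relative of bubble sort: with one element per process it finishes in as many phases as there are processes. It is written against MPI purely as an illustration, not against the Cosmic Cube kernel.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, mine, theirs, phase;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    srand(rank * 7919 + 17);
    mine = rand() % 100;                    /* one element per process */

    for (phase = 0; phase < size; phase++) {
        /* Pair neighbours (0,1),(2,3),... on even phases and
         * (1,2),(3,4),... on odd phases. */
        int partner = ((phase + rank) % 2 == 0) ? rank + 1 : rank - 1;
        if (partner < 0 || partner >= size)
            continue;                       /* no partner this phase */

        MPI_Sendrecv(&mine, 1, MPI_INT, partner, 0,
                     &theirs, 1, MPI_INT, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        if (rank < partner)                 /* keep the smaller value on the left */
            mine = (theirs < mine) ? theirs : mine;
        else                                /* keep the larger value on the right */
            mine = (theirs > mine) ? theirs : mine;
    }

    printf("rank %d holds %d\n", rank, mine);   /* values are now sorted by rank */
    MPI_Finalize();
    return 0;
}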

9 Concurrency Paradigm
Techniques
– Exploit outer-loop unrolling
– Sparse/predictable messaging
Good for science and engineering problems
– Regular loops
– Predictable flow
– SIMD: the same operation on a whole lot of data

10 Hardware Description
64-node hyper-cube
– 5 ft., 700 watts, $80,000
– Linear projection
– Simulation results led to the hyper-cube choice
– Allowed for slow network links compared to CPU speed
Node
– 8086 processor w/ 8087 coprocessor: needed good floating-point operations; clock slowed from 8 MHz to 5 MHz for the 8087
– 128K RAM: spend money on other things
– 8K ROM for initialization and POSTs

11 Hardware Description
Developed prototype as a test bed and resource raiser
– 1981-1982 for the first prototype 2-cube
– Summer of 1983 to reach the 6-cube
First year: 560,000 node-hours
– 2 hard errors
– 1 soft error every several days

12 Software Considerations
Development and testing done on traditional machines
Initialization had to deal with node checks in addition to RAM checks
Extensions to C had to be developed to facilitate the machine’s use by other researchers
Kernel must be developed
– Deal with message-passing constructs
– Must manage requests from the intermediate host (IH)
– probe: allows a process access to the message layer
– spy: allows the IH to examine and modify kernel execution data

13 Measurements
Speedup = T(1)/T(N)
Efficiency = Speedup / N
– 1 is good (linear speed-up)
– <= 1/N is bad (no faster than a single node)
Only really useful for measuring the scalability of an algorithm on problems requiring many more processes than there are nodes available
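
Worked example with made-up numbers: if a run takes T(1) = 640 s on one node and T(64) = 12.5 s on all 64 nodes, the speedup is 640 / 12.5 = 51.2 and the efficiency is 51.2 / 64 = 0.8, i.e. 80% of the ideal linear speed-up.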

14 Measurements
What affects efficiency? (Overhead)
– Load-balancing problems
– Message start-up latency (big messages vs. small messages)
– Hop latency
– Processor time used in message-routing functions
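
These overheads can be folded into a rough per-message cost model (illustrative only, not figures from the paper): time ≈ start-up latency + hops × per-hop latency + bytes / link bandwidth, plus the CPU cycles the kernel spends forwarding traffic for other nodes. The fixed start-up term is why many small messages cost far more than one large message of the same total size.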

15 Measurements
Performance
– Some applications achieved a maximum of 3 MIPS in floating-point operations
– Many other applications reached optimal speed-up compared to a VAX-11/780, with overheads of 0.025-0.5
– Low message frequency?

16 Future Work
Move routing functions to a network device
Experiment with a hybrid shared-memory approach
Allow for dynamic load balancing
Experiment with more programmer control of process-to-node assignments
Try different problem areas to expand the message protocol
Make the interface more programmer friendly

17 Summary
New programming paradigm required
Offers many advantages on the scientific and engineering problem set
May be interesting to apply to other domains
Achieved what appears to be excellent scalability
Good success in a limited domain

18 Questions? Comments? Snide Remarks?

