Trends in Cluster Architecture Steve Lumetta David Culler University of California at Berkeley Computer Science Division.

1 Trends in Cluster Architecture
Steve Lumetta, David Culler
University of California at Berkeley, Computer Science Division

2 Lessons from the NOW Project
how to build a system
  - uniprocessors and fast networks
  - parallel and sequential jobs simultaneously
  - no operating system changes
questions for the future
  - “killer” applications?
  - requirements for hardware?
  - the next step?

3 Infrequently Cited Quotations
Bob Lucky said (to our graduating class), “Technology is running away from us…that’s Moore’s Law.”
Steve Lumetta says (to his key application vendor), “If all you can give me is Moore’s Law, you’re history!”

4 Applications of Parallelism
enterprise computing
  - growing market
  - optimized parallel versions
important applications
  - databases (DB2 on SP-2)
  - internet services (Inktomi and TranSend on NOW)
  - collaborative environments
  - others?
hardware requirements
  - efficient inter-process communication
  - reasonable per-processor I/O bandwidth

5 Outline
motivation
clusters of SMP’s
communication abstraction
model of shared resources
conclusions

6 SMP Hardware
[figure labels: network cloud, memory interconnect, SMP, memory, network cards]
memory trends
  - larger, slower memory
  - affinity increasingly important
SMP’s minimize penalties
  - lower latency
  - higher throughput

7 Cluster Software
[figure labels: network cloud, memory interconnect, SMP, memory, network cards]
explicit control of locality
  - operating system
  - compiler/runtime
  - programmer
high availability
  - multiple peer operating systems
  - dynamic resource partitions

8 An Important Component: Message-Passing
within an address space
  - synchronize data transfer
  - ship control to hot cache
  - serialize access to complex data structure
  - optimize DSM protocols (SMP-Shasta)
between address spaces
  - support DSM (Cashmere-2L, Shasta)
  - communicate between operating systems

9 A Uniform Communication Interface
[figure labels: send a message, shared memory, network, communication layer, poll for messages]
hierarchical hardware
single interface for message-passing
  - hides multi-protocol complexity
  - allows for optimization
design issues
  - shared data layout
  - queue algorithm
  - polling strategy

10 Shared Memory Protocol Design
[figure labels: sender, concurrent message queue, receiver]
one queue per receiver
  - less memory than 1-to-1 queues
  - longer queues reduce impact of overflow
reduce coherence traffic (50-80 cycles each)
  - avoid false sharing
  - use cache-aligned data
require atomic queue operations

11 Lock-Free Queue Algorithm
    index ← Fetch&Increment (q^.tail) mod Q_LENGTH
    while TRUE
        if Compare&Swap (q^.packet[index].type, FREE, CLAIMED)
            return index;
        (back off exponentially and poll)
[figure labels: head, packets, tail, direction of advance]
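The slot-claiming step on this slide can be sketched in Python as follows. This is a model of the claim protocol only, under stated assumptions: Python exposes no user-level Fetch&Increment or Compare&Swap, so the hypothetical `AtomicCell` class emulates each primitive with a small internal lock. It therefore shows the logic of the algorithm, not its lock-free performance. The names `AtomicCell`, `Queue`, `claim_slot`, and `max_backoff` are illustrative and do not come from the slides.

```python
import threading
import time

FREE, CLAIMED = 0, 1   # packet states named on the slide
Q_LENGTH = 8           # assumed queue length for the sketch


class AtomicCell:
    """Emulates one machine word that supports Fetch&Increment and Compare&Swap."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()   # stands in for hardware atomicity

    def fetch_and_increment(self):
        with self._lock:
            old = self._value
            self._value += 1
            return old

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def load(self):
        with self._lock:
            return self._value


class Queue:
    """One receiver's queue: a shared tail counter plus a per-packet type word."""

    def __init__(self, length=Q_LENGTH):
        self.length = length
        self.tail = AtomicCell(0)
        self.packet_type = [AtomicCell(FREE) for _ in range(length)]


def claim_slot(q, max_backoff=0.01):
    """Reserve a packet slot for a sender and return its index."""
    # Each sender gets a distinct ticket, so senders only collide after wraparound.
    index = q.tail.fetch_and_increment() % q.length
    delay = 1e-6
    while True:
        if q.packet_type[index].compare_and_swap(FREE, CLAIMED):
            return index
        # Slot still in use (queue full): back off exponentially.
        # The slide also polls for incoming messages at this point.
        time.sleep(delay)
        delay = min(delay * 2, max_backoff)
```

With eight free slots and eight claims spread over several threads, every claim returns a different index, because the ticket from the tail counter already separates the senders; the Compare&Swap only fails when the queue has wrapped around onto a packet the receiver has not yet freed.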

12 Advantages of the Lock-Free Algorithm
very simple; tightly coupled to data structure
versus simple spin lock:
  - slightly higher overhead
  - less vulnerable to contention
effective for multiprogramming
  - avoids mutual exclusion
  - rarely blocks (except when queue is full)

13 Polling Strategy
[figure labels: send a message, shared memory, network, communication layer, poll for messages]
poll costs differ by an order of magnitude
simple polling adversely impacts fast protocol
use adaptive polling strategy
  - monitor incoming traffic
  - recent history determines polling frequency
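One way to read "recent history determines polling frequency" is sketched below. The slides do not give the actual policy, so everything here is an assumption: the cheap protocol (shared memory) is polled on every call, while the expensive one (network) is polled once every `interval` calls, and `interval` doubles after an empty network poll and halves after a productive one. The class name `AdaptivePoller` and its parameters are hypothetical.

```python
class AdaptivePoller:
    """Polls a cheap source every time and an expensive source adaptively."""

    def __init__(self, poll_fast, poll_slow, min_interval=1, max_interval=64):
        self.poll_fast = poll_fast        # callable returning a list of messages
        self.poll_slow = poll_slow        # callable returning a list of messages
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.interval = min_interval      # calls between expensive polls
        self._since_slow = 0

    def poll(self):
        messages = list(self.poll_fast())
        self._since_slow += 1
        if self._since_slow >= self.interval:
            self._since_slow = 0
            slow = list(self.poll_slow())
            if slow:
                # Recent traffic: poll the expensive source more often.
                self.interval = max(self.min_interval, self.interval // 2)
            else:
                # Quiet period: poll the expensive source less often.
                self.interval = min(self.max_interval, self.interval * 2)
            messages.extend(slow)
        return messages
```

With an idle network, the interval climbs from 1 to the cap of 64, so the expensive poll stops taxing the fast path; once network messages arrive again, the interval shrinks back toward 1.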

14 Send Overhead via Shared Memory
Sun Enterprise 5000 server with 167 MHz UltraSPARC processors
  - bus transactions: 32% of total time
  - more expensive on Enterprise 10000
  - increase in future
need control over coherence policy

15 Shared Resource Model
processors alternate between two queues
  - private idle queue
  - shared communication queue
communication queue
  - single server
  - server-sharing discipline
processor characterization
  - utilization u (from 0 to 1)
  - duty cycle when P=1
[figure labels: processors 1, 2, ..., P; idle queues; communication queue]

16 Communication Queue Scaling
many small resources
one large resource
[figure labels, two panels: processors 1, 2, ..., P; idle queues; communication queue; resources 1, 2, ..., N in one panel and a single resource N in the other]

17 Application Slowdown Metric
three regimes
  - correlated: worst case
  - independent: speedup at low utilization
  - scheduled: maximum benefit
[figure labels: correlated, scheduled, independent]
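The difference between the correlated and scheduled regimes can be illustrated with a toy simulation. This is not the authors' model, whose details are not in the transcript. It assumes that each processor repeats a fixed cycle of `idle` ticks of private work followed by `work` units of service from one shared resource, that the resource delivers one unit per tick split equally among the processors requesting it, and that slowdown is elapsed time divided by the time the same cycles would take alone. The function name `mean_slowdown` and all parameters are hypothetical.

```python
from fractions import Fraction


def mean_slowdown(offsets, idle, work, cycles=50):
    """Mean slowdown of processors sharing one communication resource.

    offsets: start delay (in ticks) of each processor; requires idle >= 1.
    """
    n = len(offsets)
    idle_left = [off + idle for off in offsets]   # start delay, then first idle period
    work_left = [Fraction(0)] * n
    rounds_done = [0] * n
    finish = [None] * n
    t = 0
    while any(f is None for f in finish):
        # Processors requesting the shared resource during this tick.
        active = {i for i in range(n) if finish[i] is None and idle_left[i] == 0}
        for i in range(n):
            if finish[i] is not None:
                continue
            if i in active:
                work_left[i] -= Fraction(1, len(active))   # equal share of the server
                if work_left[i] <= 0:
                    rounds_done[i] += 1
                    if rounds_done[i] == cycles:
                        finish[i] = t + 1
                    else:
                        idle_left[i] = idle
            else:
                idle_left[i] -= 1
                if idle_left[i] == 0:
                    work_left[i] = Fraction(work)
        t += 1
    alone = cycles * (idle + work)   # elapsed time with no contention
    total = sum(Fraction(finish[i] - offsets[i], alone) for i in range(n))
    return float(total / n)


correlated = mean_slowdown([0, 0, 0, 0], idle=6, work=2)   # all request together
scheduled = mean_slowdown([0, 2, 4, 6], idle=6, work=2)    # requests never overlap
```

With four processors at utilization u = 2/8 = 0.25, the correlated run gives a slowdown of 1.75 (every communication phase takes four times as long), while the staggered run gives 1.0 because the requests interleave exactly. Random start offsets, a crude stand-in for the independent regime, fall between these two bounds in this model.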

18 The Effect of Resource Scaling

19 Conclusions: The Future of Clusters
hardware
  - clusters of SMP’s (Clumps)
  - scalable I/O capability
  - cache coherence control
software
  - dynamic resource partitions
  - focus on data affinity
  - efficient message-passing
communication abstraction
  - uniform interface
  - lock-free algorithm
  - adaptive polling strategy

20 Trend: research era, introduction to industry, use by industry
SMP’s: early 80’s, etc.
Clusters: last 5 years have been culmination of research era
Viewed over time, approaches to system design usually divide into three eras. The first is an era of research and prototypes; a few machines are produced, and a few may be sold, but no real market is created.
Why does parallelism matter?

