Trends in Cluster Architecture
Steve Lumetta and David Culler
University of California at Berkeley, Computer Science Division

Lessons from the NOW Project

how to build a system
- uniprocessors and fast networks
- parallel and sequential jobs simultaneously
- no operating system changes

questions for the future
- “killer” applications?
- requirements for hardware?
- the next step?

Infrequently Cited Quotations

Bob Lucky said (to our graduating class), “Technology is running away from us…that’s Moore’s Law.”

Steve Lumetta says (to his key application vendor), “If all you can give me is Moore’s Law, you’re history!”

Applications of Parallelism

enterprise computing
- growing market
- optimized parallel versions

important applications
- databases (DB2 on SP-2)
- internet services (Inktomi and TranSend on NOW)
- collaborative environments
- others?

hardware requirements
- efficient inter-process communication
- reasonable per-processor I/O bandwidth

Outline

- motivation
- clusters of SMP’s
- communication abstraction
- model of shared resources
- conclusions

SMP Hardware

[Figure: SMPs, each with processors, memory, and network cards on a memory interconnect, attached to a network cloud]

memory trends
- larger, slower memory
- affinity increasingly important

SMP’s minimize penalties
- lower latency
- higher throughput

Cluster Software

[Figure: the same cluster-of-SMPs diagram as above]

explicit control of locality
- operating system
- compiler/runtime
- programmer

high availability
- multiple peer operating systems
- dynamic resource partitions

An Important Component: Message-Passing

within an address space
- synchronize data transfer
- ship control to hot cache
- serialize access to complex data structures
- optimize DSM protocols (SMP-Shasta)

between address spaces
- support DSM (Cashmere-2L, Shasta)
- communicate between operating systems

A Uniform Communication Interface

[Figure: a communication layer dispatching “send a message” and “poll for messages” over either shared memory or the network]

hierarchical hardware

single interface for message-passing
- hides multi-protocol complexity
- allows for optimization

design issues (see the sketch below)
- shared data layout
- queue algorithm
- polling strategy
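
To make the idea concrete, here is a minimal C sketch of such a single send/poll interface, assuming hypothetical shm_*/net_* transport back ends; none of these names or signatures come from the talk, and the stubs merely mark where real transports would plug in.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct { int node; int proc; } endpoint_t;

    static int my_node = 0;  /* this process's node id; set at startup */

    /* Stub transport back ends.  Real versions would write a shared-memory
     * queue or drive the network interface; the stubs mark the dispatch. */
    static bool shm_send(endpoint_t d, const void *b, size_t n) { (void)d; (void)b; (void)n; return true; }
    static bool net_send(endpoint_t d, const void *b, size_t n) { (void)d; (void)b; (void)n; return true; }
    static void shm_poll(void) { /* a few loads from shared memory */ }
    static void net_poll(void) { /* roughly 10x costlier device access */ }

    /* Single send routine: callers never see which protocol carried the data. */
    bool msg_send(endpoint_t dst, const void *buf, size_t len)
    {
        if (dst.node == my_node)
            return shm_send(dst, buf, len);  /* same SMP: shared memory */
        return net_send(dst, buf, len);      /* remote node: network */
    }

    /* Single poll routine: drains both message sources. */
    void msg_poll(void)
    {
        shm_poll();
        net_poll();
    }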

Shared Memory Protocol Design

[Figure: senders posting into one concurrent message queue, drained by the receiver]

one queue per receiver
- less memory than 1-to-1 queues
- longer queues reduce impact of overflow

reduce coherence traffic (50-80 cycles each)
- avoid false sharing
- use cache-aligned data

require atomic queue operations (a possible data layout is sketched below)
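
A hedged sketch of what such cache-aligned queue data could look like in C11; the line size, queue length, and field names are assumptions, not the actual NOW layout.

    #include <stdint.h>

    #define CACHE_LINE 64     /* assumed cache-line size */
    #define Q_LENGTH   256    /* assumed slots per receiver */

    enum slot_state { FREE, CLAIMED, READY };

    /* One packet per cache line, so concurrent senders writing adjacent
     * slots never false-share a line. */
    typedef struct {
        _Alignas(CACHE_LINE) int32_t type;              /* slot_state flag */
        uint8_t payload[CACHE_LINE - sizeof(int32_t)];  /* message bytes  */
    } packet_t;

    /* Head and tail live on separate lines: senders bounce the tail line
     * among themselves without invalidating the receiver's head line. */
    typedef struct {
        _Alignas(CACHE_LINE) uint32_t tail;  /* advanced by senders      */
        _Alignas(CACHE_LINE) uint32_t head;  /* advanced by the receiver */
        packet_t packet[Q_LENGTH];
    } queue_t;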

Lock-Free Queue Algorithm

index ← Fetch&Increment(q^.tail) mod Q_LENGTH
while TRUE
    if Compare&Swap(q^.packet[index].type, FREE, CLAIMED)
        return index;
    (back off exponentially and poll)

[Figure: ring of packet slots with head and tail pointers and the direction of advance]
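
The same loop rendered as runnable C11, as a minimal sketch: atomic_fetch_add plays Fetch&Increment and atomic_compare_exchange_strong plays Compare&Swap. The simplified queue fields and the backoff constants are assumptions, and sched_yield stands in for the slide's "poll".

    #include <stdatomic.h>
    #include <sched.h>

    #define Q_LENGTH 256
    enum slot_state { FREE, CLAIMED, READY };

    typedef struct { _Atomic int type; /* payload omitted */ } packet_t;
    typedef struct {
        _Atomic unsigned tail;
        packet_t packet[Q_LENGTH];
    } queue_t;

    /* Claim a send slot: bump the shared tail, then flip the chosen slot
     * from FREE to CLAIMED, backing off exponentially while it is busy. */
    int claim_slot(queue_t *q)
    {
        unsigned index = atomic_fetch_add(&q->tail, 1u) % Q_LENGTH;
        unsigned backoff = 1;
        for (;;) {
            int expected = FREE;
            if (atomic_compare_exchange_strong(&q->packet[index].type,
                                               &expected, CLAIMED))
                return (int)index;          /* slot is ours */
            for (unsigned i = 0; i < backoff; i++)
                sched_yield();              /* back off and let others run */
            if (backoff < 1024)
                backoff *= 2;               /* exponential backoff */
        }
    }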

Advantages of the Lock-Free Algorithm

very simple; tightly coupled to the data structure

versus a simple spin lock:
- slightly higher overhead
- less vulnerable to contention

effective for multiprogramming
- avoids mutual exclusion
- rarely blocks (except when the queue is full)

Polling Strategy

[Figure: the same communication-layer diagram as above]

poll costs differ by an order of magnitude

simple polling adversely impacts the fast protocol

use an adaptive polling strategy (sketched below)
- monitor incoming traffic
- recent history determines polling frequency
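
One possible adaptive policy, as a minimal sketch under stated assumptions: the window size, the decay rule, and the shm_poll_once/net_poll_once names are all illustrative; the talk does not spell out the mechanism.

    #include <stdbool.h>

    /* Stubs standing in for the real transports (assumed names). */
    static bool shm_poll_once(void) { return false; } /* cheap shared-memory check */
    static bool net_poll_once(void) { return false; } /* ~10x costlier network check */

    /* Poll adaptively: always check shared memory, but skip the expensive
     * network poll more often the longer the network has been quiet. */
    void adaptive_poll(void)
    {
        static unsigned score = 8;   /* recent network traffic estimate */
        static unsigned skip  = 0;   /* network polls left to skip */

        shm_poll_once();             /* cheap: do it every time */

        if (skip > 0) { skip--; return; }

        if (net_poll_once())
            score = 8;               /* traffic seen: poll at full rate */
        else if (score > 0)
            score--;                 /* quiet: back off gradually */

        skip = 8 - score;            /* quieter network => fewer polls */
    }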

Send Overhead via Shared Memory

Sun Enterprise 5000 server with 167 MHz UltraSPARC processors

bus transactions: 32% of total time
- more expensive on the Enterprise
- will increase in the future

need control over coherence policy

Shared Resource Model

processors alternate between two queues
- private idle queue
- shared communication queue

communication queue
- single server
- server-sharing discipline

processor characterization
- utilization u (from 0 to 1)
- duty cycle when P = 1

[Figure: processors 1 through 2P cycling between private idle queues and a single shared communication queue]

Communication Queue Scaling

[Figure: many small resources (N separate communication queues, each serving a few processors) versus one large resource (a single communication queue serving all 2P processors)]

Application Slowdown Metric

three regimes
- correlated: worst case
- independent: speedup at low utilization
- scheduled: maximum benefit

[Figure: slowdown curves for the correlated, independent, and scheduled regimes]
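
A back-of-envelope reading of the three regimes (an illustration under assumed parameters, not the talk's actual model): let each of P processors need the shared server for a fraction u of a unit cycle.

- scheduled: requests are staggered so they never overlap (possible while P·u <= 1), and slowdown stays near 1
- correlated: all P processors request together, so the i-th processor served waits (i-1)·u, and the mean cycle stretches to 1 + u·(P-1)/2
- independent: random arrivals fall in between; at low utilization most requests find the server free, which is the speedup noted above

For example, with P = 8 and u = 0.1: the scheduled slowdown is about 1, while the correlated mean slowdown is about 1 + 0.1·3.5 = 1.35.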

The Effect of Resource Scaling

Conclusions: The Future of Clusters

hardware
- clusters of SMP’s (Clumps)
- scalable I/O capability
- cache coherence control

software
- dynamic resource partitions
- focus on data affinity
- efficient message-passing

communication abstraction
- uniform interface
- lock-free algorithm
- adaptive polling strategy

Trend: research era, introduction to industry, use by industry

- SMP’s: early 80’s, etc.
- clusters: the last 5 years have been the culmination of the research era

Viewed over time, approaches to system design usually divide into three eras. The first is an era of research and prototypes; a few machines are produced, and a few may be sold, but no real market is created.

Why does parallelism matter?