Challenges and Opportunities for System Software in the Multi-Core Era or The Sky is Falling, The Sky is Falling!


Challenge: scaling
Scaling software
– Virtual Machine Monitors (easy)
– Operating Systems (hard)
– Applications (hardest)
Scaling hardware
– Memory bandwidth
– I/O bandwidth
Prediction: mainstream will remain < 100 cores for the next 5 years
– Lack of applications, Amdahl's law, and power efficiency constraints
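The Amdahl's law constraint named in the prediction can be made concrete with a small calculation. This is a minimal sketch, not part of the original slides: the function name `amdahl_speedup` and the 95% parallel fraction are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction`
    of the work can run in parallel (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is not sped up; the parallel part is divided across cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed workload: 95% parallelizable (hypothetical figure).
    for n in (4, 16, 100, 1000):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

Under that assumed 95% parallel fraction, 100 cores give a speedup of only about 16.8, and no core count can exceed 20, which is one way to read the argument that mainstream core counts stay modest until applications become more parallel.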

Challenge: scheduling
Too expensive to context switch
– gang scheduling many cores is inefficient
– disruptive to application
Complex resource hierarchy
– cache, memory, I/O
Opportunity: VMM and OS schedulers will have to understand and schedule complex hierarchies
Prediction: partitioning cores rather than time sharing will be the norm
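The idea of partitioning cores rather than time sharing them can be sketched as a static, disjoint assignment of core IDs to jobs. This is an illustrative sketch only: `partition_cores` and the job names are hypothetical, and a real VMM or OS scheduler would also have to account for the cache, memory, and I/O hierarchy mentioned above.

```python
from typing import Dict, List


def partition_cores(total_cores: int, requests: Dict[str, int]) -> Dict[str, List[int]]:
    """Give each job its own contiguous, disjoint set of core IDs.

    Cores are dedicated to one job (space partitioning), so no job is
    context-switched off a core to make room for another.
    """
    if any(count < 1 for count in requests.values()):
        raise ValueError("each job must request at least one core")
    if sum(requests.values()) > total_cores:
        # Refuse instead of falling back to time sharing.
        raise ValueError("requests exceed available cores")

    assignment: Dict[str, List[int]] = {}
    next_core = 0
    for job, count in requests.items():
        # Contiguous IDs are a crude stand-in for topology awareness.
        assignment[job] = list(range(next_core, next_core + count))
        next_core += count
    return assignment


if __name__ == "__main__":
    print(partition_cores(8, {"vm_a": 4, "vm_b": 2}))
```

On Linux, an assignment like this could be applied to a process with `os.sched_setaffinity(pid, cores)`. The sketch leaves that step out so it stays portable.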

Challenge: isolation
Fault isolation and recovery
– Large transistor count → cores will fail
Performance isolation
– Shared resources, e.g., caches, I/O bandwidth
Opportunity:
– Build fault containment mechanisms into the system architecture
– Provide resource reservation controls
– System software must handle and recover from faults, enforce performance isolation
– Virtualization makes physical machines stateless and interchangeable

Challenge: distance
Off-chip resources get farther and farther away
– Latency-bound applications suffer
– I/O becomes even more heavy-weight
Opportunity
– Bring communication closer to the cores
– Rethink I/O architectures
Prediction
– We will see on-chip I/O controllers and buses

Opportunity: assists
Extra cores can be used for
– I/O processing
– self monitoring
Specialized cores
– computation (conventional and stream)
– communication (TCP processing)
– graphics processing (GPU elements)
System software to take advantage of these resources

Opportunity: virtualization
Killer app for multi-core
– Easier to scale job-level parallelism
Power efficiency
– Scale each application to maximize performance per watt
Hide complex hardware topology
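One way to picture "scale each application to maximize performance per watt" is to pick the core count where speedup divided by power peaks. This is a rough sketch under assumptions that are not in the slides: performance follows Amdahl's law, power is a fixed static draw plus a constant cost per active core, and the name `best_core_count` and all numbers are hypothetical.

```python
def best_core_count(parallel_fraction: float, max_cores: int,
                    static_watts: float, watts_per_core: float) -> int:
    """Return the core count in 1..max_cores with the highest
    speedup-per-watt under the assumed performance and power models."""
    if max_cores < 1:
        raise ValueError("max_cores must be >= 1")

    def perf_per_watt(cores: int) -> float:
        # Assumed performance model: Amdahl's law.
        speedup = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)
        # Assumed power model: static draw plus linear per-core cost.
        power = static_watts + watts_per_core * cores
        return speedup / power

    return max(range(1, max_cores + 1), key=perf_per_watt)


if __name__ == "__main__":
    # Hypothetical workload and power figures.
    print(best_core_count(0.95, 64, static_watts=20.0, watts_per_core=5.0))
```

With these assumed figures the most efficient allocation is 9 of the 64 cores, which leaves the rest free for other virtual machines. A VMM that sizes each guest this way is one reading of the slide's power-efficiency point.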