Alternative Processors
John Gustafson, CEO, Massively Parallel Technologies (Former CTO, ClearSpeed)

What is an Alternative Processor?

A processor (and local memory) specialized to a task that would otherwise be handled by a mainstream CPU. Two kinds:
- Invisible (disk controllers, GPUs used only for graphics)
- Visible (network accelerators, FPGAs, GPUs made accessible for programming)

Superior adaptation to a task can improve:
- Performance per watt
- Performance per liter of system volume
- Performance per dollar

Memory Structure

A big advantage for HPC is access to a separate memory space that sacrifices generality for speed. NVIDIA and ClearSpeed can offer hundreds of GB/s for vector-type operations on a small local RAM, and that bandwidth is far more important than any peak Gflops figure.

x86 cache is very wasteful, especially for HPC: only about 20% of the data transferred actually winds up being used, and the cache is highly optimized for non-HPC applications. HPC library programmers know how to exploit an explicitly managed memory hierarchy.
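To make that concrete, here is a minimal CUDA sketch of explicitly managed on-chip memory (my illustration, not from the talk; the kernel name and sizes are hypothetical). Each block stages a tile of the input into fast __shared__ memory once, then every thread reads its three inputs from that tile instead of going back to device RAM.

#define TILE 256   // launch with blockDim.x == TILE

__global__ void smooth3(const float *in, float *out, int n)
{
    __shared__ float tile[TILE + 2];           // tile plus one halo cell per side
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + 1;                 // local index, past the left halo
    int g   = min(gid, n - 1);                 // clamp so partial blocks stay in bounds

    tile[lid] = in[g];                         // one staged read per thread
    if (threadIdx.x == 0)
        tile[0] = in[max(g - 1, 0)];           // left halo (clamped at array edge)
    if (threadIdx.x == blockDim.x - 1)
        tile[lid + 1] = in[min(g + 1, n - 1)]; // right halo (clamped at array edge)
    __syncthreads();                           // whole tile is now in fast memory

    if (gid < n)                               // three reads, all from on-chip memory
        out[gid] = (tile[lid - 1] + tile[lid] + tile[lid + 1]) / 3.0f;
}

Launched as smooth3<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n), the programmer, not a cache policy, decides exactly what is staged and when it is reused.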

How Should They Be Used?

- ~90% of users should never have to program them at all. The alternative processor should be invisible.
- ~10% of users can justify altering code to use alternative library calls and to overlap execution (see the sketch after this list).
- <1% of users should ever muck with direct native programming of an alternative processor.
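For that ~10%, the change can be as small as this hedged sketch (the function name and square-matrix sizes are mine): swap the host dgemm for cuBLAS, launch it asynchronously on a stream, and keep the CPU busy until the result is actually needed.

#include <cublas_v2.h>
#include <cuda_runtime.h>

// Computes C = A * B on the accelerator; dA, dB, dC are device pointers.
void accelerated_gemm(const double *dA, const double *dB, double *dC, int n)
{
    cublasHandle_t handle;
    cudaStream_t   stream;
    cublasCreate(&handle);
    cudaStreamCreate(&stream);
    cublasSetStream(handle, stream);      // the GEMM below runs asynchronously

    const double alpha = 1.0, beta = 0.0;
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

    // The CPU is free here: overlap useful host work with the accelerated call.
    // do_host_work();                    // placeholder, not a real API

    cudaStreamSynchronize(stream);        // wait only when the result is needed
    cudaStreamDestroy(stream);
    cublasDestroy(handle);
}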

What Language Should Be Used?

This is a question only for the ~10% category of programmers on the last slide. (For the <1%: C or assembler!)

It's a makefile issue, not a language issue! We already conceal massive amounts of system-management complexity in the makefiles (a sketch follows below). The last thing we need is a new language. We need better ways to extend existing languages, carefully. (And yes, a 'wrapper' standard like EXOCHI or CUDA might help.)

One line of tested, debugged, documented code costs about $50 to $100. Simple budget arguments show why we have code inertia.
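As an illustration of the makefile point (my sketch, assuming the common Fortran underscore-suffix BLAS binding): the application source below never changes. The makefile's link line decides whether dgemm_ resolves to the plain host BLAS or to an accelerator-offloading library.

/* Standard Netlib dgemm interface, callable from C; which library
 * satisfies this symbol is chosen at link time in the makefile. */
extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);

void multiply(int n, const double *a, const double *b, double *c)
{
    const double one = 1.0, zero = 0.0;
    /* C = A * B; no source change needed to switch implementations. */
    dgemm_("N", "N", &n, &n, &n, &one, a, &n, b, &n, &zero, c, &n);
}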

Accelerating the Right Thing

Folks... why are we still trying to accelerate computation? It's already hundreds of times faster than the communication (the small arithmetic check below makes this concrete). Fewer than 1% of HPC applications are still compute-bound (not counting the LINPACK runs done for press releases); the rest are bound by memory bandwidth, inter-server communication, and disk I/O.

Massively Parallel Technologies is focused on the >99% of HPC applications that are communication-bound. This is the future of alternative processors. And the 'alternative' must have as little impact on users and programmers as possible.
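The "hundreds of times faster" claim is easy to sanity-check with back-of-the-envelope figures (illustrative numbers of mine, not the talk's): a 100 Gflop/s node fed by a 1 GB/s interconnect link can do about 800 flops in the time it takes one 8-byte double to arrive from another node.

#include <stdio.h>

int main(void)
{
    const double node_flops      = 100e9;  /* assumed per-node compute rate, flop/s */
    const double link_bytes      = 1e9;    /* assumed interconnect bandwidth, byte/s */
    const double doubles_per_sec = link_bytes / 8.0;

    /* flops the node performs per communicated double: prints 800 */
    printf("flops per communicated double: %.0f\n", node_flops / doubles_per_sec);
    return 0;
}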