The IBM Cell Architecture
Sam Sandbote, CSE 8383 Advanced Computer Architecture, April 18, 2006

Topics
1. Overview
2. Software Cells
3. Machine Architecture
4. Product Prototype
5. Programmer’s Interface
6. References and Glossary

Motivation
- IBM’s formal name for Cell is “Cell Broadband Engine Architecture” (CBEA)
- Sony wanted:
  - A quantum leap in performance over the PlayStation 2’s “Emotion Engine” chip (co-developed with Toshiba)
- Toshiba wanted:
  - To remain a part of volume manufacturing for the Sony PlayStation
- IBM wanted:
  - A piece of the PlayStation 3 pie
  - A second try at network processor architecture
  - Something reusable, applicable far beyond PlayStation

Goals
- Application domains
  - Graphics rendering ($$)
  - DSP and multimedia processing ($$)
  - Cryptography
  - Physics simulations
  - Matrix math and other scientific processing
- Heavy use of SIMD: why?
  - Cray and similar machines of the 1970s achieved performance through vectorization rather than MIMD parallelization
  - The application domains above are areas in which SIMD is still the best architecture

Software Cells: The Concept
- Definition
  - A bundle of application code and working data
- Features
  - Necessarily object-oriented
  - Cells can migrate to any processor, local or remote
  - Distributed processing is native, and actually assumed
    - Execution of cell code looks like a remote procedure call
  - A cell contains everything it needs to execute autonomously, without references to other memory, programs, or resources
  - Highly secure model!
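
As a rough illustration of the concept above, a software cell can be pictured as a record that carries both its code and its working data, so that any processor holding the record can run it. The Python sketch below is not the cell format from the patent: the names `SoftwareCell` and `run_cell` are hypothetical, and an ordinary function call stands in for dispatch to a local or remote processor.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class SoftwareCell:
    """Hypothetical stand-in for a software cell: code plus working data."""
    program: Callable[..., object]  # application code carried by the cell
    data: Tuple[object, ...]        # working data carried by the cell


def run_cell(cell: SoftwareCell) -> object:
    """Execute a cell using only what the cell itself carries.

    To the caller this looks like a remote procedure call:
    hand over the bundle, get a result back.
    """
    return cell.program(*cell.data)


def dot(xs, ys):
    # Touches nothing outside its own arguments, so the cell stays self-contained.
    return sum(x * y for x, y in zip(xs, ys))


cell = SoftwareCell(program=dot, data=((1.0, 2.0, 3.0), (4.0, 5.0, 6.0)))
result = run_cell(cell)
print(result)
```

Because `run_cell` needs nothing but the cell, the same call could in principle be served by any processor, which is the property the slide describes.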

Software Cells: Formatting
[Figure: software cell format. Source: U.S. Patent #6,809,734]

Comparison with Dataflow Architecture
- Granularity
  - Dataflow execution granularity is one instruction
  - Cell execution granularity is a procedure, or several hundred instructions
- Dataflow instruction template:
  opcode | operand A address | operand B address | destination address
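
To make the granularity point concrete, here is a minimal Python sketch of a dataflow machine built around the four-field instruction template. It is a toy under stated assumptions, not a model of any real dataflow machine: the class `DataflowInstruction`, the functions `fire` and `run_dataflow`, the two-entry opcode table, and the dictionary used as memory are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataflowInstruction:
    """One instruction with the four template fields."""
    opcode: str
    operand_a_addr: int
    operand_b_addr: int
    dest_addr: int


# Hypothetical two-entry opcode table for the example.
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def fire(instr, memory):
    """Execute one instruction if both operands are present; report whether it fired."""
    a = memory.get(instr.operand_a_addr)
    b = memory.get(instr.operand_b_addr)
    if a is None or b is None:
        return False
    memory[instr.dest_addr] = OPS[instr.opcode](a, b)
    return True


def run_dataflow(program, memory):
    """Fire ready instructions until none are left or none can make progress."""
    pending = list(program)
    fired = 0
    while pending:
        still_waiting = []
        for instr in pending:
            if fire(instr, memory):
                fired += 1
            else:
                still_waiting.append(instr)
        if len(still_waiting) == len(pending):
            break  # remaining instructions are waiting on operands that never arrive
        pending = still_waiting
    return fired


# Computes (2 + 3) * 4. Program order does not matter; operand availability does.
memory = {0: 2, 1: 3, 2: 4}
program = [
    DataflowInstruction("mul", 10, 2, 11),  # waits until address 10 is written
    DataflowInstruction("add", 0, 1, 10),
]
fired = run_dataflow(program, memory)
print(memory[11], fired)
```

Every scheduling decision here covers a single instruction. In the Cell model the unit handed to an execution engine is a whole procedure, so the same bookkeeping is paid once per several hundred instructions.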

Machine Architecture
- Each Cell SoC contains:
  - A conventional processor (PPE), for control and a lightweight OS
    - 2-way SMT, 2-way superscalar, in-order Power core
  - Multiple Synergistic Processing Elements (SPEs)
    - These are execution engines for RPC of a software cell
  - DMA interface to memory and I/O
  - Element Interconnect Bus (EIB), actually a ring bus
- Each SPE contains:
  - 128 registers, 128 bits wide, in a unified regfile (2 Kbytes of registers!)
  - 256 Kbytes of local memory
  - 4 SIMD integer pipelines/ALUs
  - 4 SIMD floating point pipelines/FPUs
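
The storage figures on this slide can be checked with a few lines of arithmetic. The Python sketch below uses the per-SPE numbers from the slide; the count of eight SPEs is an assumption taken from the SoC diagram in this deck, since the slide itself only says "multiple".

```python
# Per-SPE figures taken from the slide.
REGISTERS_PER_SPE = 128
REGISTER_WIDTH_BITS = 128
LOCAL_MEMORY_KBYTES = 256

# Assumption: eight SPEs per chip, as drawn in the SoC diagram.
NUM_SPES = 8

# 128 registers x 128 bits, converted to bytes.
regfile_bytes = REGISTERS_PER_SPE * REGISTER_WIDTH_BITS // 8

# A 128-bit register holds four 32-bit values, one per SIMD lane.
lanes_per_register = REGISTER_WIDTH_BITS // 32

# Local memory summed over all SPEs.
total_local_memory_kbytes = NUM_SPES * LOCAL_MEMORY_KBYTES

print(f"Register file per SPE: {regfile_bytes} bytes")
print(f"32-bit lanes per register: {lanes_per_register}")
print(f"Local memory across {NUM_SPES} SPEs: {total_local_memory_kbytes} Kbytes")
```

The four 32-bit lanes per register line up with the four SIMD integer and four SIMD floating point pipelines listed for each SPE.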

SoC Architecture
[Figure: SoC block diagram. Eight SPEs, each with ALUs (4), FPUs (4), a 128x128 regfile, and 256 KB local memory; a PPE with a 64-bit SMT Power core (2x in-order superscalar), 512K L2, I$ and D$; the EIB; DMA and I/O controllers]

(Envisioned) SPU Architecture
- Resources for execution of multiple software cells are reserved in advance by the PPE:
  - Some portion of local memory
  - One or more dedicated integer/FP pipelines
  - Not SMT: pipelines are dedicated to a cell for the duration of its execution
- Execution is supposed to be entirely self-contained
  - A software cell is small enough to execute on only one APU (the patent’s term for what became the SPE)
  - No use of DRAM: the only addressable memory is local
    - Local memory is not cache, so there is no coherence
  - No interaction with any other executing cell until finished
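
The reservation scheme described above can be sketched as simple bookkeeping on the PPE side. This is only an illustration of the idea, not the mechanism from the patent: the class `SpeReservations`, its method names, and the choice of four pipelines and 256 Kbytes per SPE as defaults are assumptions made for the example.

```python
class SpeReservations:
    """Hypothetical PPE-side record of what has been reserved on one SPE."""

    def __init__(self, local_memory_bytes=256 * 1024, pipelines=4):
        self.free_memory = local_memory_bytes
        self.free_pipelines = pipelines
        self.active = {}  # cell id -> (reserved bytes, reserved pipelines)

    def reserve(self, cell_id, memory_bytes, pipelines=1):
        """Reserve memory and pipelines before the cell starts; False if they do not fit."""
        if cell_id in self.active:
            raise ValueError(f"cell {cell_id!r} already holds a reservation")
        if memory_bytes > self.free_memory or pipelines > self.free_pipelines:
            return False
        self.free_memory -= memory_bytes
        self.free_pipelines -= pipelines
        self.active[cell_id] = (memory_bytes, pipelines)
        return True

    def release(self, cell_id):
        """Return a finished cell's resources; they stay dedicated until this call."""
        memory_bytes, pipelines = self.active.pop(cell_id)
        self.free_memory += memory_bytes
        self.free_pipelines += pipelines


spe = SpeReservations()
first = spe.reserve("cell-a", 100 * 1024, pipelines=2)
second = spe.reserve("cell-b", 100 * 1024, pipelines=2)
third = spe.reserve("cell-c", 100 * 1024, pipelines=1)  # no room left
spe.release("cell-a")
retry = spe.reserve("cell-c", 100 * 1024, pipelines=1)  # fits once cell-a is done
print(first, second, third, retry)
```

Nothing is time-shared in this sketch: a pipeline handed to a cell is unavailable to every other cell until `release`, which is the contrast with SMT that the slide draws.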

Prototype Chip Floorplan
[Figure: prototype chip floorplan. Source: IBM]

Notes on Prototype
- Chip statistics
  - Peak single precision > 256 Gflops
  - Peak double precision > 26 Gflops
  - 4.6 GHz frequency demonstrated in working silicon
    - This was historic, following the cancellation of Intel’s 6 GHz Tejas project
    - 11 FO4 gate delays per cycle, fewer than is typical (a short cycle is what permits the high clock rate)
  - Rambus XDR DRAM interface, 25.6 GB/s
  - 234M transistors, 221 mm² in a 90 nm SOI process
  - Supply voltage is 1.2 V typical (estimated)
  - 2,965 chip pins
- SPE disappointments
  - Does not support execution of multiple cells at once
  - Probably a lot of wasted execution units
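
One way to arrive at a single-precision peak of this size is to multiply out the SIMD resources. The Python sketch below rests on assumptions that are not stated on the slide: eight SPEs, four 32-bit lanes per SPE, a fused multiply-add counted as two floating point operations per lane per cycle, and a nominal 4 GHz clock. Only the 4.6 GHz and 25.6 GB/s figures come from the slide.

```python
# Assumptions for the estimate (not stated on the slide).
NUM_SPES = 8
SIMD_LANES = 4               # 32-bit lanes in a 128-bit register
FLOPS_PER_LANE_PER_CYCLE = 2  # fused multiply-add counted as two operations
NOMINAL_CLOCK_GHZ = 4.0

# Figures from the slide.
DEMONSTRATED_CLOCK_GHZ = 4.6
MEMORY_BANDWIDTH_GB_PER_S = 25.6

flops_per_cycle = NUM_SPES * SIMD_LANES * FLOPS_PER_LANE_PER_CYCLE

# Gflops = operations per cycle x billions of cycles per second.
peak_nominal_gflops = flops_per_cycle * NOMINAL_CLOCK_GHZ
peak_demonstrated_gflops = flops_per_cycle * DEMONSTRATED_CLOCK_GHZ

# Bytes of DRAM traffic available per floating point operation at peak.
bytes_per_flop = MEMORY_BANDWIDTH_GB_PER_S / peak_nominal_gflops

print(f"Peak at {NOMINAL_CLOCK_GHZ} GHz: {peak_nominal_gflops:.1f} Gflops")
print(f"Peak at {DEMONSTRATED_CLOCK_GHZ} GHz: {peak_demonstrated_gflops:.1f} Gflops")
print(f"DRAM bytes per flop at nominal peak: {bytes_per_flop:.2f}")
```

Under these assumptions, the small number of DRAM bytes available per operation suggests why the design leans on local memory and DMA rather than on a conventional cache hierarchy.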

Programmer’s Interface: Two Parts
1. Control and management on the PPE
   - Ordinary Power ISA and programmer’s view
   - Runs a lightweight Linux OS; its main tasks are:
     - Coordinate execution of software cells
     - Route data inputs and outputs
     - Handle run-time exceptions
2. Software cell execution on the SPE
   - New ISA and new (extremely simple) programmer’s view
   - Requires special code development tools
     - Possibly a special programming language
     - Special compiler
     - Debugging of distributed processing is messy
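
The three PPE tasks listed above (coordinate execution, route inputs and outputs, handle run-time exceptions) can be sketched as a small control loop. The example below is an analogy only: it uses Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in for a pool of SPEs, and the function `coordinate` and the cell names are hypothetical. It says nothing about the actual Cell toolchain or runtime.

```python
from concurrent.futures import ThreadPoolExecutor


def coordinate(cells, num_workers=8):
    """Dispatch each (function, args) cell to a worker and collect what comes back.

    Returns two dicts keyed by cell name: results and caught exceptions.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Coordinate execution: route each cell's inputs to a worker.
        futures = {name: pool.submit(fn, *args) for name, (fn, args) in cells.items()}
        for name, future in futures.items():
            try:
                # Route the output back to the control side.
                results[name] = future.result()
            except Exception as exc:
                # Handle run-time exceptions without stopping the other cells.
                errors[name] = exc
    return results, errors


def divide(a, b):
    return a / b


cells = {
    "total": (sum, ([1, 2, 3],)),
    "ratio": (divide, (1, 0)),  # fails at run time
}
results, errors = coordinate(cells)
print(results, {name: type(exc).__name__ for name, exc in errors.items()})
```

Even in this toy, a failure surfaces only when the control side asks for the result, away from where it happened, which hints at why the slide calls debugging of distributed processing messy.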

Cell References
- Flachs et al. “The Microarchitecture of the Streaming Processor for a CELL Processor.” Proc ISSCC.
- Gaudiot and Bic (editors). Advanced Topics in Data-Flow Computing. Prentice Hall,
- Gschwind et al. “A novel SIMD architecture for the Cell heterogeneous chip-multiprocessor.” HotChips 17, August
- Halfhill, Tom. “New Patent Reveals Cell Secrets.” Microprocessor Report, 1/3/
- Krewell, Kevin. “Cell Moves Into the Limelight.” Microprocessor Report, 2/14/
- Pham et al. “The Design and Implementation of a First-Generation CELL Processor.” Proc ISSCC.
- Suzuoki et al. “Resource Dedication System and Method for a Computer Architecture for Broadband Networks.” U.S. Patent No. 6,809,734.