Cell Architecture. Introduction The Cell concept was originally thought up by Sony Computer Entertainment inc. of Japan, for the PlayStation 3 The architecture.

Slides:



Advertisements
Similar presentations
Computer Architecture
Advertisements

MEMORY popo.
Cell Broadband Engine. INF5062, Carsten Griwodz & Pål Halvorsen University of Oslo Cell Broadband Engine Structure SPE PPE MIC EIB.
Khaled A. Al-Utaibi  Computers are Every Where  What is Computer Engineering?  Design Levels  Computer Engineering Fields  What.
Chapter 6 Computer Architecture
Using Cell Processors for Intrusion Detection through Regular Expression Matching with Speculation Author: C˘at˘alin Radu, C˘at˘alin Leordeanu, Valentin.
Computer Systems. Computer System Components Computer Networks.
Chapter XI Reduced Instruction Set Computing (RISC) CS 147 Li-Chuan Fang.
Computational Astrophysics: Methodology 1.Identify astrophysical problem 2.Write down corresponding equations 3.Identify numerical algorithm 4.Find a computer.
Data Manipulation Computer System consists of the following parts:
Chapter 4 Processor Technology and Architecture. Chapter goals Describe CPU instruction and execution cycles Explain how primitive CPU instructions are.
Recap – Our First Computer WR System Bus 8 ALU Carry output A B S C OUT F 8 8 To registers’ input/output and clock inputs Sequence of control signal combinations.
1 CS402 PPP # 1 Computer Architecture Evolution. 2 John Von Neuman original concept.
GCSE Computing - The CPU
Operating Systems Lecture 1 Crucial hardware concepts review M. Naghibzadeh Reference: M. Naghibzadeh, Operating System Concepts and Techniques, iUniverse.
Cell Broadband Processor Daniel Bagley Meng Tan. Agenda  General Intro  History of development  Technical overview of architecture  Detailed technical.
Pipelining By Toan Nguyen.
1 Chapter 4 The Central Processing Unit and Memory.
Inside The CPU. Buses There are 3 Types of Buses There are 3 Types of Buses Address bus Address bus –between CPU and Main Memory –Carries address of where.
Group 5 Alain J. Percial Paula A. Ortiz Francis X. Ruiz.
Cache Memories Effectiveness of cache is based on a property of computer programs called locality of reference Most of programs time is spent in loops.
Emotion Engine A look at the microprocessor at the center of the PlayStation2 gaming console Charles Aldrich.
Joram Benham April 2,  Introduction  Motivation  Multicore Processors  Overview, CELL  Advantages of CMPs  Throughput, Latency  Challenges.
Computer Architecture CST 250 INTEL PENTIUM PROCESSOR Prepared by:Omar Hirzallah.
C.S. Choy95 COMPUTER ORGANIZATION Logic Design Skill to design digital components JAVA Language Skill to program a computer Computer Organization Skill.
Introduction to the Cell multiprocessor J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, D. Shippy (IBM Systems and Technology Group)
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Operating Systems Overview: Using Hardware.
Writer:-Rashedul Hasan Editor:- Jasim Uddin
Lecture#14. Last Lecture Summary Memory Address, size What memory stores OS, Application programs, Data, Instructions Types of Memory Non Volatile and.
The Computer Systems By : Prabir Nandi Computer Instructor KV Lumding.
Practical PC, 7th Edition Chapter 17: Looking Under the Hood
Computing hardware CPU.
Invitation to Computer Science 5th Edition
 Design model for a computer  Named after John von Neuman  Instructions that tell the computer what to do are stored in memory  Stored program Memory.
MICROPROCESSOR INPUT/OUTPUT
Computers organization & Assembly Language Chapter 0 INTRODUCTION TO COMPUTING Basic Concepts.
What have mr aldred’s dirty clothes got to do with the cpu
The Central Processing Unit
Kevin Eady Ben Plunkett Prateeksha Satyamoorthy.
Computer Organization & Assembly Language © by DR. M. Amer.
FIRST COURSE Essential Computer Concepts. XP New Perspectives on Microsoft Office 2007: Windows XP Edition2 What Is a Computer? A computer is an electronic.
Introduction to Microprocessors
Sam Sandbote CSE 8383 Advanced Computer Architecture The IBM Cell Architecture Sam Sandbote CSE 8383 Advanced Computer Architecture April 18, 2006.
Stored Programs In today’s lesson, we will look at: what we mean by a stored program computer how computers store and run programs what we mean by the.
LYU0703 Parallel Distributed Programming on PS3 1 Huang Hiu Fung Wong Chung Hoi Supervised by Prof. Michael R. Lyu Department of Computer.
THE BRIEF HISTORY OF 8085 MICROPROCESSOR & THEIR APPLICATIONS
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
Lecture on Central Process Unit (CPU)
Chapter 2 Turning Data into Something You Can Use
The Central Processing Unit (CPU)
Processor Rashedul Hasan. Processor The microprocessor is sometimes referred to as the 'brain' of the personal computer, and is responsible for the processing.
بسم الله الرحمن الرحيم MEMORY AND I/O.
Aarul Jain CSE520, Advanced Computer Architecture Fall 2007.
M211 – Central Processing Unit
Chapter 11 System Performance Enhancement. Basic Operation of a Computer l Program is loaded into memory l Instruction is fetched from memory l Operands.
IBM Cell Processor Ryan Carlson, Yannick Lanner-Cusin, & Cyrus Stoller CS87: Parallel and Distributed Computing.
CPU (Central Processing Unit). The CPU is the brain of the computer. Sometimes referred to simply as the processor or central processor, the CPU is where.
1/21 Cell Processor Systems Seminar Diana Palsetia (11/21/2006)
PipeliningPipelining Computer Architecture (Fall 2006)
What’s going on here? Can you think of a generic way to describe both of these?
Computer Hardware What is a CPU.
GCSE Computing - The CPU
Cell Architecture.
THE CPU i Bytes 1.1.
Microprocessor & Assembly Language
1.1 The Characteristics of Contemporary Processors, Input, Output and Storage Devices Types of Processors.
GCSE Computing - The CPU
Presentation transcript:

Cell Architecture

Introduction The Cell concept was originally thought up by Sony Computer Entertainment inc. of Japan, for the PlayStation 3 The architecture as it exists today was the work of three companies: Sony, Toshiba and IBM

Why Cell? Sony and Toshiba (being major electronics manufacturers) buy in all manner of different components. One of the reasons for Cell's development is they want to save costs by building their own components. Next generation consumer technologies such as Blu-ray, HDTV, HD Camcorders and of course the PS3 will all require a very high level of computing power and they are going to need the chips to provide it. Cell will be used for all of these and more, IBM will also be using the chips in servers. The partners can also sell the chips to 3rd party manufacturers

What is Cell? Cell is an architecture for high performance distributed computing. It is comprised of hardware and software Cells, software Cells consist of data and programs (known as jobs or apulets), these are sent out to the hardware Cells where they are computed, the results are then returned According to IBM the Cell performs 10x faster than existing CPUs on many applications.

Specifications A hardware Cell is made up of: 1 Power Processor Element (PPE). 8 Synergistic Processor Elements (SPEs). Element Interconnect Bus (EIB). Direct Memory Access Controller (DMAC). 2 Rambus XDR memory controllers. Rambus FlexIO (Input / Output) interface.

Power Processor Element (PPE) The PPE is a conventional microprocessor core which sets up tasks for the SPEs to do. In a Cell based system the PPE will run the operating system and most of the applications but compute intensive parts of the OS and applications will be offloaded to the SPEs. The PPE is a 64 bit, "Power Architecture" processor with 512K cache. While the PPE uses the PowerPC instruction set, it is not based on an existing design on the market today. That is to say, it is NOT based on the existing 970 / G5 or POWER processors. It is a completely different architecture so clock speed comparisons are completely meaningless.

Power Processor Element (PPE) The PPE is a dual issue, dual threaded, in-order processor. Unlike many modern processors the hardware architecture is an “old style” RISC design, i.e. the PPE has a relatively simple architecture. Most modern microprocessors devote a large amount of silicon to executing as many instructions as possible at once by executing them "out-of-order" (OOO). This type of design is widely used but it requiring hefty amounts of additional circuitry and consumes large amounts of power. With the PPE, IBM have not done this and have instead gone with a much simpler design which uses considerably less power than other PowerPC devices - even at higher clock rates.

Power Processor Element (PPE) This design will however have the downside of potentially having rather erratic performance on branch laden applications. Such a simple CPU needs the compiler to do a lot of the scheduling work that hardware usually does so a good compiler will be essential. The Cell's high bandwidth memory and I/O subsystems and the PPE's high clock speed and dual threading capability may well make up for these potential performance deficiencies.

Synergistic Processor Elements (SPEs) An SPE is a self contained vector processor which acts as an independent processor. They each contain 128 x 128 bit registers, there are also 4 (single precision) floating point units capable of 32 GigaFLOPS* and 4 Integer units capable of 32 GOPS (Billions of integer Operations per Second) at 4GHz. The SPEs also include a small 256 Kilobyte local store instead of a cache. According to IBM a single SPE (which is just 15 square millimetres and consumes less than 5 Watts at 4GHz) can perform as well as a top end (single core) desktop CPU given the right task.

SPE Local Stores Like the PPE, the SPEs are in-order processors and have no Out-Of-Order capabilities. This means that as with the PPE the compiler is very important. The SPEs do however have 128 registers and this gives plenty of room for the compiler to unroll loops and use other techniques which largely negate the need for OOO hardware. One way in which SPEs operate differently from conventional CPUs is that they lack a cache and instead use a “Local Store”. This potentially makes them (slightly) harder to program but they have been designed this way to reduce hardware complexity and increase performance.

Conventional Cache (~recap) Conventional CPUs perform all their operations in registers which are directly read from or written to main memory. Caches (fast on chip memory) are used to hide the effects of going to or from main memory If data being worked on is not present in the cache, the CPU stalls and has to wait for this data to be fetched.

Dual Core Cache Dual-core CPUs will become common soon and desktop versions have a cache per core. If either of the cores or other system components try to access the same memory address, the data in the cache may become out of date and thus needs updated (made coherent). Supporting this requires logic and takes time and in doing so this limits the speed that a conventional system can access memory and cache. The more processors there are in a system the more complex this problem becomes. Cache design in conventional CPUs speeds up memory access but compromises are required to make it work.

SPE Local Stores To solve the complexity associated with cache design and to increase performance the Cell designers took the radical approach of not including any. Instead they used a series of 256 Kbyte “local stores”, there are 8 of these, 1 per SPE. Local stores are like cache in that they are an on-chip memory but the way they are constructed and act is completely different. The SPEs operate on registers which are read from or written to the local stores. The local stores can access main memory in blocks of 1Kb minimum (16Kb maximum) but the SPEs cannot act directly on main memory (they can only move data to or from the local stores). Caches can deliver similar or even faster data rates but only in very short bursts (a couple of hundred cycles at best), the local stores can each deliver data at this rate continually for over ten thousand cycles without going to RAM.

SPE Local Stores One potential problem is that of “contention”. Data needs to be written to and from memory while data is also being transferred to or from the SPE’s registers and this leads to contention where both systems will fight over access slowing each other down. To get around this the external data transfers access the local memory 1024 bits at a time, in one cycle (equivalent to a transfer rate of 0.5 Terabytes per second!). This is just moving data to and from buffers but moving so much in one go means that contention is kept to a minimum.

Cell Characteristics A big difference in Cells from normal CPUs is the ability of the SPEs in a Cell to be chained together to act as a stream processor [Stream]. A stream processor takes data and processes it in a series of steps.

Cell Architecture

Element Interconnect Bus (EIB) The EIB consists of 4 x 16 byte rings which run at half the CPU clock speed and can allow up to 3 simultaneous transfers. The theoretical peak of the EIB is 96 bytes per cycle (384 Gigabytes per second) however, according to IBM only about two thirds of this is likely to be achieved in practice.

Memory and I/O The Cell will use high speed XDR RAM for memory. A Cell has a memory bandwidth of 25.6 Gigabytes per second which is considerably higher than any PC but necessary as the SPEs will eat as much memory bandwidth as they can get. The Cells run its memory interface at 3.2 Gigabits per second per pin though memory in production now is already capable of higher speeds than this. XDR is designed to scale to 6.4 Gigabits per second so memory bandwidth has the potential to double.