The CRAY-1 Computer System Richard Russell Communications of the ACM January 1978.



“The world’s most expensive love-seat”

A “reasonably trim individual” can gain access to the interior of the machine
12.5 ns clock
8 MB internal semiconductor memory
4 KB of register storage
Uses ECL throughout
115 kW input power
Simple gates

Memory
16 banks = 16-way interleaved access
No bank conflicts except on stride lengths of 8 or 16
4 clock cycles per access
Can pull down 16 instructions per cycle
1 data word if being placed in registers
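The stride claim on the Memory slide can be checked with a small simulation. This is a minimal sketch under simplifying assumptions that are not from the slides: one request issues per clock, a bank stays busy for 4 clocks after accepting a request, and a request to a busy bank stalls the whole stream. The function name and the 64-word transfer length are illustrative.

```python
def cycles_for_strided_access(stride, n_words=64, n_banks=16, bank_busy=4):
    """Clock cycles needed to issue n_words requests at a fixed stride."""
    free_at = [0] * n_banks  # first cycle at which each bank accepts a request
    cycle = 0
    for i in range(n_words):
        bank = (i * stride) % n_banks
        cycle = max(cycle, free_at[bank])  # stall while the bank is busy
        free_at[bank] = cycle + bank_busy
        cycle += 1  # at most one request per cycle
    return cycle


if __name__ == "__main__":
    for stride in (1, 2, 3, 4, 8, 16):
        print(stride, cycles_for_strided_access(stride))
```

In this model a stride revisits the same bank every 16 / gcd(stride, 16) requests, so stalls appear only when that period is shorter than the 4-cycle bank time, which happens for strides that are multiples of 8.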

Cooling
Big power + many modules = heat
Aluminum/steel cooling rods with Freon flow
Copper connectors pipe heat from chip out to cooling rods
Freon/oil leak problem on rod construction
Designed to keep module temperatures under 54 degrees Celsius

Floating Point
IEEE? No.
Why? Not written yet! Wouldn’t arrive until 7 years later.
49 bit signed magnitude “mantissa”
15 bit biased exponent
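To make the word layout concrete, here is a rough decoder for a 64-bit value split as 1 sign bit, 15 exponent bits, and 48 coefficient bits (the 49-bit signed magnitude "mantissa" is taken to be the sign plus the coefficient). The bias of 40000 octal and the placement of the binary point to the left of the coefficient are assumptions not stated on the slide, and `decode_cray1_float` is a hypothetical helper name.

```python
from fractions import Fraction

EXP_BIAS = 0o40000  # assumption: exponent bias of 2**14, not given on the slide


def decode_cray1_float(word):
    """Decode a 64-bit word as sign(1) | biased exponent(15) | coefficient(48)."""
    sign = (word >> 63) & 1
    exponent = (word >> 48) & 0x7FFF
    coefficient = word & ((1 << 48) - 1)
    # Assumption: the coefficient is a fraction in [0, 1), binary point on its left.
    value = Fraction(coefficient, 1 << 48) * Fraction(2) ** (exponent - EXP_BIAS)
    return -value if sign else value


if __name__ == "__main__":
    one = (0o40001 << 48) | (1 << 47)  # 0.5 * 2**1
    print(decode_cray1_float(one))
```

Exact fractions are used so the sketch does not lean on the host machine's own floating-point format.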

Production plans anticipate shipping one CRAY-1 per quarter.

Topic: Vector Computers
8 64x64 vector registers (64 elements of 64 bits each)
Process vector elements identically
Vector Mask register can protect an element
“Chaining”
Can use output of one vector operation as input to next before it is done
Win = don’t have to store to memory then fetch from memory
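The benefit of chaining can be estimated with a simple timing model. This is only a sketch: it assumes each functional unit delivers one result per clock after a fixed startup latency, and the latencies in the usage lines (7 and 6 clocks) are illustrative values, not figures from the slides.

```python
def unchained_cycles(vl, unit_latencies):
    """Each operation must finish all vl elements before the next one starts."""
    return sum(latency + vl for latency in unit_latencies)


def chained_cycles(vl, unit_latencies):
    """Each unit consumes results as the previous unit produces them."""
    return sum(unit_latencies) + vl


if __name__ == "__main__":
    latencies = [7, 6]  # hypothetical multiply then add
    print(unchained_cycles(64, latencies))
    print(chained_cycles(64, latencies))
```

Under this model the chained sequence pays the vector length once instead of once per operation.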

Benefits of Vector Computing
Previously needed 100+ elements for vector to be useful over scalar
CRAY-1 cuts that to 2-4
Don’t need to store vector elements next to each other in memory
Max wait time is previous vector length + 4
Common wait time is functional unit time + 2
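The break-even vector length can be worked out from a startup cost and per-element costs. The sketch below assumes a linear cost model (scalar time = n * scalar cost, vector time = startup + n * vector cost); the function name and all cycle counts in the usage lines are hypothetical and chosen only to show how a small startup cost moves the crossover from about 100 elements down to a handful.

```python
def break_even_length(startup, scalar_per_elem, vector_per_elem=1):
    """Smallest n for which startup + n*vector_per_elem < n*scalar_per_elem.

    All arguments are integer cycle counts. Returns None if vector code
    never wins under this model.
    """
    gain = scalar_per_elem - vector_per_elem
    if gain <= 0:
        return None
    return startup // gain + 1


if __name__ == "__main__":
    print(break_even_length(startup=400, scalar_per_elem=5))  # long startup
    print(break_even_length(startup=10, scalar_per_elem=5))   # short startup
```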

Vector Benefits Continued

Compiler
CFT (the CRAY FORTRAN compiler)
Automatically vectorizes inner loop if possible
No need to rewrite code!
Can’t vectorize loops with control statements.
Often slower than hand coded assembly.
Improve instruction scheduling “in the future”
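One way to picture what a vectorizing compiler does with a long inner loop is strip-mining: the loop is cut into pieces that each fit a 64-element vector register, with the piece length playing the role of VL. The sketch below illustrates that idea in Python; it is not CFT's actual code generation, and the order in which the short remainder strip is processed is an assumption.

```python
MAX_VL = 64  # elements per vector register


def strip_lengths(n):
    """Vector lengths used to cover a loop of n iterations."""
    return [min(MAX_VL, n - start) for start in range(0, n, MAX_VL)]


def vector_add(a, b):
    """Element-wise a + b, processed one strip of at most MAX_VL elements at a time."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    result = []
    for start in range(0, len(a), MAX_VL):
        vl = min(MAX_VL, len(a) - start)  # value the VL register would hold
        va = a[start:start + vl]          # stands in for a vector load
        vb = b[start:start + vl]
        result.extend(x + y for x, y in zip(va, vb))  # one vector add
    return result


if __name__ == "__main__":
    print(strip_lengths(150))
```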

Questions
The CRAY-1 automatically vectorizes code loops. Current microprocessors usually use smaller vector registers with extensions such as SSE to support SIMD operations. Do modern compilers do these vector optimizations automatically as the CRAY did or is it the explicit use of vector instructions that has dominated and why? Trade-offs?
They say they can eventually make loops with control flow in them vectorizable. Can you come up with a simple method to do so and/or some reasons that make this case difficult?
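One possible answer to the second question is to turn the branch into data: compute both outcomes for every element, then use a mask to pick per element, which is the role the Vector Mask register could play. The sketch below shows this for an absolute-value loop; `masked_merge` and `vectorized_abs` are hypothetical names, and the approach wastes work on elements whose result is discarded, which is one reason the case is harder than a straight-line loop.

```python
def masked_merge(mask, when_true, when_false):
    """Pick when_true[i] where mask[i] is set, otherwise when_false[i]."""
    return [t if m else f for m, t, f in zip(mask, when_true, when_false)]


def vectorized_abs(v):
    """Branch-free form of: for each x, if x < 0 then x = -x."""
    mask = [x < 0 for x in v]     # one vector compare fills the mask
    negated = [-x for x in v]     # computed for every element, needed or not
    return masked_merge(mask, negated, v)


if __name__ == "__main__":
    print(vectorized_abs([3, -1, 0, -7]))
```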

Table 3

Registers
A = 8 address registers
B = 64 address-save registers
S = 8 scalar registers
T = 64 scalar-save registers
V = 8 64x64 vector registers

Special Registers
VM = mask off vector elements to not operate on
VL = length of vector being processed
P = parcel address count
BA = absolute address used as base for indexed memory accesses (helps with dynamic user space migration)
LA = limits the accessible address space
XA = supports exchange operation
F = flag register that holds various “condition codes”
M = mode register (3 bits)
  Bit 1 = Floating Point Error/Interrupt Enable
  Bit 2 = Uncorrectable memory corruption Interrupt Enable
  Bit 3 = All interrupts disabled.
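The BA and LA registers together amount to base-and-limit relocation, which can be sketched as follows. The exact comparison is an assumption: here LA is treated as an absolute upper bound that the relocated address must stay below, and `translate` and `AddressRangeError` are hypothetical names for illustration.

```python
class AddressRangeError(Exception):
    """Raised when a program address falls outside its allowed region."""


def translate(relative_address, ba, la):
    """Relocate a program-relative address by BA and check it against LA."""
    absolute = ba + relative_address
    # Assumption: LA is an absolute limit; valid addresses satisfy ba <= absolute < la.
    if relative_address < 0 or absolute >= la:
        raise AddressRangeError(f"address {relative_address} out of range")
    return absolute


if __name__ == "__main__":
    print(translate(100, ba=4096, la=8192))
```

Because every program address is relocated through BA, moving a job in memory only requires copying it and changing BA and LA, which is the "dynamic user space migration" the slide mentions.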

Front End
Needs an access terminal minicomputer
Connects to a “CRAY access channel” to control the computer