Advanced Computer Architectures Unit-6

Contents:
- Reduced Instruction Set Computers (RISC)
- Complex Instruction Set Computers (CISC)
- Superscalar Processors
- Vector Processing
- Parallel Cluster Computers
- Distributed Computers

1. RISC (Reduced Instruction Set Computer)

Key features:
- Large number of general-purpose registers, or use of compiler technology to optimize register use
- Limited and simple instruction set
- Emphasis on optimizing the instruction pipeline

Driving forces behind CISC (the trend RISC reacted against):
- Software costs far exceed hardware costs
- Increasingly complex high-level languages (HLLs)
- These lead to large instruction sets, more addressing modes, and hardware implementations of HLL statements

Execution Characteristics

Studies of compiled HLL programs examine:
- Operations performed
- Operands used
- Execution sequencing

Findings on operations:
- Assignments dominate, and most are simple movement of data
- Conditional statements (IF, LOOP) dominate sequence control
- Procedure call/return is very time consuming
- A single HLL statement can lead to many machine-code operations

Procedure Calls

- Very time consuming; the cost depends on the number of parameters passed and on the depth of nesting
- Most programs do not do a long run of calls followed by a long run of returns
- Most variables are local (cf. locality of reference)

Implications:
- The best support comes from optimizing the most-used and most time-consuming features
- A large number of registers for operand referencing
- Careful design of pipelines (branch prediction etc.)
- A simplified (reduced) instruction set

Large Register File

Software solution:
- Require the compiler to allocate registers
- Allocate based on the most-used variables in a given time
- Requires sophisticated program analysis

Hardware solution:
- Have more registers, so more variables will be in registers

Registers for local variables:
- Store local scalar variables in registers; reduces memory access
- But every procedure (function) call changes the locality: parameters must be passed, results must be returned, and the calling program's variables must be restored

Register Windows

Observations that make register windows practical:
- Only a few parameters are typically passed
- The depth of call nesting stays within a limited range

Mechanism:
- Use multiple small sets of registers
- A call switches to a different register set; a return switches back to the previously used set

Three areas within each register set:
- Parameter registers
- Local registers
- Temporary registers

The temporary registers of one set overlap the parameter registers of the next. This allows parameter passing without moving data.

[Figures: Overlapping Register Windows; Circular-Buffer Organization of Register Windows]

Operation of the Circular Buffer

- When a call is made, a current window pointer (CWP) is moved to point at the currently active register window
- If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory
- A saved window pointer (SWP) indicates where the next saved window should be restored to; a sketch of this spill/restore logic follows below

Global variables:
- Allocated by the compiler to memory; inefficient for frequently accessed variables
- Alternatively, provide a dedicated set of registers for global variables
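The following is a minimal C sketch of the call/return handling just described, assuming NWINDOWS hardware windows managed as a circular buffer; the names (cwp, swp, spill_window, fill_window) are illustrative and not taken from any real ISA.

    /* Circular buffer of register windows: spill on overflow, fill on underflow. */
    #include <stdio.h>

    #define NWINDOWS 8

    static int cwp = 0;   /* current window pointer */
    static int swp = 0;   /* saved window pointer: oldest window still in registers */

    static void spill_window(int w) { printf("spill window %d to memory\n", w); }
    static void fill_window(int w)  { printf("restore window %d from memory\n", w); }

    void on_call(void) {
        cwp = (cwp + 1) % NWINDOWS;
        if (cwp == swp) {                      /* all windows in use: overflow */
            spill_window(swp);                 /* save the oldest window */
            swp = (swp + 1) % NWINDOWS;
        }
    }

    void on_return(void) {
        if (cwp == swp) {                      /* caller's window was spilled: underflow */
            swp = (swp + NWINDOWS - 1) % NWINDOWS;
            fill_window(swp);                  /* bring it back from memory */
        }
        cwp = (cwp + NWINDOWS - 1) % NWINDOWS;
    }

With NWINDOWS = 4, for example, the fourth nested call spills the oldest window to memory and the matching return restores it; every call and return in between completes with no memory traffic for saving registers.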

Registers vs Cache

    Large Register File                             Cache
    ---------------------------------------------   ---------------------------------------
    All local scalars                               Recently used local scalars
    Individual variables                            Blocks of memory
    Compiler-assigned global variables              Recently used global variables
    Save/restore based on procedure nesting depth   Save/restore based on caching algorithm
    Register addressing                             Memory addressing

[Figures: Referencing a Scalar (Window-Based Register File); Referencing a Scalar (Cache)]

RISC Characteristics
- One instruction per cycle
- Register-to-register operations
- Few, simple addressing modes
- Few, simple instruction formats
- Hardwired design (no microcode)
- Fixed instruction format
- More compile time/effort

RISC Pipelining

Most instructions are register to register, so two phases of execution suffice:
- I: Instruction fetch
- E: Execute (ALU operation with register input and output)

Load and store need a third phase:
- I: Instruction fetch
- E: Execute (calculate the memory address)
- D: Memory (register-to-memory or memory-to-register operation)

[Figure: Effects of Pipelining]
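Since the figure is not reproduced here, the short C sketch below prints the same kind of timing diagram for an assumed four-instruction sequence, with one instruction entering the pipeline per cycle (it deliberately ignores the memory-access conflicts a real machine would also have to resolve).

    /* Print stage occupancy for the I/E(/D) pipeline described above. */
    #include <stdio.h>

    int main(void) {
        const char *name[]  = { "LOAD", "ADD", "STORE", "SUB" };  /* assumed mix */
        const int  is_mem[] = { 1,      0,     1,       0     };  /* 1 = needs D stage */
        const char *stage[] = { "I", "E", "D" };
        const int  n = 4;

        printf("%-6s", "cycle");
        for (int c = 1; c <= n + 2; c++) printf("%4d ", c);
        printf("\n");

        for (int i = 0; i < n; i++) {
            printf("%-6s", name[i]);
            int stages = is_mem[i] ? 3 : 2;
            for (int c = 0; c < i + stages; c++)
                printf("%4s ", c < i ? "" : stage[c - i]);   /* issued i cycles late */
            printf("\n");
        }
        return 0;
    }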

Optimization of Pipelining

Delayed branch:
- The branch does not take effect until after execution of the following instruction
- This following instruction is the delay slot; the compiler tries to fill it with a useful instruction rather than a NOP
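A minimal C simulation of the idea (the tiny program and the INC/BRANCH opcodes are invented for illustration): the instruction fetched right after the branch always executes, and only then does the branch redirect the program counter.

    /* Simulate delayed-branch semantics: a taken branch redirects the PC
     * only after the delay-slot instruction has executed. */
    #include <stdio.h>

    typedef enum { INC, BRANCH, HALT } Op;
    typedef struct { Op op; int target; } Insn;

    int main(void) {
        Insn prog[] = {
            { BRANCH, 3 },   /* 0: branch to 3, effective one instruction later */
            { INC,    0 },   /* 1: delay slot - always executes */
            { INC,    0 },   /* 2: skipped once the branch takes effect */
            { HALT,   0 },   /* 3: */
        };
        int pc = 0, pending = -1, counter = 0;

        while (prog[pc].op != HALT) {
            Insn i = prog[pc++];                              /* fetch */
            if (pending >= 0) { pc = pending; pending = -1; } /* branch takes effect now */
            if (i.op == INC)         counter++;
            else if (i.op == BRANCH) pending = i.target;      /* defer by one instruction */
        }
        printf("counter = %d\n", counter);  /* 1: only the delay-slot INC ran */
        return 0;
    }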

2. CISC (Complex Instruction Set Computers)

Complex instructions involve a large number of steps. In the basic performance equation T = (N x S) / R, where N is the number of instructions executed, S the average number of steps (cycles) per instruction, and R the clock rate, instructions that perform more complex operations reduce N but increase S. The hope was that complex instructions combined with pipelining would achieve good performance.

Complex Instruction Set Computer

Another characteristic of CISC computers is that they have instructions that act directly on memory addresses. For example, ADD L1, L2, L3 takes the contents of M[L1], adds it to the contents of M[L2], and stores the result in location M[L3]. An instruction like this takes three memory-access cycles to execute, which makes for a potentially very long instruction execution cycle.

The problems with CISC computers:
- The complexity of the design may slow down the processor.
- The complexity of the design may result in costly errors in the processor design and implementation.
- Many of the instructions and addressing modes are used rarely.
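In C terms, a single such CISC instruction does the work of the first statement in main below; a RISC load/store machine needs four instructions for the same effect, but each one touches memory at most once. (The mem array, the addresses, and the register names are illustrative.)

    /* Memory-to-memory CISC add vs. RISC load/store sequence. */
    #include <stdio.h>

    int mem[8];                       /* models main memory     */
    enum { L1 = 1, L2 = 2, L3 = 3 };  /* illustrative addresses */

    int main(void) {
        mem[L1] = 5; mem[L2] = 7;

        /* CISC: one instruction, three data-memory accesses */
        mem[L3] = mem[L1] + mem[L2];  /* ADD L1, L2, L3 */

        /* RISC: four instructions, one memory access each */
        int r1 = mem[L1];             /* LOAD  r1, L1     */
        int r2 = mem[L2];             /* LOAD  r2, L2     */
        int r3 = r1 + r2;             /* ADD   r3, r1, r2 */
        mem[L3] = r3;                 /* STORE r3, L3     */

        printf("M[L3] = %d\n", mem[L3]);  /* 12 */
        return 0;
    }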

Overlapped Register Windows (four windows plus global registers)

    R0  - R9    Global registers, common to all procedures
    R10 - R15   Common to procedures A and D
    R16 - R25   Local to A
    R26 - R31   Common to A and B
    R32 - R41   Local to B
    R42 - R47   Common to B and C
    R48 - R57   Local to C
    R58 - R63   Common to C and D
    R64 - R73   Local to D

Each procedure sees its own local registers, the overlap registers shared with its caller and callee, and the globals; the windows wrap around circularly, so D shares R10 - R15 with A.

Characteristics of RISC
- Relatively few instructions
- Relatively few addressing modes
- Memory access limited to load and store instructions
- All operations done within the registers of the CPU
- Fixed-length, easily decoded instruction format
- Single-cycle instruction execution
- Hardwired rather than microprogrammed control

Advantages of RISC:
- VLSI realization
- Computing speed
- Design costs and reliability
- High-level language support

3. Superscalar Operation

A higher degree of concurrency can be achieved if multiple instruction pipelines are implemented in the processor. This means that multiple functional units are used, creating parallel paths through which different instructions can be executed in parallel. With such an arrangement, it becomes possible to start the execution of several instructions in every clock cycle. This mode of execution is called superscalar operation.

[Figure: General Superscalar Organization]

Superpipelined:
- Many pipeline stages need less than half a clock cycle
- Doubling the internal clock speed gets two tasks done per external clock cycle
- Superscalar, by contrast, allows parallel fetch and execute through duplicated stages

[Figure: Superscalar vs Superpipeline]

Limitations

Superscalar performance depends on the instruction-level parallelism available, exploited through compiler-based optimisation and hardware techniques. It is limited by the following (a small dependency-check sketch follows this list):

True data dependency:
- ADD r1, r2 (r1 := r1 + r2) followed by MOVE r3, r1 (r3 := r1)
- The second instruction can be fetched and decoded in parallel with the first, but can NOT execute until the first is finished

Procedural dependency:
- Instructions after a branch cannot be executed in parallel with instructions before the branch
- Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed; this prevents simultaneous fetches

Resource conflicts:
- Two or more instructions require access to the same resource at the same time, e.g. two arithmetic instructions
- Resources can be duplicated, e.g. by having two arithmetic units

Output dependency and antidependency (register-name reuse; both can be removed by register renaming)
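The sketch below (instruction encoding invented for illustration) checks whether two decoded instructions can be issued in the same cycle, refusing dual issue on a true data dependency (read-after-write), an output dependency (write-after-write), or an antidependency (write-after-read).

    /* Can two instructions issue together? A pairwise hazard check. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int dst, src1, src2; } Insn;   /* register numbers */

    bool can_dual_issue(Insn a, Insn b) {           /* a precedes b */
        bool raw = (b.src1 == a.dst) || (b.src2 == a.dst); /* true data dependency */
        bool waw = (b.dst  == a.dst);                      /* output dependency    */
        bool war = (a.src1 == b.dst) || (a.src2 == b.dst); /* antidependency       */
        return !(raw || waw || war);
    }

    int main(void) {
        Insn add  = { 1, 1, 2 };  /* ADD  r1, r2   (r1 := r1 + r2) */
        Insn move = { 3, 1, 1 };  /* MOVE r3, r1   (r3 := r1)      */
        printf("dual issue: %s\n",
               can_dual_issue(add, move) ? "yes" : "no");  /* no: RAW on r1 */
        return 0;
    }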

[Figure: Effect of Dependencies]

4. Vector Computation

Mathematical problems involving physical processes present particular difficulties for computation:
- Aerodynamics, seismology, meteorology
- Continuous-field simulation, high precision
- Repeated floating-point calculations on large arrays of numbers

Supercomputers handle these types of problem:
- Hundreds of millions of FLOPS; $10-15 million per machine
- Optimised for calculation rather than multitasking and I/O
- Limited market: research, government agencies, meteorology

Array processors:
- Alternative to a supercomputer
- Configured as peripherals to mainframes and minicomputers
- Run just the vector portion of problems
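The core workload here is element-wise arithmetic over long arrays. A scalar processor executes the loop below one element per iteration; a vector processor performs the same computation with a single vector instruction (or a few, for very long arrays). The kernel is the standard SAXPY operation; the array size and values are illustrative.

    /* SAXPY: y := a*x + y, the archetypal vector operation. */
    #include <stdio.h>

    #define N 8

    void saxpy(int n, float a, const float *x, float *y) {
        for (int i = 0; i < n; i++)     /* one scalar iteration per element; */
            y[i] = a * x[i] + y[i];     /* a vector unit does many at once   */
    }

    int main(void) {
        float x[N], y[N];
        for (int i = 0; i < N; i++) { x[i] = (float)i; y[i] = 1.0f; }
        saxpy(N, 2.0f, x, y);
        for (int i = 0; i < N; i++) printf("%.1f ", y[i]);  /* 1.0 3.0 5.0 ... */
        printf("\n");
        return 0;
    }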

[Figure: Approaches to Vector Computation]

5. Parallel Cluster Computers

Multiple Processor Organization (Flynn's taxonomy):
- Single instruction, single data stream (SISD)
- Single instruction, multiple data streams (SIMD)
- Multiple instructions, single data stream (MISD)
- Multiple instructions, multiple data streams (MIMD)

[Figure: Taxonomy of Parallel Processor Architectures]

Loosely Coupled: Clusters
- Collection of independent uniprocessors or SMPs
- Interconnected to form a cluster
- Communication via fixed paths or network connections

[Figure: Parallel Organizations - SISD]

[Figures: Parallel Organizations - SIMD; Parallel Organizations - MIMD (Shared Memory)]

[Figures: Parallel Organizations - MIMD (Distributed Memory); Block Diagram of a Tightly Coupled Multiprocessor]

Organization Classification (tightly coupled systems):
- Time-shared or common bus
- Multiport memory
- Central control unit

[Figure: Symmetric Multiprocessor Organization]

Clusters
- Alternative to SMP for high performance and high availability
- Particularly attractive for server applications
- A group of interconnected whole computers working together as a unified resource, with the illusion of being one machine
- Each computer is called a node

Cluster benefits:
- Absolute scalability
- Incremental scalability
- Superior price/performance

[Figures: Cluster Configurations - Standby Server, No Shared Disk; Cluster Configurations - Shared Disk]

[Figure: Cluster Computer Architecture]

6. Distributed Computers

The End