Basic Computer Architecture


Basic Computer Architecture CSCE 496/896: Embedded Systems Witawas Srisa-an

Review of Computer Architecture Credit: Most of these slides were created by Prof. Wayne Wolf, the author of the textbook; I made some modifications to the notes for clarity. Some background from CSCE 430 or equivalent is assumed.

von Neumann architecture Memory holds data and instructions. Central processing unit (CPU) fetches instructions from memory. Separate CPU and memory distinguishes programmable computer. CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.
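Below is a minimal sketch of the fetch-decode-execute loop in C. The tiny instruction set, the INSN encoding, and the memory layout are invented purely for illustration; the point is that the PC selects a word from memory, the IR holds it while it is decoded, and code and data share the same memory.

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical machine: one memory array holds both code and data. */
    enum { OP_HALT, OP_LOAD, OP_ADD, OP_STORE };
    #define INSN(op, a, b) (((uint32_t)(op) << 24) | ((a) << 8) | (b))

    int main(void) {
        uint32_t mem[16] = {
            INSN(OP_LOAD,  0, 10),          /* r0 = mem[10]  */
            INSN(OP_LOAD,  1, 11),          /* r1 = mem[11]  */
            INSN(OP_ADD,   0, 1),           /* r0 = r0 + r1  */
            INSN(OP_STORE, 0, 12),          /* mem[12] = r0  */
            INSN(OP_HALT,  0, 0),
            [10] = 2, [11] = 3              /* data          */
        };
        uint32_t r[4] = {0}, pc = 0, ir;

        for (;;) {
            ir = mem[pc++];                                               /* fetch   */
            uint32_t op = ir >> 24, a = (ir >> 8) & 0xff, b = ir & 0xff;  /* decode  */
            if      (op == OP_HALT)  break;                               /* execute */
            else if (op == OP_LOAD)  r[a] = mem[b];
            else if (op == OP_ADD)   r[a] += r[b];
            else if (op == OP_STORE) mem[b] = r[a];
        }
        printf("mem[12] = %u\n", (unsigned)mem[12]);                      /* prints 5 */
        return 0;
    }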

von Neumann Architecture [Block diagram: Input Unit, Memory Unit, CPU (Control + ALU), and Output Unit.]

CPU + memory [Diagram: the PC holds 200; the CPU places that address on the address lines, memory returns the instruction at address 200 (ADD r5,r1,r3) on the data lines, and it is loaded into the IR.]

Recalling Pipelining

Recalling Pipelining What is a potential problem with the von Neumann architecture?

Harvard architecture [Diagram: the CPU (with its PC) has separate address and data connections to a data memory and a program memory.]

von Neumann vs. Harvard Harvard can't use self-modifying code. Harvard allows two simultaneous memory fetches. Most DSPs (e.g., Blackfin from ADI) use the Harvard architecture for streaming data: greater memory bandwidth; different memory bit depths between instruction and data; more predictable bandwidth.

Today’s Processors Harvard or von Neumann?

RISC vs. CISC Complex instruction set computer (CISC): many addressing modes; many operations. Reduced instruction set computer (RISC): load/store; pipelinable instructions.

Instruction set characteristics Fixed vs. variable length. Addressing modes. Number of operands. Types of operands.

Tensilica Xtensa RISC-based with variable-length instructions, but not CISC.

Programming model Programming model: registers visible to the programmer. Some registers are not visible (IR).

Multiple implementations Successful architectures have several implementations: varying clock speeds; different bus widths; different cache sizes, associativities, configurations; local memory, etc.

Assembly language One-to-one with instructions (more or less). Basic features: One instruction per line. Labels provide names for addresses (usually in first column). Instructions often start in later columns. Comments run to end of line.

ARM assembly language example

    label1  ADR r4,c
            LDR r0,[r4]     ; a comment
            ADR r4,d
            LDR r1,[r4]
            SUB r0,r0,r1    ; comment

(In SUB r0,r0,r1 the first operand, r0, is the destination.)

Pseudo-ops Some assembler directives don’t correspond directly to instructions: Define current address. Reserve storage. Constants.

Pipelining Execute several instructions simultaneously but at different stages. Simple three-stage pipe: fetch (from memory), decode, execute.
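A toy C model of how the stages overlap (the instruction names are placeholders; no real timing is modeled):

    #include <stdio.h>

    /* In cycle t, instruction t is fetched, instruction t-1 is decoded,
     * and instruction t-2 is executed: three instructions in flight.   */
    int main(void) {
        const char *insn[] = { "i0", "i1", "i2", "i3", "i4" };
        int n = 5;

        for (int cycle = 0; cycle < n + 2; cycle++) {
            printf("cycle %d:", cycle);
            if (cycle < n)                   printf("  fetch %s",   insn[cycle]);
            if (cycle >= 1 && cycle - 1 < n) printf("  decode %s",  insn[cycle - 1]);
            if (cycle >= 2 && cycle - 2 < n) printf("  execute %s", insn[cycle - 2]);
            printf("\n");
        }
        return 0;
    }

Once the pipe is full, one instruction completes every cycle even though each instruction still spends three cycles going through the stages.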

Pipeline complications May not always be able to predict the next instruction: Conditional branch. Causes bubble in the pipeline: [Pipeline diagram: while a JNZ is in the execute stage, the following instructions are already being fetched and decoded; if the branch is taken they must be discarded, leaving a bubble.]

Superscalar RISC pipeline executes one instruction per clock cycle (usually). Superscalar machines execute multiple instructions per clock cycle. Faster execution. More variability in execution times. More expensive CPU.

Simple superscalar Execute a floating-point and an integer instruction at the same time. Use different registers. Floating-point operations use their own hardware unit. Must wait for completion when the floating-point and integer units communicate.
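A rough sketch of that issue policy in C. The trace characters and the rule that a "communicating" operation issues alone are illustrative assumptions, not a description of any real processor:

    #include <stdio.h>
    #include <string.h>

    /* 'I' = integer op, 'F' = floating-point op, 'X' = an op that moves data
     * between the integer and FP register files, so the units must wait.   */
    int main(void) {
        const char *trace = "IFIIFXIF";
        size_t len = strlen(trace);
        int cycle = 0;

        for (size_t i = 0; i < len; ) {
            if (trace[i] == 'X') {                     /* units synchronize     */
                printf("cycle %d: issue %c alone\n", cycle++, trace[i]);
                i += 1;
            } else if (i + 1 < len && trace[i + 1] != 'X'
                       && trace[i] != trace[i + 1]) {  /* one int + one FP op   */
                printf("cycle %d: issue %c and %c together\n",
                       cycle++, trace[i], trace[i + 1]);
                i += 2;
            } else {                                   /* no partner available  */
                printf("cycle %d: issue %c alone\n", cycle++, trace[i]);
                i += 1;
            }
        }
        return 0;
    }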

Costs Good news: can find parallelism at run time. Bad news: causes variations in execution time. Requires a lot of hardware: n^2 instruction-unit hardware for n-instruction parallelism.

Finding parallelism Independent operations can be performed in parallel:
ADD r0, r0, r1
ADD r3, r2, r3
ADD r6, r4, r0
[Dataflow diagram: r0+r1 and r2+r3 can execute in parallel; the third ADD needs the new value of r0, so it must wait for the first.]

Pipeline hazards Two operations that have a data dependency cannot be executed in parallel:
x = a + b;
a = d + e;
y = a - f;
[Dataflow diagram: y = a - f uses the value of a produced by a = d + e, so those two statements cannot run in parallel.]

Order of execution In-order: the machine stops issuing instructions when the next instruction can't be dispatched. Out-of-order: the machine will change the order of instructions to keep dispatching. Substantially faster but also more complex.
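A hedged sketch of the difference in C. The three-instruction program, the register-ready bookkeeping, and the latencies are all invented for illustration; real issue logic is far more involved:

    #include <stdio.h>
    #include <stdbool.h>

    typedef struct { const char *name; int dst, src1, src2, latency; } Insn;

    /* i0 is a slow load, i1 depends on its result, i2 is independent. */
    static Insn prog[] = {
        { "i0: r1 = load ...", 1, 0, 0, 3 },
        { "i1: r2 = r1 + r0",  2, 1, 0, 1 },
        { "i2: r4 = r3 + r3",  4, 3, 3, 1 },
    };
    enum { N = 3 };

    static void run(bool in_order) {
        int  ready_at[8] = {0};   /* cycle at which each register's value is ready */
        bool issued[N]   = {false};
        int  done = 0;

        for (int cycle = 0; done < N && cycle < 20; cycle++) {
            int pick = -1;
            for (int i = 0; i < N; i++) {
                if (issued[i]) continue;
                bool ok = ready_at[prog[i].src1] <= cycle &&
                          ready_at[prog[i].src2] <= cycle;
                if (ok) pick = i;
                if (in_order || ok) break;  /* in-order only looks at the oldest */
            }
            if (pick >= 0) {
                printf("  cycle %d: issue %s\n", cycle, prog[pick].name);
                ready_at[prog[pick].dst] = cycle + prog[pick].latency;
                issued[pick] = true;
                done++;
            }
        }
    }

    int main(void) {
        printf("in-order issue:\n");     run(true);
        printf("out-of-order issue:\n"); run(false);
        return 0;
    }

With in-order issue, i2 waits behind the stalled i1; with out-of-order issue, i2 slips ahead while i1 waits for the load.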

VLIW architectures Very long instruction word (VLIW) processing provides significant parallelism. Rely on compilers to identify parallelism.

What is VLIW? Parallel function units with a shared register file: [Diagram: an instruction decode and memory unit feeds several function units, all reading and writing a common register file.]

VLIW cluster Organized into clusters to accommodate available register bandwidth: [Diagram: a row of clusters.]

VLIW and compilers VLIW requires considerably more sophisticated compiler technology than traditional architectures---must be able to extract parallelism to keep the instructions full. Many VLIWs have good compiler support.

Scheduling [Diagram: expressions (a through g) on the left are scheduled into VLIW instructions on the right; slots with no independent expression to fill are padded with nop.]
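A small C sketch of the result of such scheduling. The four-slot bundle width and the way the expressions are grouped are illustrative assumptions, not a real VLIW encoding:

    #include <stdio.h>

    #define SLOTS 4   /* assumed number of parallel function units */

    /* One "very long instruction": one operation per function unit, with
     * "nop" wherever no independent operation could be scheduled.        */
    typedef struct { const char *slot[SLOTS]; } Bundle;

    static void print_bundle(const Bundle *b) {
        for (int i = 0; i < SLOTS; i++)
            printf("%-6s", b->slot[i] ? b->slot[i] : "nop");
        printf("\n");
    }

    int main(void) {
        Bundle program[] = {
            { { "a", "b", "e", "f" } },      /* independent, packed together */
            { { "c", "d", NULL, NULL } },    /* dependent on earlier results */
            { { "g", NULL, NULL, NULL } },
        };
        for (int i = 0; i < 3; i++) print_bundle(&program[i]);
        return 0;
    }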

EPIC EPIC = Explicitly parallel instruction computing. Used in Intel/HP Merced (IA-64) machine. Incorporates several features to allow machine to find, exploit increased parallelism.

IA-64 instruction format Instructions are bundled with a tag to indicate which instructions can be executed in parallel: [Diagram: a 128-bit bundle holding a tag and three instructions.]
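For reference, a bundle is 128 bits: a 5-bit template field (the tag) plus three 41-bit instruction slots. A rough C sketch of pulling those fields out of a bundle stored as two 64-bit halves; this only shows the bit arithmetic, not a working IA-64 decoder:

    #include <stdio.h>
    #include <stdint.h>

    /* bits 0-4: template, bits 5-45: slot 0, bits 46-86: slot 1, bits 87-127: slot 2 */
    typedef struct { uint64_t lo, hi; } Bundle;

    #define MASK41 ((1ULL << 41) - 1)

    static unsigned template_of(const Bundle *b) { return (unsigned)(b->lo & 0x1f); }
    static uint64_t slot0(const Bundle *b) { return (b->lo >> 5) & MASK41; }
    static uint64_t slot1(const Bundle *b) {
        /* bits 46-63 come from lo, bits 64-86 from hi */
        return ((b->lo >> 46) | (b->hi << 18)) & MASK41;
    }
    static uint64_t slot2(const Bundle *b) { return (b->hi >> 23) & MASK41; }

    int main(void) {
        Bundle b = { 0x0123456789abcdefULL, 0xfedcba9876543210ULL };
        printf("template %u, slots %llx %llx %llx\n", template_of(&b),
               (unsigned long long)slot0(&b),
               (unsigned long long)slot1(&b),
               (unsigned long long)slot2(&b));
        return 0;
    }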

Memory system CPU fetches data and instructions from a memory hierarchy: [Diagram: CPU, L1 cache, L2 cache, main memory.]
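A toy C model of why the cost of a memory access depends on where the data happens to be (the latencies are made-up round numbers, not measurements):

    #include <stdio.h>
    #include <stdbool.h>

    /* Invented latencies, in cycles, for illustration only. */
    enum { L1_LATENCY = 2, L2_LATENCY = 10, DRAM_LATENCY = 200 };

    /* Stand-in for real lookup logic: assume we already know where the line is. */
    static int access_cost(bool in_l1, bool in_l2) {
        if (in_l1) return L1_LATENCY;                    /* L1 hit            */
        if (in_l2) return L1_LATENCY + L2_LATENCY;       /* L1 miss, L2 hit   */
        return L1_LATENCY + L2_LATENCY + DRAM_LATENCY;   /* miss both, DRAM   */
    }

    int main(void) {
        printf("L1 hit:  %d cycles\n", access_cost(true,  false));
        printf("L2 hit:  %d cycles\n", access_cost(false, true));
        printf("DRAM:    %d cycles\n", access_cost(false, false));
        return 0;
    }

The spread between the first and last lines is the roughly 100X variation mentioned on the next slide.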

Memory hierarchy complications Program behavior is much more state-dependent. Depends on how earlier execution left the cache. Execution time is less predictable. Memory access times can vary by 100X.

Memory Hierarchy Complication

                            | Pentium 3-M           | Pentium 4-M                   | Pentium M
  Core                      | P6 (Tualatin 0.13µ)   | Netburst (Northwood 0.13µ)    | "P6+" (Banias 0.13µ, Dothan 0.09µ)
  L1 cache (data + code)    | 16KB + 16KB           | 8KB + 12K µops (trace cache)  | 32KB + 32KB
  L2 cache                  | 512KB                 | 512KB                         | 1024KB
  Instruction sets          | MMX, SSE              | MMX, SSE, SSE2                | MMX, SSE, SSE2
  Max frequency (CPU / FSB) | 1.2GHz / 133MHz       | 2.4GHz / 400MHz (QDR)         | 2GHz / 400MHz (QDR)
  Number of transistors     | 44M                   | 55M                           | 77M, 140M
  SpeedStep                 | 2nd generation        | 2nd generation                | 3rd generation

End of Overview Next class: Altera Nios II processors