Embedded Systems Computer Architecture

Presentation transcript:

1 Embedded Systems Computer Architecture

Embedded Systems 2: Memory Hierarchy
Registers, Cache, L2 Cache, RAM, Disk
[Figure: levels nearer the registers are faster; levels nearer the disk are cheaper per byte.]

Embedded Systems 3: View of Computer System
[Figure: layered diagram with the labels Application Software, Operating System, Hardware, and Driver.]

Embedded Systems 4: Memory
To build a memory, use a logical $k \times m$ array of stored bits: $k = 2^n$ locations, each holding $m$ bits.
Address Space: number of locations (usually a power of 2)
Addressability: number of bits per location (e.g., byte-addressable)

Embedded Systems 5: Memory Example
Memory addresses if m = 8. [Figure: each location is 8 bits wide.]

Embedded Systems 6: Memory Example
Memory addresses if m = 16 (byte addressable). [Figure: each location is 16 bits wide.]

Embedded Systems 7: Memory Example
Memory addresses if m = 32 (byte addressable). [Figure: each location is 32 bits wide.]

Embedded Systems 8: Memory Example
Memory addresses if m = 16 (location addressable). This is the model that LC-3 uses. [Figure: each location is 16 bits wide.]
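The figures for these four examples are not in the transcript. As a rough sketch of what they most likely show, the snippet below lists the addresses of the first few words in each case. The function name `word_addresses` and the choice to print four words are assumptions made for illustration.

```python
def word_addresses(m_bits, count, byte_addressable=True):
    """Return the addresses of the first `count` memory words of width m_bits.

    Byte-addressable: every byte has its own address, so consecutive
    m-bit words are m/8 addresses apart.
    Location-addressable: every m-bit location has exactly one address.
    """
    if byte_addressable:
        if m_bits % 8 != 0:
            raise ValueError("byte-addressable words must be a whole number of bytes")
        step = m_bits // 8
    else:
        step = 1
    return [i * step for i in range(count)]


print(word_addresses(8, 4))                           # [0, 1, 2, 3]
print(word_addresses(16, 4))                          # [0, 2, 4, 6]
print(word_addresses(32, 4))                          # [0, 4, 8, 12]
print(word_addresses(16, 4, byte_addressable=False))  # [0, 1, 2, 3]
```

In the location-addressable case (the LC-3 model named on the slide), consecutive 16-bit words have consecutive addresses, so the address space counts words rather than bytes.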

Embedded Systems 9: $2^2 \times 3$ Memory
[Figure: circuit with the labels address, address decoder, word select, word WE, write enable, input bits, and output bits.]
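The circuit diagram itself is not in the transcript, so the following is only a behavioral sketch of a $2^2 \times 3$ memory built from the labeled parts: an address decoder that raises one word-select line, a write enable that gates writes to the selected word, and output bits read from the selected word. The class and method names are hypothetical.

```python
class TinyMemory:
    """Behavioral model of a 2^n x m memory (defaults: 4 words of 3 bits)."""

    def __init__(self, address_bits=2, word_bits=3):
        self.word_bits = word_bits
        self.words = [0] * (2 ** address_bits)

    def decode(self, address):
        """Address decoder: exactly one word-select line is 1."""
        if not 0 <= address < len(self.words):
            raise ValueError("address out of range")
        return [int(i == address) for i in range(len(self.words))]

    def access(self, address, data_in=0, write_enable=False):
        """Drive address, input bits and WE; return the output bits."""
        select = self.decode(address)
        mask = (1 << self.word_bits) - 1
        out = 0
        for i, selected in enumerate(select):
            if selected and write_enable:
                # word WE = word select AND write enable
                self.words[i] = data_in & mask
            if selected:
                # unselected words contribute 0 to the output
                out |= self.words[i]
        return out


mem = TinyMemory()
mem.access(2, data_in=0b101, write_enable=True)
print(mem.decode(2))        # [0, 0, 1, 0]
print(bin(mem.access(2)))   # 0b101
```

Inputs wider than the word are truncated to the low `word_bits` bits here; that is a modeling choice, not something the slide specifies.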

Embedded Systems 10: More Memory Details
This is not the way actual memory is implemented.
– fewer transistors, much more dense, relies on electrical properties
But the logical structure is very similar.
– address decoder
– word select line
– word write enable
Two basic kinds of memory (RAM = Random Access Memory)
Static RAM (SRAM)
– fast, maintains data without refresh as long as power is applied
Dynamic RAM (DRAM)
– slower but denser, bit storage must be periodically refreshed

Embedded Systems 11: Even More Memory Details
There are other types of “non-volatile” memory devices: ROM, PROM, EPROM, EEPROM, Flash.
Can you think of other memory devices?

Embedded Systems 12: Electronics Packaging
– There are several packaging technologies available that an engineer can use to create electronic devices.
– Some are suitable for inexpensive toys but not miniature consumer products, and some are suitable for miniature consumer products but not inexpensive toys.
– These packages have metal leads, the conductive wires that carry electricity from the outside world to the silicon inside the package.
– Leads between packages are connected with small copper traces on a printed circuit board (PCB), and the package leads are soldered to the PCB.

Embedded Systems 13: Examples of Electronics Packages
Dual In-line Package (DIP) - An older technology that requires the metal leads to go through holes in the printed circuit board.
Dual Flat Pack (DFP) - A fairly recent technology in which the metal leads are soldered to the surface of the printed circuit board.

Embedded Systems 14: Examples of Electronics Packages
Quad Flat Pack (QFP) - Like the Dual Flat Pack, except that there are metal leads on four sides.
Ball Grid Array (BGA) - The connections to the component are on the bottom of the chip and have balls of solder on them.

Embedded Systems 15: Driving Force: The Clock
The clock is a signal that keeps the control unit moving.
– At each clock “tick,” the control unit moves to the next machine cycle, which may be the next instruction or the next phase of the current instruction.
Clock generator circuit:
– Based on a crystal oscillator
– Generates a regular sequence of “0” and “1” logic levels
– Clock cycle (or machine cycle): rising edge to rising edge
[Figure: square wave alternating between “1” and “0” over time, with one machine cycle marked.]

Embedded Systems 16: Clock Cycles
Instead of reporting execution time in seconds, we often use cycles.
Clock “ticks” indicate when to start activities (one abstraction).
cycle time = time between ticks = seconds per cycle
clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
[Figure: clock ticks along a time axis.]

Embedded Systems 17: Some Definitions
CPI = Cycles Per Instruction
CT = Cycle Time
IC = Instruction Count
CC = Clock Cycle Count
For example, a Pentium has a 233 MHz clock: $2.33 \times 10^{8}$ clock cycles per second (1 MHz = $10^{6}$ Hz).
$\mathrm{CT} = 1/\text{clock rate} = 1/(2.33 \times 10^{8}\ \text{clock cycles/second}) = 4.3 \times 10^{-9}\ \text{seconds/clock cycle} = 4.3\ \text{ns/clock cycle}$

Embedded Systems 18: More Practice
My computer runs at 66 MHz. What is the cycle time?
A computer has a 2.5 ns cycle time. What is the number of cycles per second?
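Both practice questions use the same reciprocal relation between cycle time and clock rate. A minimal sketch for checking answers follows; the helper names are illustrative and not from the slides.

```python
def cycle_time_from_rate(rate_hz):
    """Cycle time in seconds for a clock rate in Hz (CT = 1 / clock rate)."""
    return 1.0 / rate_hz


def rate_from_cycle_time(ct_s):
    """Clock rate in Hz for a cycle time in seconds (clock rate = 1 / CT)."""
    return 1.0 / ct_s


print(f"{cycle_time_from_rate(66e6) * 1e9:.2f} ns")    # 15.15 ns
print(f"{rate_from_cycle_time(2.5e-9) / 1e6:.0f} MHz")  # 400 MHz
```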

Embedded Systems 19: Run time definitions
To improve CPU time (same as run time):
– Decrease Instruction Count (IC): good compiler
– Decrease CPI (increase “IPC”, AKA instruction-level parallelism): fancy hardware, good compiler
– Decrease CT: crack designers
“Newton’s law” of microarchitecture
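The equation that the “Newton’s law” label refers to is not in the transcript. Using the symbols from the Some Definitions slide, the standard relation that these three levers come from is presumably:

$$
\text{CPU time} = \mathrm{IC} \times \mathrm{CPI} \times \mathrm{CT} = \frac{\mathrm{IC} \times \mathrm{CPI}}{\text{clock rate}}, \qquad \mathrm{CC} = \mathrm{IC} \times \mathrm{CPI}
$$

Each factor can be lowered on its own, but they interact: a change that shortens CT may raise CPI, so only the product determines run time.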

Embedded Systems 20: Examples and Practice
Program A takes $3 \times 10^{10}$ clock cycles to execute. How long does this take to run on a 100 MHz machine?
CPU time = CC × CT = $3 \times 10^{10}$ clock cycles × $10 \times 10^{-9}$ seconds/clock cycle = $3 \times 10^{2}$ = 300 seconds
How long will this program take to run on a 233 MHz Pentium?
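The worked example and the follow-up question can be checked with a short sketch of CPU time = CC × CT, written here as CC divided by the clock rate. The function name is an assumption for illustration.

```python
def cpu_time_s(clock_cycles, clock_rate_hz):
    """CPU time in seconds: clock cycle count times cycle time (1 / clock rate)."""
    return clock_cycles / clock_rate_hz


print(cpu_time_s(3e10, 100e6))            # 300.0
print(round(cpu_time_s(3e10, 233e6), 1))  # 128.8
```

On the 233 MHz Pentium the same $3 \times 10^{10}$ cycles take roughly 129 seconds.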

Embedded Systems 21: Example
Two implementations of the same instruction set: machine A has a clock cycle time of 1 ns and a CPI of 2.0; machine B has a clock cycle time of 2 ns and a CPI of 1.2 for the same program. Which machine is faster, and by how much?
Time per instruction is CPI × CT: 2.0 ns on A and 2.4 ns on B. A is faster by a factor of 1.2.
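Because both machines run the same program on the same instruction set, the instruction count cancels and only CPI × CT matters. A small sketch of the comparison, with hypothetical names and times kept in nanoseconds:

```python
def time_per_instruction_ns(cpi, cycle_time_ns):
    """Average time per instruction in ns: CPI times cycle time."""
    return cpi * cycle_time_ns


machine_a = time_per_instruction_ns(cpi=2.0, cycle_time_ns=1.0)
machine_b = time_per_instruction_ns(cpi=1.2, cycle_time_ns=2.0)

# IC is identical on both machines, so the ratio of per-instruction
# times equals the ratio of total execution times.
speedup_of_a = machine_b / machine_a
print(machine_a, machine_b, round(speedup_of_a, 2))  # 2.0 2.4 1.2
```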

Embedded Systems: Example
Our favorite program runs in 10 seconds on computer A, which has a 400 MHz clock. We are trying to help a computer designer build a new machine B that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?
Don't panic: we can easily work this out from basic principles.
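Working from basic principles means first recovering the cycle count on A, then scaling it for B and dividing by B's time budget. One way to sketch that, with illustrative variable names:

```python
# Machine A: known execution time and clock rate give the cycle count.
time_a_s = 10.0
rate_a_hz = 400e6
cycles_a = time_a_s * rate_a_hz   # 4e9 cycles

# Machine B: needs 1.2x as many cycles and must finish in 6 seconds.
cycles_b = 1.2 * cycles_a         # 4.8e9 cycles
time_b_s = 6.0
rate_b_hz = cycles_b / time_b_s

print(f"{rate_b_hz / 1e6:.0f} MHz")  # 800 MHz
```

The target is twice A's clock rate, even though the program only needs to run about 1.67 times faster, because B spends 20% more cycles on the same work.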

Embedded Systems: Now that we understand cycles
A given program will require
– some number of instructions (machine instructions)
– some number of cycles
– some number of seconds
We have a vocabulary that relates these quantities:
– cycle time (seconds per cycle)
– clock rate (cycles per second)
– CPI (cycles per instruction): a floating point intensive application might have a higher CPI
– MIPS (millions of instructions per second): this would be higher for a program using simple instructions

Embedded Systems: Performance
Performance is determined by execution time.
Do any of the other variables equal performance?
– # of cycles to execute program?
– # of instructions in program?
– # of cycles per second?
– average # of cycles per instruction?
– average # of instructions per second?
Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.
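To make the pitfall concrete, here is a sketch with made-up numbers (the 100 MHz clock, the instruction counts, and the CPI values are all hypothetical, not from the slides). Two compiled versions of one program run on the same machine, and the version with the lower MIPS rating finishes first.

```python
CLOCK_RATE_HZ = 100e6  # hypothetical 100 MHz machine


def execution_time_s(instruction_count, cpi):
    """Execution time = IC x CPI / clock rate."""
    return instruction_count * cpi / CLOCK_RATE_HZ


def mips(cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return CLOCK_RATE_HZ / (cpi * 1e6)


# Hypothetical versions: (instruction count, average CPI)
versions = {"X": (10e6, 1.0), "Y": (4e6, 2.0)}
for name, (ic, cpi) in versions.items():
    print(f"{name}: {execution_time_s(ic, cpi) * 1e3:.0f} ms, {mips(cpi):.0f} MIPS")
# X: 100 ms, 100 MIPS
# Y: 80 ms, 50 MIPS
```

Version Y executes fewer, slower instructions, so its MIPS rating is half of X's while its execution time is shorter. Only execution time ranks the two correctly.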

Embedded Systems: Amdahl's Law
Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)
Example: "Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"
How about making it 5 times faster?
Principle: Make the common case fast
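Both questions can be answered by solving the formula above for the amount of improvement. The sketch below does that and returns `None` when no finite improvement is enough; the function name and the `None` convention are choices made here, not part of the slides.

```python
def required_improvement(total_s, affected_s, target_speedup):
    """Solve Amdahl's Law for the improvement factor of the affected part.

    target time = unaffected + affected / n  =>  n = affected / (target - unaffected)
    Returns None if the unaffected time alone already uses up the target time.
    """
    unaffected_s = total_s - affected_s
    target_s = total_s / target_speedup
    budget_s = target_s - unaffected_s  # time left for the improved part
    if budget_s <= 0:
        return None
    return affected_s / budget_s


print(required_improvement(100, 80, 4))  # 16.0
print(required_improvement(100, 80, 5))  # None
```

Running 4 times faster requires multiplication to be 16 times faster. Running 5 times faster would need the program to finish in 20 seconds, which the 20 seconds of non-multiply work already consumes, so no multiplier speedup can achieve it.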

Embedded Systems: Example
Suppose we enhance a machine making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions?
We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?
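These two questions use Amdahl's Law in its fraction form: with a fraction f of the original time enhanced by a factor s, the overall speedup is 1 / ((1 - f) + f / s). A minimal sketch that evaluates it and inverts it for f follows; the function names are illustrative.

```python
def overall_speedup(fraction_enhanced, enhancement_factor):
    """Overall speedup when a fraction of run time is sped up by a factor."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / enhancement_factor)


def required_fraction(target_speedup, enhancement_factor):
    """Fraction of original run time that must be enhanced to hit the target.

    A result greater than 1 means the target speedup is unreachable.
    """
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / enhancement_factor)


print(f"{overall_speedup(0.5, 5):.2f}")                    # 1.67
print(f"{required_fraction(3, 5) * 100:.1f} s of 100 s")   # 83.3 s of 100 s
```

With half the time in floating point, the benchmark drops from 10 seconds to 6 seconds, a speedup of about 1.67. To show a speedup of 3, floating-point work would have to account for about 83.3 of the benchmark's 100 seconds.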

Embedded Systems: Remember
Performance is specific to a particular program or set of programs.
– Total execution time is a consistent summary of performance
For a given architecture, performance increases come from:
– increases in clock rate (without adverse CPI effects)
– improvements in processor organization that lower CPI
– compiler enhancements that lower CPI and/or instruction count
Pitfall: expecting an improvement in one aspect of a machine’s performance to improve total performance by a proportional amount