Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 1: Introduction

Similar presentations


Presentation on theme: "Chapter 1: Introduction"— Presentation transcript:

1 Chapter 1: Introduction

2 Outline Embedded systems overview
What are they? Design challenge – optimizing design metrics Technologies Processor technologies IC technologies Design technologies

3 Embedded systems overview
Computing systems are everywhere Most of us think of “desktop” computers PC’s Laptops Mainframes Servers But there’s another type of computing system Far more common...

4 Embedded systems overview
Embedded computing systems Computing systems embedded within electronic devices Hard to define. Nearly any computing system other than a desktop computer Billions of units produced yearly, versus millions of desktop units Perhaps 50 per household and per automobile Computers are in here... and here... and even here... Lots more of these, though they cost a lot less each.

5 A “short list” of embedded systems
Anti-lock brakes Auto-focus cameras Automatic teller machines Automatic toll systems Automatic transmission Avionic systems Battery chargers Camcorders Cell phones Cell-phone base stations Cordless phones Cruise control Curbside check-in systems Digital cameras Disk drives Electronic card readers Electronic instruments Electronic toys/games Factory control Fax machines Fingerprint identifiers Home security systems Life-support systems Medical testing systems Modems MPEG decoders Network cards Network switches/routers On-board navigation Pagers Photocopiers Point-of-sale systems Portable video games Printers Satellite phones Scanners Smart ovens/dishwashers Speech recognizers Stereo systems Teleconferencing systems Televisions Temperature controllers Theft tracking systems TV set-top boxes VCR’s, DVD players Video game consoles Video phones Washers and dryers And the list goes on and on

6 Some common characteristics of embedded systems
Single-functioned Executes a single program, repeatedly Tightly-constrained Low cost, low power, small, fast, etc. Reactive and real-time Continually reacts to changes in the system’s environment Must compute certain results in real-time without delay

7 An embedded system example -- a digital camera
Microcontroller CCD preprocessor Pixel coprocessor A2D D2A JPEG codec DMA controller Memory controller ISA bus interface UART LCD ctrl Display ctrl Multiplier/Accum Digital camera chip lens CCD Single-functioned -- always a digital camera Tightly-constrained -- Low cost, low power, small, fast Reactive and real-time -- only to a small extent

8 Design challenge – optimizing design metrics
Obvious design goal: Construct an implementation with desired functionality Key design challenge: Simultaneously optimize numerous design metrics Design metric A measurable feature of a system’s implementation Optimizing design metrics is a key challenge

9 Design challenge – optimizing design metrics
Common metrics Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost NRE cost (Non-Recurring Engineering cost): The one-time monetary cost of designing the system Size: the physical space required by the system Performance: the execution time or throughput of the system Power: the amount of power consumed by the system Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost

10 Design challenge – optimizing design metrics
Common metrics (continued) Time-to-prototype: the time needed to build a working version of the system Time-to-market: the time required to develop a system to the point that it can be released and sold to customers Maintainability: the ability to modify the system after its initial release Correctness, safety, many more

11 Design metric competition -- improving one may worsen others
Expertise with both software and hardware is needed to optimize design metrics Not just a hardware or software expert, as is common A designer must be comfortable with various technologies in order to choose the best for a given application and constraints Size Performance Power NRE cost Microcontroller CCD preprocessor Pixel coprocessor A2D D2A JPEG codec DMA controller Memory controller ISA bus interface UART LCD ctrl Display ctrl Multiplier/Accum Digital camera chip lens CCD Hardware Software

12 Time-to-market: a demanding design metric
Time required to develop a product to the point it can be sold to customers Market window Period during which the product would have highest sales Average time-to-market constraint is about 8 months Delays can be costly Revenues ($) Time (months)

13 Losses due to delayed market entry
Simplified revenue model Product life = 2W, peak at W Time of market entry defines a triangle, representing market penetration Triangle area equals revenue Loss The difference between the on-time and delayed triangle areas On-time Delayed entry entry Peak revenue Peak revenue from delayed entry Market rise Market fall W 2W Time D On-time Delayed Revenues ($)

14 Losses due to delayed market entry (cont.)
Area = 1/2 * base * height On-time = 1/2 * 2W * W Delayed = 1/2 * (W-D+W)*(W-D) Percentage revenue loss = (D(3W-D)/2W2)*100% Try some examples On-time Delayed entry entry Peak revenue Peak revenue from delayed entry Market rise Market fall W 2W Time D On-time Delayed Revenues ($) Lifetime 2W=52 wks, delay D=4 wks (4*(3*26 –4)/2*26^2) = 22% Lifetime 2W=52 wks, delay D=10 wks (10*(3*26 –10)/2*26^2) = 50% Delays are costly!

15 NRE and unit cost metrics
Costs: Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost NRE cost (Non-Recurring Engineering cost): The one-time monetary cost of designing the system total cost = NRE cost unit cost * # of units per-product cost = total cost / # of units = (NRE cost / # of units) + unit cost Example NRE=$2000, unit=$100 For 10 units total cost = $ *$100 = $3000 per-product cost = $2000/10 + $100 = $300 Amortizing NRE cost over the units results in an additional $200 per unit

16 NRE and unit cost metrics
Compare technologies by costs -- best depends on quantity Technology A: NRE=$2,000, unit=$100 Technology B: NRE=$30,000, unit=$30 Technology C: NRE=$100,000, unit=$2 But, must also consider time-to-market

17 The performance design metric
Widely-used measure of system, widely-abused Clock frequency, instructions per second – not good measures Digital camera example – a user cares about how fast it processes images, not clock speed or instructions per second Latency (response time) Time between task start and end e.g., Camera’s A and B process images in 0.25 seconds Throughput Tasks per second, e.g. Camera A processes 4 images per second Throughput can be more than latency seems to imply due to concurrency, e.g. Camera B may process 8 images per second (by capturing a new image while previous image is being stored). Speedup of B over S = B’s performance / A’s performance Throughput speedup = 8/4 = 2

18 Three key embedded system technologies
Technology A manner of accomplishing a task, especially using technical processes, methods, or knowledge Three key technologies for embedded systems Processor technology IC technology Design technology

19 Application-specific
Processor technology The architecture of the computation engine used to implement a system’s desired functionality Processor does not have to be programmable “Processor” not equal to general-purpose processor Controller Datapath Controller Datapath Controller Datapath Control logic and State register Register file Control logic and State register Registers Control logic index total Custom ALU State register + General ALU IR PC IR PC Data memory Data memory Program memory Data memory Program memory Assembly code for: total = 0 for i =1 to … Assembly code for: total = 0 for i =1 to … General-purpose (“software”) Application-specific Single-purpose (“hardware”)

20 Processor technology Processors vary in their customization for the problem at hand total = 0 for i = 1 to N loop total += M[i] end loop Desired functionality General-purpose processor Application-specific processor Single-purpose processor

21 General-purpose processors
Programmable device used in a variety of applications Also known as “microprocessor” Features Program memory General datapath with large register file and general ALU User benefits Low time-to-market and NRE costs High flexibility “Pentium” the most well-known, but there are hundreds of others IR PC Register file General ALU Datapath Controller Program memory Assembly code for: total = 0 for i =1 to … Control logic and State register Data memory

22 Single-purpose processors
Digital circuit designed to execute exactly one program a.k.a. coprocessor, accelerator or peripheral Features Contains only the components needed to execute a single program No program memory Benefits Fast Low power Small size Datapath Controller Control logic State register Data memory index total +

23 Application-specific processors
Programmable processor optimized for a particular class of applications having common characteristics Compromise between general-purpose and single-purpose processors Features Program memory Optimized datapath Special functional units Benefits Some flexibility, good performance, size and power Controller Datapath Control logic and State register Registers Custom ALU IR PC Data memory Program memory Assembly code for: total = 0 for i =1 to …

24 IC technology The manner in which a digital (gate-level) implementation is mapped onto an IC IC: Integrated circuit, or “chip” IC technologies differ in their customization to a design IC’s consist of numerous layers (perhaps 10 or more) IC technologies differ with respect to who builds each layer and when source drain channel oxide gate Silicon substrate IC package IC

25 IC technology Three types of IC technologies Full-custom/VLSI
Semi-custom ASIC (gate array and standard cell) PLD (Programmable Logic Device)

26 Full-custom/VLSI All layers are optimized for an embedded system’s particular digital implementation Placing transistors Sizing transistors Routing wires Benefits Excellent performance, small size, low power Drawbacks High NRE cost (e.g., $300k), long time-to-market

27 Semi-custom Lower layers are fully or partially built Benefits
Designers are left with routing of wires and maybe placing some blocks Benefits Good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k) Drawbacks Still require weeks to months to develop

28 PLD (Programmable Logic Device)
All layers already exist Designers can purchase an IC Connections on the IC are either created or destroyed to implement desired functionality Field-Programmable Gate Array (FPGA) very popular Benefits Low NRE costs, almost instant IC availability Drawbacks Bigger, expensive (perhaps $30 per unit), power hungry, slower

29 Moore’s law The most important trend in embedded systems
Predicted in 1965 by Intel co-founder Gordon Moore IC transistor capacity has doubled roughly every 18 months for the past several decades 10,000 1,000 100 10 1 0.1 0.01 0.001 Logic transistors per chip (in millions) 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 2005 2007 2009 Note: logarithmic scale

30 Moore’s law Wow This growth rate is hard to imagine, most people underestimate How many ancestors do you have from 20 generations ago i.e., roughly how many people alive in the 1500’s did it take to make you? 220 = more than 1 million people (This underestimation is the key to pyramid schemes!)

31 Graphical illustration of Moore’s law
1981 1984 1987 1990 1993 1996 1999 2002 10,000 transistors 150,000,000 transistors Leading edge chip in 1981 Leading edge chip in 2002 Something that doubles frequently grows more quickly than most people realize! A 2002 chip can hold about 15, chips inside itself

32 Design Technology The manner in which we convert our concept of desired system functionality into an implementation Libraries/IP: Incorporates pre-designed implementation from lower abstraction level into higher level. System specification Behavioral RT Logic To final implementation Compilation/Synthesis: Automates exploration and insertion of implementation details for lower level. Test/Verification: Ensures correct functionality at each level, thus reducing costly iterations between levels. Compilation/ Synthesis Libraries/ IP Test/ Verification synthesis Behavior Hw/Sw/ OS Cores components Gates/ Cells Model simulat./ checkers Hw-Sw cosimulators HDL simulators Gate simulators

33 Design productivity exponential increase
100,000 10,000 1,000 100 (K) Trans./Staff – Mo. Productivity 10 1 0.1 0.01 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 2005 2007 2009 Exponential increase over the past few decades

34 The co-design ladder In the past: Hardware/software “codesign”
Hardware and software design technologies were very different Recent maturation of synthesis enables a unified view of hardware and software Hardware/software “codesign” Implementation Assembly instructions Machine instructions Register transfers Compilers (1960's,1970's) Assemblers, linkers (1950's, 1960's) Behavioral synthesis (1990's) RT synthesis (1980's, 1990's) Logic synthesis (1970's, 1980's) Microprocessor plus program bits: “software” VLSI, ASIC, or PLD implementation: “hardware” Logic gates Logic equations / FSM's Sequential program code (e.g., C, VHDL) The choice of hardware versus software for a particular function is simply a tradeoff among various design metrics, like performance, power, size, NRE cost, and especially flexibility; there is no fundamental difference between what hardware or software can implement.

35 Independence of processor and IC technologies
Basic tradeoff General vs. custom With respect to processor technology or IC technology The two technologies are independent General-purpose processor ASIP Single- purpose Semi-custom PLD Full-custom General, providing improved: Customized, Power efficiency Performance Size Cost (high volume) Flexibility Maintainability NRE cost Time- to-prototype Time-to-market Cost (low volume)

36 Design productivity gap
While designer productivity has grown at an impressive rate over the past decades, the rate of improvement has not kept pace with chip capacity 10,000 1,000 100 10 1 0.1 0.01 0.001 Logic transistors per chip (in millions) 100,000 1000 Productivity (K) Trans./Staff-Mo. 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 2005 2007 2009 IC capacity productivity Gap

37 Design productivity gap
1981 leading edge chip required 100 designer months 10,000 transistors / 100 transistors/month 2002 leading edge chip requires 30,000 designer months 150,000,000 / transistors/month Designer cost increase from $1M to $300M 10,000 1,000 100 10 1 0.1 0.01 0.001 Logic transistors per chip (in millions) 100,000 1000 Productivity (K) Trans./Staff-Mo. 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 2005 2007 2009 IC capacity productivity Gap

38 The mythical man-month
The situation is even worse than the productivity gap indicates In theory, adding designers to team reduces project completion time In reality, productivity per designer decreases due to complexities of team management and communication In the software community, known as “the mythical man-month” (Brooks 1975) At some point, can actually lengthen project completion time! (“Too many cooks”) 10 20 30 40 10000 20000 30000 40000 50000 60000 43 24 19 16 15 18 23 Team Individual Months until completion Number of designers 1M transistors, 1 designer=5000 trans/month Each additional designer reduces for 100 trans/month So 2 designers produce 4900 trans/month each

39 Summary Embedded systems are everywhere
Key challenge: optimization of design metrics Design metrics compete with one another A unified view of hardware and software is necessary to improve productivity Three key technologies Processor: general-purpose, application-specific, single-purpose IC: Full-custom, semi-custom, PLD Design: Compilation/synthesis, libraries/IP, test/verification

40 Pr. 1-28

41

42

43 Chapter 2: Custom single-purpose processors

44 Outline Introduction Combinational logic Sequential logic
Custom single-purpose processor design RT-level custom single-purpose processor design

45 Introduction Processor A custom single-purpose processor may be
Digital circuit that performs a computation tasks Controller and datapath General-purpose: variety of computation tasks Single-purpose: one particular computation task Custom single-purpose: non-standard task A custom single-purpose processor may be Fast, small, low power But, high NRE, longer time-to-market, less flexible Microcontroller CCD preprocessor Pixel coprocessor A2D D2A JPEG codec DMA controller Memory controller ISA bus interface UART LCD ctrl Display ctrl Multiplier/Accum Digital camera chip lens CCD

46 CMOS transistor on silicon
The basic electrical component in digital systems Acts as an on/off switch Voltage at “gate” controls whether current flows from source to drain Don’t confuse this “gate” with a logic gate gate source drain Conducts if gate=1 source drain oxide gate IC package IC channel Silicon substrate 1

47 CMOS transistor implementations
Complementary Metal Oxide Semiconductor We refer to logic levels Typically 0 is 0V, 1 is 5V Two basic CMOS types nMOS conducts if gate=1 pMOS conducts if gate=0 Hence “complementary” Basic gates Inverter, NAND, NOR gate source drain nMOS Conducts if gate=1 gate source drain pMOS Conducts if gate=0 x F = x' 1 inverter F = (xy)' x 1 y NAND gate 1 F = (x+y)' x y NOR gate

48

49 Basic logic gates x F 1 x y F 1 x y F 1 x y F 1 F = x Driver F = x y
1 F x y x y F 1 F y x x y F 1 x y F x y F 1 F = x Driver F = x y AND F = x + y OR F = x  y XOR x F x F 1 x y F x y F 1 x y F x y F 1 x y F x y F 1 F = x’ Inverter F = (x y)’ NAND F = (x+y)’ NOR F = x y XNOR

50 Combinational logic design
A) Problem description y is 1 if a is to 1, or b and c are 1. z is 1 if b or c is to 1, but not both, or if all are 1. B) Truth table 1 Inputs a b c Outputs y z C) Output equations y = a'bc + ab'c' + ab'c + abc' + abc z = a'b'c + a'bc' + ab'c + abc' + abc D) Minimized output equations 00 1 01 11 10 a bc y y = a + bc z z = ab + b’c + bc’ E) Logic Gates a b c y z

51 Combinational components
With enable input e  all O’s are 0 if e=0 With carry-in input Ci sum = A + B + Ci May have status outputs carry, zero, etc. O = I0 if S=0..00 I1 if S=0..01 I(m-1) if S=1..11 O0 =1 if I=0..00 O1 =1 if I=0..01 O(n-1) =1 if I=1..11 sum = A+B (first n bits) carry = (n+1)’th bit of A+B less = 1 if A<B equal =1 if A=B greater=1 if A>B O = A op B op determined by S. n-bit, m x 1 Multiplexor O S0 S(log m) n I(m-1) I1 I0 log n x n Decoder O1 O0 O(n-1) I(log n -1) n-bit Adder A B sum carry Comparator less equal greater n bit, m function ALU

52 Sequential components
clear n-bit Register n load I Q shift I Q n-bit Shift register n-bit Counter n Q Q = 0 if clear=1, I if load=1 and clock=1, Q(previous) otherwise. Q = lsb - Content shifted - I stored in msb Q = 0 if clear=1, Q(prev)+1 if count=1 and clock=1.

53 Sequential logic design
A) Problem Description You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles C) Implementation Model Combinational logic State register a x I0 I1 Q1 Q0 D) State Table (Moore-type) 1 Inputs Q1 Q0 a Outputs I1 I0 x 1 2 3 x=0 x=1 a=1 a=0 B) State Diagram Given this implementation model Sequential logic design quickly reduces to combinational logic design

54 Sequential logic design (cont.)
1 Q1Q0 I1 I1 = Q1’Q0a + Q1a’ + Q1Q0’ 00 11 10 a 01 I0 I0 = Q0a’ + Q0’a x = Q1Q0 x E) Minimized Output Equations F) Combinational Logic a Q1 Q0 I0 I1 x

55 Custom single-purpose processor basic model
a view inside the controller and datapath controller datapath state register next-state and control logic registers functional units controller and datapath controller datapath external control inputs control outputs data outputs

56 Example: greatest common divisor
y = y -x 7: x = x - y 8: 6-J: x!=y 5: !(x!=y) x<y !(x<y) 6: 5-J: 1: 1 !1 x = x_i 3: y = y_i 4: 2: 2-J: !go_i !(!go_i) d_o = x 1-J: 9: First create algorithm Convert algorithm to “complex” state machine Known as FSMD: finite-state machine with datapath Can use templates to perform such conversion GCD (a) black-box view x_i y_i d_o go_i (c) state diagram (b) desired functionality 0: int x, y; 1: while (1) { 2: while (!go_i); 3: x = x_i; 4: y = y_i; 5: while (x != y) { 6: if (x < y) 7: y = y - x; else 8: x = x - y; } 9: d_o = x;

57 State diagram templates
Assignment statement a = b next statement Loop statement while (cond) { loop-body- statements } next statement cond !cond J: C: Branch statement if (c1) c1 stmts else if c2 c2 stmts else other stmts next statement c1 !c1*c2 !c1*!c2 others J: C:

58 Creating the datapath Create a register for any declared variable
Create a functional unit for each arithmetic operation Connect the ports, registers and functional units Based on reads and writes Use multiplexors for multiple sources Create unique identifier for each datapath component control input and output y = y -x 7: x = x - y 8: 6-J: x!=y 5: !(x!=y) x<y !(x<y) 6: 5-J: 1: 1 !1 x = x_i 3: y = y_i 4: 2: 2-J: !go_i !(!go_i) d_o = x 1-J: 9: subtractor 7: y-x 8: x-y 5: x!=y 6: x<y x_i y_i d_o 0: x 0: y 9: d n-bit 2x1 x_sel y_sel x_ld y_ld x_neq_y x_lt_y d_ld < != Datapath

59 Creating the controller’s FSM
y = y -x 7: x = x - y 8: 6-J: x!=y 5: !(x!=y) x<y !(x<y) 6: 5-J: 1: 1 !1 x = x_i 3: y = y_i 4: 2: 2-J: !go_i !(!go_i) d_o = x 1-J: 9: y_sel = 1 y_ld = 1 7: x_sel = 1 x_ld = 1 8: 6-J: x_neq_y 5: !x_neq_y x_lt_y !x_lt_y 6: 5-J: d_ld = 1 1-J: 9: x_sel = 0 3: y_sel = 0 4: 1: 1 !1 2: 2-J: !go_i !(!go_i) go_i 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 Controller Same structure as FSMD Replace complex actions/conditions with datapath configurations subtractor 7: y-x 8: x-y 5: x!=y 6: x<y x_i y_i d_o 0: x 0: y 9: d n-bit 2x1 x_sel y_sel x_ld y_ld x_neq_y x_lt_y d_ld < != Datapath

60 Splitting into a controller and datapath
go_i Controller implementation model y_sel x_sel Combinational logic Q3 Q0 State register go_i x_neq_y x_lt_y x_ld y_ld d_ld Q2 Q1 I3 I0 I2 I1 Controller !1 0000 1: subtractor 7: y-x 8: x-y 5: x!=y 6: x<y x_i y_i d_o 0: x 0: y 9: d n-bit 2x1 x_sel y_sel x_ld y_ld x_neq_y x_lt_y d_ld < != (b) Datapath 1 !(!go_i) 0001 2: !go_i 0010 2-J: x_sel = 0 x_ld = 1 0011 3: y_sel = 0 y_ld = 1 0100 4: x_neq_y=0 0101 5: x_neq_y=1 0110 6: x_lt_y=1 x_lt_y=0 7: y_sel = 1 y_ld = 1 8: x_sel = 1 x_ld = 1 0111 1000 1001 6-J: 1010 5-J: 1011 9: d_ld = 1 1100 1-J:

61 Controller state table for the GCD example
Inputs Outputs Q3 Q2 Q1 Q0 x_neq_y x_lt_y go_i I3 I2 I1 I0 x_sel y_sel x_ld y_ld d_ld * 1 X

62 Completing the GCD custom single-purpose processor design
We finished the datapath We have a state table for the next state and control logic All that’s left is combinational logic design This is not an optimized design, but we see the basic steps a view inside the controller and datapath controller datapath state register next-state and control logic registers functional units

63 RT-level custom single-purpose processor design
We often start with a state machine Rather than algorithm Cycle timing often too central to functionality Example Bus bridge that converts 4-bit bus to 8-bit bus Start with FSMD Known as register-transfer (RT) level Exercise: complete the design Problem Specification Bridge A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse. Sender data_in(4) rdy_in rdy_out data_out(8) Receiver clock FSMD WaitFirst4 RecFirst4Start data_lo=data_in WaitSecond4 rdy_in=1 rdy_in=0 RecFirst4End RecSecond4Start data_hi=data_in RecSecond4End Send8Start data_out=data_hi & data_lo rdy_out=1 Send8End rdy_out=0 Bridge Inputs rdy_in: bit; data_in: bit[4]; Outputs rdy_out: bit; data_out:bit[8] Variables data_lo, data_hi: bit[4];

64 RT-level custom single-purpose processor design (cont’)
Bridge (a) Controller WaitFirst4 RecFirst4Start data_lo_ld=1 WaitSecond4 rdy_in=1 rdy_in=0 RecFirst4End RecSecond4Start data_hi_ld=1 RecSecond4End Send8Start data_out_ld=1 rdy_out=1 Send8End rdy_out=0 rdy_in rdy_out clk data_in(4) data_out to all registers data_hi data_lo data_out_ld data_hi_ld data_lo_ld data_out (b) Datapath

65 Optimizing single-purpose processors
Optimization is the task of making design metric values the best possible Optimization opportunities original program FSMD datapath FSM

66 Optimizing the original program
Analyze program attributes and look for areas of possible improvement number of computations size of variable time and space complexity operations used multiplication and division very expensive

67 Optimizing the original program (cont’)
optimized program 0: int x, y; 1: while (1) { 2: while (!go_i); 3: x = x_i; 4: y = y_i; 5: while (x != y) { 6: if (x < y) 7: y = y - x; else 8: x = x - y; } 9: d_o = x; 0: int x, y, r; 1: while (1) { 2: while (!go_i); // x must be the larger number 3: if (x_i >= y_i) { 4: x=x_i; 5: y=y_i; } 6: else { 7: x=y_i; 8: y=x_i; 9: while (y != 0) { 10: r = x % y; 11: x = y; 12: y = r; 13: d_o = x; replace the subtraction operation(s) with modulo operation in order to speed up program GCD(42, 8) - 9 iterations to complete the loop x and y values evaluated as follows : (42, 8), (43, 8), (26,8), (18,8), (10, 8), (2,8), (2,6), (2,4), (2,2). GCD(42,8) - 3 iterations to complete the loop x and y values evaluated as follows: (42, 8), (8,2), (2,0)

68 Optimizing the FSMD Areas of possible improvements merge states
states with constants on transitions can be eliminated, transition taken is already known states with independent operations can be merged separate states states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size scheduling

69 Optimizing the FSMD (cont.)
int x, y; !1 original FSMD optimized FSMD 1: int x, y; 1 !(!go_i) eliminate state 1 – transitions have constant values 2: 2: !go_i go_i !go_i 2-J: x = x_i y = y_i 3: merge state 2 and state 2J – no loop operation in between them 3: x = x_i 5: 4: y = y_i merge state 3 and state 4 – assignment operations are independent of one another x<y x>y 7: y = y -x 8: x = x - y 5: !(x!=y) x!=y merge state 5 and state 6 – transitions from state 6 can be done in state 5 9: d_o = x 6: x<y !(x<y) y = y -x 7: 8: x = x - y eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8, respectively 6-J: 5-J: eliminate state 1-J – transition from state 1-J can be done directly from state 9 d_o = x 9: 1-J:

70 Optimizing the datapath
Sharing of functional units one-to-one mapping, as done previously, is not necessary if same operation occurs in different states, they can share a single functional unit Multi-functional units ALUs support a variety of operations, it can be shared among operations occurring in different states

71 Optimizing the FSM State encoding State minimization
task of assigning a unique bit pattern to each state in an FSM size of state register and combinational logic vary can be treated as an ordering problem State minimization task of merging equivalent states into a single state state equivalent if for all possible input combinations the two states generate the same outputs and transitions to the next same state

72 Summary Custom single-purpose processors
Straightforward design techniques Can be built to execute algorithms Typically start with FSMD CAD tools can be of great assistance

73 Chapter 3 General-Purpose Processors: Software

74 Introduction General-Purpose Processor
Processor designed for a variety of computation tasks Low unit cost, in part because manufacturer spreads NRE over large numbers of units Motorola sold half a billion 68HC05 microcontrollers in 1996 alone Carefully designed since higher NRE is acceptable Can yield good performance, size and power Low NRE cost, short time-to-market/prototype, high flexibility User just writes software; no processor design a.k.a. “microprocessor” – “micro” used when they were implemented on one or a few chips rather than entire rooms

75 Basic Architecture Control unit and datapath Key differences
Note similarity to single-purpose processor Key differences Datapath is general Control unit doesn’t store the algorithm – the algorithm is “programmed” into the memory Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status

76 Datapath Operations Load ALU operation Store ... ...
Read memory location into register Processor Control unit Datapath ALU ALU operation Input certain registers through ALU, store back in register Controller +1 Control /Status Registers Store Write register to memory location 10 11 PC IR I/O ... Memory 10 11 ...

77 Control Unit ... Control unit: configures the datapath operations
Sequence of desired operations (“instructions”) stored in memory – “program” Instruction cycle – broken into several sub-operations, each one clock cycle, e.g.: Fetch: Get next instruction into IR Decode: Determine what the instruction means Fetch operands: Move data from memory to datapath register Execute: Move data through the ALU Store results: Write data from register to memory Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status 10 ... load R0, M[500] 500 501 100 inc R1, R0 101 store M[501], R1 102 R0 R1

78 Control Unit Sub-Operations
Fetch Get next instruction into IR PC: program counter, always points to next instruction IR: holds the fetched instruction Processor Control unit Datapath ALU Controller Control /Status Registers PC 100 IR R0 R1 load R0, M[500] I/O ... Memory 100 load R0, M[500] 500 10 101 inc R1, R0 501 ... 102 store M[501], R1

79 Control Unit Sub-Operations
Decode Determine what the instruction means Processor Control unit Datapath ALU Controller Control /Status Registers PC 100 IR R0 R1 load R0, M[500] I/O ... Memory 100 load R0, M[500] 500 10 101 inc R1, R0 501 ... 102 store M[501], R1

80 Control Unit Sub-Operations
Fetch operands Move data from memory to datapath register Processor Control unit Datapath ALU Controller Control /Status Registers 10 PC 100 IR R0 R1 load R0, M[500] I/O ... Memory 100 load R0, M[500] 500 10 101 inc R1, R0 501 ... 102 store M[501], R1

81 Control Unit Sub-Operations
Execute Move data through the ALU This particular instruction does nothing during this sub-operation Processor Control unit Datapath ALU Controller Control /Status Registers 10 PC 100 IR R0 R1 load R0, M[500] I/O ... Memory 100 load R0, M[500] 500 10 101 inc R1, R0 501 ... 102 store M[501], R1

82 Control Unit Sub-Operations
Store results Write data from register to memory This particular instruction does nothing during this sub-operation Processor Control unit Datapath ALU Controller Control /Status Registers 10 PC 100 IR R0 R1 load R0, M[500] I/O ... Memory 100 load R0, M[500] 500 10 101 inc R1, R0 501 ... 102 store M[501], R1

83 Instruction Cycles ... PC=100 Fetch ops Store results Fetch Decode
Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status 10 ... load R0, M[500] 500 501 100 inc R1, R0 101 store M[501], R1 102 R0 R1 10 Fetch ops Store results Fetch load R0, M[500] Decode Exec. clk 100

84 Instruction Cycles ... PC=100 PC=101 Fetch ops Store results Fetch
Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status 10 ... load R0, M[500] 500 501 100 inc R1, R0 101 store M[501], R1 102 R0 R1 Fetch ops Store results Fetch Decode Exec. clk +1 11 Exec. PC=101 Fetch ops Store results inc R1, R0 Fetch Decode clk 10 101

85 Instruction Cycles ... PC=100 PC=101 PC=102 Fetch ops Store results
Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status 10 ... load R0, M[500] 500 501 100 inc R1, R0 101 store M[501], R1 102 R0 R1 Fetch ops Store results Fetch Decode Exec. clk PC=101 Fetch ops Store results Fetch Decode Exec. Decode clk 10 11 11 Store results 102 store M[501], R1 Fetch PC=102 Fetch ops Exec. clk

86 Architectural Considerations
N-bit processor N-bit ALU, registers, buses, memory data interface Embedded: 8-bit, 16-bit, 32-bit common Desktop/servers: 32-bit, even 64 PC size determines address space Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status

87 Architectural Considerations
Clock frequency Inverse of clock period Must be longer than longest register to register delay in entire processor Memory access is often the longest Processor Control unit Datapath ALU Registers IR PC Controller Memory I/O Control /Status

88 Pipelining: Increasing Instruction Throughput
Wash 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 Non-pipelined Pipelined Dry 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 non-pipelined dish cleaning Time pipelined dish cleaning Time Fetch-instr. 1 2 3 4 5 6 7 8 Decode 1 2 3 4 5 6 7 8 Fetch ops. 1 2 3 4 5 6 7 8 Pipelined Execute 1 2 3 4 5 6 7 8 Instruction 1 Store res. 1 2 3 4 5 6 7 8 Time pipelined instruction execution

89 Superscalar and VLIW Architectures
Performance can be improved by: Faster clock (but there’s a limit) Pipelining: slice up instruction into stages, overlap stages Multiple ALUs to support more than one instruction stream Superscalar Scalar: non-vector operations Fetches instructions in batches, executes as many as possible May require extensive hardware to detect independent instructions VLIW: each word in memory has multiple independent instructions Relies on the compiler to detect and schedule instructions Currently growing in popularity

90 Two Memory Architectures
Processor Program memory Data memory Memory (program and data) Harvard Princeton Princeton Fewer memory wires Harvard Simultaneous program and data memory access

91 Cache Memory Memory access may be slow
Cache is small but fast memory close to processor Holds copy of part of memory Hits and misses Processor Memory Cache Fast/expensive technology, usually on the same chip Slower/cheaper technology, usually on a different chip

92 Programmer’s View Programmer doesn’t need detailed understanding of architecture Instead, needs to know what instructions can be executed Two levels of instructions: Assembly level Structured languages (C, C++, Java, etc.) Most development today done using structured languages But, some assembly level programming may still be necessary Drivers: portion of program that communicates with and/or controls (drives) another device Often have detailed timing considerations, extensive bit manipulation Assembly level may be best for these

93 Assembly-Level Instructions
opcode operand1 operand2 ... Instruction 1 Instruction 2 Instruction 3 Instruction 4 Instruction Set Defines the legal set of instructions for that processor Data transfer: memory/register, register/register, I/O, etc. Arithmetic/logical: move register through ALU and back Branches: determine next PC value when not just PC+1

94 A Simple (Trivial) Instruction Set
Assembly instruct. First byte Second byte Operation MOV Rn, direct 0000 Rn direct Rn = M(direct) MOV direct, Rn 0001 Rn direct M(direct) = Rn Rm 0010 Rn Rm M(Rn) = Rm MOV Rn, #immed. 0011 Rn immediate Rn = immediate ADD Rn, Rm 0100 Rn Rm Rn = Rn + Rm SUB Rn, Rm 0101 Rn Rm Rn = Rn - Rm JZ Rn, relative 0110 Rn relative PC = PC+ relative (only if Rn is 0) opcode operands

95 Addressing Modes Data Immediate Register-direct Register indirect
Operand field Register address Memory address Addressing mode Register-file contents Memory

96 Sample Programs Try some others
int total = 0; for (int i=10; i!=0; i--) total += i; // next instructions... C program MOV R0, #0; // total = 0 MOV R1, #10; // i = 10 JZ R1, Next; // Done if i=0 ADD R0, R1; // total += i MOV R2, #1; // constant 1 JZ R3, Loop; // Jump always Loop: Next: // next instructions... SUB R1, R2; // i-- Equivalent assembly program MOV R3, #0; // constant 0 1 2 3 5 6 7 Try some others Handshake: Wait until the value of M[254] is not 0, set M[255] to 1, wait until M[254] is 0, set M[255] to 0 (assume those locations are ports). (Harder) Count the occurrences of zero in an array stored in memory locations 100 through 199.

97 Programmer Considerations
Program and data memory space Embedded processors often very limited e.g., 64 Kbytes program, 256 bytes of RAM (expandable) Registers: How many are there? Only a direct concern for assembly-level programmers I/O How communicate with external signals? Interrupts

98 Microprocessor Architecture Overview
If you are using a particular microprocessor, now is a good time to review its architecture

99 Example: parallel port driver
LPT Connection Pin I/O Direction Register Address 1 Output 0th bit of register #2 2-9 14,16,17 1,2,3th bit of register #2 10,11,12,13,15 Input 6,7,5,4,3th bit of register #1 Using assembly language programming we can configure a PC parallel port to perform digital I/O write and read to three special registers to accomplish this table provides list of parallel port connector pins and corresponding register location Example : parallel port monitors the input switch and turns the LED on/off accordingly

100 Parallel Port Example LPT Connection Pin I/O Direction
; This program consists of a sub-routine that reads ; the state of the input pin, determining the on/off state ; of our switch and asserts the output pin, turning the LED ; on/off accordingly .386 CheckPort proc push ax ; save the content push dx ; save the content mov dx, 3BCh + 1 ; base + 1 for register #1 in al, dx ; read register #1 and al, 10h ; mask out all but bit # 4 cmp al, 0 ; is it 0? jne SwitchOn ; if not, we need to turn the LED on SwitchOff: mov dx, 3BCh + 0 ; base + 0 for register #0 in al, dx ; read the current state of the port and al, f7h ; clear first bit (masking) out dx, al ; write it out to the port jmp Done ; we are done SwitchOn: or al, 01h ; set first bit (masking) Done: pop dx ; restore the content pop ax ; restore the content CheckPort endp extern “C” CheckPort(void); // defined in // assembly void main(void) { while( 1 ) { CheckPort(); } LPT Connection Pin I/O Direction Register Address 1 Output 0th bit of register #2 2-9 14,16,17 1,2,3th bit of register #2 10,11,12,13,15 Input 6,7,5,4,3th bit of register #1

101 Operating System Optional software layer providing low-level services to a program (application). File management, disk access Keyboard/display interfacing Scheduling multiple programs for execution Or even just multiple threads from one program Program makes system calls to the OS DB file_name “out.txt” -- store file name MOV R0, system call “open” id MOV R1, file_name address of file-name INT cause a system call JZ R0, L if zero -> error . . . read the file JMP L bypass error cond. L1: . . . handle the error L2:

102 Development Environment
Development processor The processor on which we write and debug our programs Usually a PC Target processor The processor that the program will run on in our embedded system Often different from the development processor Development processor Target processor

103 Software Development Process
Compilers Cross compiler Runs on one processor, but generates code for another Assemblers Linkers Debuggers Profilers Compiler Linker C File Asm. File Binary File Exec. File Assembler Library Implementation Phase Debugger Profiler Verification Phase

104 Running a Program If development processor is different than target, how can we run our compiled code? Two options: Download to target processor Simulate Simulation One method: Hardware description language But slow, not always available Another method: Instruction set simulator (ISS) Runs on development processor, but executes instructions of target processor

105 Instruction Set Simulator For A Simple Processor
#include <stdio.h> typedef struct { unsigned char first_byte, second_byte; } instruction; instruction program[1024]; //instruction memory unsigned char memory[256]; //data memory void run_program(int num_bytes) { int pc = -1; unsigned char reg[16], fb, sb; while( ++pc < (num_bytes / 2) ) { fb = program[pc].first_byte; sb = program[pc].second_byte; switch( fb >> 4 ) { case 0: reg[fb & 0x0f] = memory[sb]; break; case 1: memory[sb] = reg[fb & 0x0f]; break; case 2: memory[reg[fb & 0x0f]] = reg[sb >> 4]; break; case 3: reg[fb & 0x0f] = sb; break; case 4: reg[fb & 0x0f] += reg[sb >> 4]; break; case 5: reg[fb & 0x0f] -= reg[sb >> 4]; break; case 6: pc += sb; break; default: return –1; } return 0; int main(int argc, char *argv[]) { FILE* ifs; If( argc != 2 || (ifs = fopen(argv[1], “rb”) == NULL ) { return –1; if (run_program(fread(program, sizeof(program) == 0) { print_memory_contents(); return(0); else return(-1);

106 Development processor
Testing and Debugging ISS Gives us control over time – set breakpoints, look at register values, set values, step-by-step execution, ... But, doesn’t interact with real environment Download to board Use device programmer Runs in real environment, but not controllable Compromise: emulator Runs in real environment, at speed or near Supports some controllability from the PC Implementation Phase Verification Phase Verification Phase Emulator Debugger/ ISS Programmer Development processor (a) (b) External tools

107 Application-Specific Instruction-Set Processors (ASIPs)
General-purpose processors Sometimes too general to be effective in demanding application e.g., video processing – requires huge video buffers and operations on large arrays of data, inefficient on a GPP But single-purpose processor has high NRE, not programmable ASIPs – targeted to a particular domain Contain architectural features specific to that domain e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc. Still programmable

108 A Common ASIP: Microcontroller
For embedded control applications Reading sensors, setting actuators Mostly dealing with events (bits): data is present, but not in huge amounts e.g., VCR, disk drive, digital camera (assuming SPP for image compression), washing machine, microwave oven Microcontroller features On-chip peripherals Timers, analog-digital converters, serial communication, etc. Tightly integrated for programmer, typically part of register space On-chip program and data memory Direct programmer access to many of the chip’s pins Specialized instructions for bit-manipulation and other low-level operations

109 Another Common ASIP: Digital Signal Processors (DSP)
For signal processing applications Large amounts of digitized data, often streaming Data transformations must be applied fast e.g., cell-phone voice filter, digital TV, music synthesizer DSP features Several instruction execution units Multiple-accumulate single-cycle instruction, other instrs. Efficient vector operations – e.g., add two arrays Vector ALUs, loop buffers, etc.

110 Trend: Even More Customized ASIPs
In the past, microprocessors were acquired as chips Today, we increasingly acquire a processor as Intellectual Property (IP) e.g., synthesizable VHDL model Opportunity to add a custom datapath hardware and a few custom instructions, or delete a few instructions Can have significant performance, power and size impacts Problem: need compiler/debugger for customized ASIP Remember, most development uses structured languages One solution: automatic compiler/debugger generation e.g., Another solution: retargettable compilers e.g., (customized VLIW architectures)

111 Selecting a Microprocessor
Issues Technical: speed, power, size, cost Other: development environment, prior expertise, licensing, etc. Speed: how evaluate a processor’s speed? Clock speed – but instructions per cycle may differ Instructions per second – but work per instr. may differ Dhrystone: Synthetic benchmark, developed in Dhrystones/sec. MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital’s VAX 11/780). A.k.a. Dhrystone MIPS. Commonly used today. So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second SPEC: set of more realistic benchmarks, but oriented to desktops EEMBC – EDN Embedded Benchmark Consortium, Suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications

112 General Purpose Processors
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998

113 Designing a General Purpose Processor
Reset Fetch Decode IR=M[PC]; PC=PC+1 Mov1 RF[rn] = M[dir] Mov2 Mov3 Mov4 Add Sub Jz 0110 0101 0100 0011 0010 0001 op = 0000 M[dir] = RF[rn] M[rn] = RF[rm] RF[rn]= imm RF[rn] =RF[rn]+RF[rm] RF[rn] = RF[rn]-RF[rm] PC=(RF[rn]=0) ?rel :PC to Fetch PC=0; from states below FSMD Not something an embedded system designer normally would do But instructive to see how simply we can build one top down Remember that real processors aren’t usually built this way Much more optimized, much more bottom-up design Declarations: bit PC[16], IR[16]; bit M[64k][16], RF[16][16]; Aliases: op IR[15..12] rn IR[11..8] rm IR[7..4] dir IR[7..0] imm IR[7..0] rel IR[7..0]

114 Architecture of a Simple Microprocessor
Storage devices for each declared variable register file holds each of the variables Functional units to carry out the FSMD operations One ALU carries out every required operation Connections added among the components’ ports corresponding to the operations required by the FSM Unique identifiers created for every control signal Datapath IR PC Controller (Next-state and control logic; state register) Memory RF (16) RFwa RFwe RFr1a RFr1e RFr2a RFr2e RFr1 RFr2 RFw ALU ALUs 2x1 mux ALUz RFs PCld PCinc PCclr 3x1 mux Ms Mwe Mre To all input control signals From all output control signals Control unit 16 Irld 2 1 A D

115 A Simple Microprocessor
Reset PC=0; PCclr=1; Datapath IR PC Controller (Next-state and control logic; state register) Memory RF (16) RFwa RFwe RFr1a RFr1e RFr2a RFr2e RFr1 RFr2 RFw ALU ALUs 2x1 mux ALUz RFs PCld PCinc PCclr 3x1 mux Ms Mwe Mre To all input control signals From all output control signals Control unit 16 Irld 2 1 A D Fetch IR=M[PC]; PC=PC+1 MS=10; Irld=1; Mre=1; PCinc=1; Decode from states below Mov1 RF[rn] = M[dir] RFwa=rn; RFwe=1; RFs=01; Ms=01; Mre=1; op = 0000 to Fetch Mov2 M[dir] = RF[rn] RFr1a=rn; RFr1e=1; Ms=01; Mwe=1; 0001 to Fetch Mov3 M[rn] = RF[rm] RFr1a=rn; RFr1e=1; Ms=10; Mwe=1; 0010 to Fetch Mov4 RF[rn]= imm RFwa=rn; RFwe=1; RFs=10; 0011 to Fetch Add RF[rn] =RF[rn]+RF[rm] RFwa=rn; RFwe=1; RFs=00; RFr1a=rn; RFr1e=1; RFr2a=rm; RFr2e=1; ALUs=00 0100 to Fetch Sub RF[rn] = RF[rn]-RF[rm] RFwa=rn; RFwe=1; RFs=00; RFr1a=rn; RFr1e=1; RFr2a=rm; RFr2e=1; ALUs=01 0101 to Fetch Jz PC=(RF[rn]=0) ?rel :PC PCld= ALUz; RFrla=rn; RFrle=1; 0110 to Fetch FSM operations that replace the FSMD operations after a datapath is created FSMD You just built a simple microprocessor!

116 Chapter Summary General-purpose processors
Good performance, low NRE, flexible Controller, datapath, and memory Structured languages prevail But some assembly level programming still necessary Many tools available Including instruction-set simulators, and in-circuit emulators ASIPs Microcontrollers, DSPs, network processors, more customized ASIPs Choosing among processors is an important step Designing a general-purpose processor is conceptually the same as designing a single-purpose processor

117 Chapter 4 Standard Single Purpose Processors: Peripherals

118 Introduction Single-purpose processors
Performs specific computation task Custom single-purpose processors Designed by us for a unique task Standard single-purpose processors “Off-the-shelf” -- pre-designed for a common task a.k.a., peripherals serial transmission analog/digital conversions

119 Timers, counters, watchdog timers
Timer: measures time intervals To generate timed output events e.g., hold traffic light green for 10 s To measure input events e.g., measure a car’s speed Based on counting clock pulses E.g., let Clk period be 10 ns And we count 20,000 Clk pulses Then 200 microseconds have passed 16-bit counter would count up to 65,535*10 ns = microsec., resolution = 10 ns Top: indicates top count reached, wrap-around 16-bit up counter Clk Cnt Basic timer Top Reset 16

120 Timer Range Resolution Range = n * t
Maximum time interval. Equals to 2N input intervals. Resolution Minimum input time interval, or the input clock period. Range = n * t Where: 0 < n < 2N, t = clock period or timer resolution.

121 Counters Counter: like a timer, but counts pulses on a general input signal rather than clock e.g., count cars passing over a sensor Can often configure device as either a timer or counter 16-bit up counter Clk 16 Cnt_in 2x1 mux Mode Timer/counter Top Reset Cnt

122 Timers – Examples A PIC 16F84 clock runs at MHz, with an 8-bit timer, what frequencies can be obtained with TOP (overflow) interrupts? fint=f/2N fint= 4/28 fint=4.096 KHz A 16-bit counter, using auto-reload, with f=1.2 MHz is used to generate an interrupt at every 500 s. What should the reload value be? Tint=(2N – R)/T 500 s = (2N – R)/f R = (216 – 1.2*106*500*10-6) R = – 600 R = 64936

123 Other timer structures
Interval timer Indicates when desired time interval has passed We set terminal count to desired interval Number of clock cycles = Desired time interval / Clock period Cascaded counters Prescaler Divides clock Increases range, decreases resolution 16-bit up counter Clk 16 Cnt2 Top1 16/32-bit timer Cnt1 16-bit up counter Clk 16 Terminal count = Top Reset Timer with a terminal count Cnt Top2 Time with prescaler 16-bit up counter Clk Prescaler Mode

124 Input Capture Timer When an external signal is asserted, the timer’s values is recorded.

125 Example: Reaction Timer
indicator light reaction button time: 100 ms LCD /* main.c */ #define MS_INIT void main(void){ int count_milliseconds = 0; configure timer mode set Cnt to MS_INIT wait a random amount of time turn on indicator light start timer while (user has not pushed reaction button){ if(Top) { stop timer reset Top count_milliseconds++; } turn light off printf(“time: %i ms“, count_milliseconds); Measure time between turning light on and user pushing button 16-bit timer, clk period is ns, counter increments every 6 cycles Resolution = 6*83.33=0.5 microsec. Range = 65535*0.5 microseconds = milliseconds Want program to count millisec., so initialize counter to – 1000/0.5 = 63535

126 Watchdog timer Must reset timer every X time unit, else timer generates a signal Common use: detect failure, self-reset Another use: timeouts e.g., ATM machine 16-bit timer, 2 microsec. resolution timereg value = 2*(216-1)–X = –X For 2 min., X = 120,000 microsec. scalereg checkreg timereg to system reset or interrupt osc clk prescaler overflow /* main.c */ main(){ wait until card inserted call watchdog_reset_routine while(transaction in progress){ if(button pressed){ perform corresponding action } /* if watchdog_reset_routine not called every < 2 minutes, interrupt_service_routine is called */ watchdog_reset_routine(){ /* checkreg is set so we can load value into timereg. Zero is loaded into scalereg and is loaded into timereg */ checkreg = 1 scalereg = 0 timereg = 11070 } void interrupt_service_routine(){ eject card reset screen

127 WDT

128 WDT

129 WDT

130 Contador de RPM Um MCU tem 2 C/T de 16 bits, prescaler com divisões de 2, 4 ou 8, e recebem um clock estável entre 500 kHz e 8 MHz. Medir a rotação do eixo de um motor que varia entre 100 rpm e 1000 rpm. Um sensor está ligado ao eixo e produz 60 pulsos por ciclo. Cada rotação do motor produz entre 100 Hz e 1000 Hz 60 cpr * 100 rpm = 6000 cpm / 60 spm = 100 cps Contar os pulsos do sensor em cada segundo! Um C/T como temporizador gerando o período de 1 segundo Um C/T como contador de pulsos do sensor dentro de cada segundo Geração de períodos de 1 segundo 1/216 = 15,259 s ou 65,536 kHz (usando o sinal overflow) Usando o prescaler de 8  8 * 65,536 = 524,288 kHz

131 Serial Transmission Using UARTs
UART: Universal Asynchronous Receiver Transmitter Takes parallel data and transmits serially Receives serial data and converts to parallel Parity: extra bit for simple error checking Start bit, stop bit Baud rate signal changes per second bit rate usually higher embedded device 1 Sending UART Receiving UART start bit data end bit

132 UART Sync

133 I2C

134 I2C – PCF8574

135 I2C – PCF8574

136 SPI Serial Paralel Interface

137 SPI Timing

138 LED Must use resistor to limit current. Lei de Ohm: LED = diodo
R = V/I LED = diodo VF e IF VF determinado pela cor IF determina a intensidade Resistor limita IF

139 Displays 7 Segments Dot Dot Matrix

140 7-Segments

141 7-Segments – Multiplexed

142 LCD Nematic type

143 LCD Polarizing

144 LCD Types

145 LCD Color

146 Pulse width modulator Generates pulses with specific high/low times
Duty cycle: % time high Square wave: 50% duty cycle Common use: control average voltage to electric device Simpler than DC-DC converter or digital-analog converter DC motor speed, dimmer lights Another use: encode commands, receiver uses timer to decode clk pwm_o 25% duty cycle – average pwm_o is 1.25V clk pwm_o 50% duty cycle – average pwm_o is 2.5V. clk pwm_o 75% duty cycle – average pwm_o is 3.75V.

147 Controlling a DC motor with a PWM
Internal Structure of PWM clk_div cycle_high counter ( 0 – 254) 8-bit comparator controls how fast the counter increments counter < cycle_high, pwm_o = 1 counter >= cycle_high, pwm_o = 0 pwm_o clk Relationship between applied voltage and speed of the DC Motor void main(void){ /* controls period */ PWMP = 0xff; /* controls duty cycle */ PWM1 = 0x7f; while(1){}; } The PWM alone cannot drive the DC motor, a possible way to implement a driver is shown below using an MJE3055T NPN transistor. DC MOTOR 5V From processor

148 LCD controller void WriteChar(char c){
R/W RS DB7–DB0 LCD controller communications bus microcontroller 8 void WriteChar(char c){ RS = 1; /* indicate data being sent */ DATA_BUS = c; /* send data to LCD */ EnableLCD(45); /* toggle the LCD with appropriate delay */ }

149 Keypad controller N=4, M=4 N1 N2 N3 k_pressed N4 M1 M2 M3 M4 4
key_code keypad controller k_pressed 4 N=4, M=4

150 KBD Key bouncing

151 Key Debouncing RC Network/LP Filter

152 Latch

153 Software Scan column Scan line Filter closed key Time-out interruption

154 Stepper motor controller
Stepper motor: rotates fixed number of degrees when given a “step” signal In contrast, DC motor just rotates when power applied, coasts to stop Rotation achieved by applying specific voltage sequence to coils Controller greatly simplifies this MC3479P 1 5 4 3 2 7 8 6 16 15 14 13 12 11 10 9 Vd A’ A GND Bias’/Set Clk O|C Vm B B’ Phase A’ CW’/CCW Full’/Half Step Red A White A’ Yellow B Black B’

155 Stepper motor with controller (driver)
/* main.c */ sbit clk=P1^1; sbit cw=P1^0; void delay(void){ int i, j; for (i=0; i<1000; i++) for ( j=0; j<50; j++) i = i + 0; } void main(void){ */turn the motor forward */ cw=0; /* set direction */ clk=0; /* pulse clock */ delay(); clk=1; /*turn the motor backwards */ cw=1; /* set direction */ } 2 A’ 3 A 10 7 B 15 B’ 14 MC3479P Stepper Motor Driver 8051 P1.0 P1.1 Stepper Motor CLK CW’/CCW The output pins on the stepper motor driver do not provide enough current to drive the stepper motor. To amplify the current, a buffer is needed. One possible implementation of the buffers is pictured to the left. Q1 is an MJE3055T NPN transistor and Q2 is an MJE2955T PNP transistor. A is connected to the 8051 microcontroller and B is connected to the stepper motor.

156 Stepper motor without controller (driver)
8051 GND/ +V P2.4 P2.3 P2.2 P2.1 P2.0 /*main.c*/ sbit notA=P2^0; sbit isA=P2^1; sbit notB=P2^2; sbit isB=P2^3; sbit dir=P2^4; void delay(){ int a, b; for(a=0; a<5000; a++) for(b=0; b<10000; b++) a=a+0; } void move(int dir, int steps) { int y, z; /* clockwise movement */ if(dir == 1){ for(y=0; y<=steps; y++){ for(z=0; z<=19; z+4){ isA=lookup[z]; isB=lookup[z+1]; notA=lookup[z+2]; notB=lookup[z+3]; delay(); /* counter clockwise movement */ if(dir==0){ for(y=0; y<=step; y++){ for(z=19; z>=0; z - 4){ isA=lookup[z]; isB=lookup[z-1]; notA=lookup[z -2]; notB=lookup[z-3]; delay( ); } void main( ){ int z; int lookup[20] = { 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0 }; while(1){ /*move forward, 15 degrees (2 steps) */ move(1, 2); /* move backwards, 7.5 degrees (1step)*/ move(0, 1); A possible way to implement the buffers is located below. The 8051 alone cannot drive the stepper motor, so several transistors were added to increase the current going to the stepper motor. Q1 are MJE3055T NPN transistors and Q3 is an MJE2955T PNP transistor. A is connected to the 8051 microcontroller and B is connected to the stepper motor. Q2 +V 1K Q1 A B 330

157 Analog-to-digital converters
proportionality Vmax = 7.5V 0V 1111 1110 0000 0010 0100 0110 1000 1010 1100 0001 0011 0101 0111 1001 1011 1101 0.5V 1.0V 1.5V 2.0V 2.5V 3.0V 3.5V 4.0V 4.5V 5.0V 5.5V 6.0V 6.5V 7.0V analog to digital 4 3 2 1 t1 t2 t3 t4 0100 1000 0110 0101 time analog input (V) Digital output digital to analog 4 3 2 1 0100 1000 0110 0101 t1 t2 t3 t4 time analog output (V) Digital input

158 Digital-to-analog conversion using successive approximation
Given an analog input signal whose voltage should range from 0 to 15 volts, and an 8-bit digital encoding, calculate the correct encoding for 5 volts. Then trace the successive-approximation approach to find the correct encoding. 5/15 = d/(28-1) d= 85 Encoding: Successive-approximation method ½(Vmax – Vmin) = 7.5 volts Vmax = 7.5 volts. ½( ) = 5.16 volts Vmax = 5.16 volts. 1 ½( ) = 3.75 volts Vmin = 3.75 volts. 1 ½( ) = 4.93 volts Vmin = 4.93 volts. 1 ½( ) = 5.63 volts Vmax = 5.63 volts 1 ½( ) = 5.05 volts Vmax = 5.05 volts. 1 ½( ) = 4.69 volts Vmin = 4.69 volts. 1 ½( ) = 4.99 volts 1

159 R2R Ladder DAC

160 Digital-to-analog conversion
Use resistor tree: R Vout bn 2R bn-1 4R bn-2 8R bn-3

161 Flash A/D conversion N-bit result requires 2n comparators: Vin encoder
...

162 RAMP ADC

163 Dual-slope conversion
Use counter to time required to charge/discharge capacitor. Charging, then discharging eliminates non-linearities. Vin timer

164 SAR ADC Algorithm The successive approximation Analog to digital converter circuit typically consists of four chief subcircuits: A sample and hold circuit to acquire the input voltage (Vin). An analog voltage comparator that compares Vin to the output of the internal DAC and outputs the result of the comparison to the successive approximation register (SAR). A successive approximation register subcircuit designed to supply an approximate digital code of Vin to the internal DAC. An internal reference DAC that supplies the comparator with an analog voltage equivalent of the digital code output of the SAR for comparison with Vin.

165 SAR ADC

166 Distributed-charge SAR ADC
The DAC conversion is performed in four basic steps: First, the capacitor array is completely discharged to the offset voltage of the comparator, VOS. This step provides automatic offset cancellation. Next, all of the capacitors within the array are switched to the input signal, vIN. The capacitors now have a charge equal to their respective capacitance times the offset voltage minus the input voltage upon each of them. In the third step, the capacitors are then switched so that this charge is applied across the comparator's input, creating a comparator input voltage equal to -vIN. Finally, the actual conversion process proceeds. First, the MSB capacitor is switched to VREF, which corresponds to the full-scale range of the ADC. Due to the binary-weighting of the array the MSB capacitor forms a 1:1 divided between it and the rest of the array. Thus, the input voltage to the comparator is now -vIN plus VREF/2. Subsequently, if vIN is greater than VREF/2 then the comparator outputs a digital 1 as the MSB, otherwise it outputs a digital 0 as the MSB. Each capacitor is tested in the same manner until the comparator input voltage converges to the offset voltage, or at least as close as possible given the resolution of the DAC.

167 Sample-and-hold Required in any A/D: converter Vin

168 Understanding analog to digital converter specifications, By Len Staller. Embedded Systems Design, 02/24/05.

169 Chapter 5 Memory

170 Outline Memory Write Ability and Storage Permanence
Common Memory Types Composing Memory Memory Hierarchy and Cache Advanced RAM

171 Introduction Embedded system’s functionality aspects Processing
processors transformation of data Storage memory retention of data Communication buses transfer of data

172 Memory: basic concepts
Stores large number of bits m x n: m words of n bits each k = Log2(m) address input signals or m = 2^k words e.g., 4,096 x 8 memory: 32,768 bits 12 address input signals 8 input/output data signals Memory access r/w: selects read or write enable: read or write only when asserted multiport: multiple accesses to different locations simultaneously m × n memory n bits per word m words enable 2k × n read and write memory A0 r/w Q0 Qn-1 Ak-1 memory external view

173 Write ability/ storage permanence
Traditional ROM/RAM distinctions ROM read only, bits stored without power RAM read and write, lose stored bits without power Traditional distinctions blurred Advanced ROMs can be written to e.g., EEPROM Advanced RAMs can hold bits without power e.g., NVRAM Write ability Manner and speed a memory can be written Storage permanence ability of memory to hold stored bits after they are written Write ability and storage permanence of memories, showing relative degrees along each axis (not to scale). External programmer OR in-system, block-oriented writes, 1,000s of cycles Battery life (10 years) Write ability EPROM Mask-programmed ROM EEPROM FLASH NVRAM SRAM/DRAM Storage permanence Nonvolatile In-system programmable Ideal memory OTP ROM During fabrication only programmer, 1,000s one time only In-system, fast writes, unlimited cycles Near zero Tens of years Life of product

174 Write ability Ranges of write ability In-system programmable memory
High end processor writes to memory simply and quickly e.g., RAM Middle range processor writes to memory, but slower e.g., FLASH, EEPROM Lower range special equipment, “programmer”, must be used to write to memory e.g., EPROM, OTP ROM Low end bits stored only during fabrication e.g., Mask-programmed ROM In-system programmable memory Can be written to by a processor in the embedded system using the memory Memories in high end and middle range of write ability

175 Storage permanence Range of storage permanence Nonvolatile memory
High end essentially never loses bits e.g., mask-programmed ROM Middle range holds bits days, months, or years after memory’s power source turned off e.g., NVRAM Lower range holds bits as long as power supplied to memory e.g., SRAM Low end begins to lose bits almost immediately after written e.g., DRAM Nonvolatile memory Holds bits after power is no longer supplied High end and middle range of storage permanence

176 ROM: “Read-Only” Memory
Nonvolatile memory Can be read from but not written to, by a processor in an embedded system Traditionally written to, “programmed”, before inserting to embedded system Uses Store software program for general-purpose processor program instructions can be one or more ROM words Store constant data needed by system Implement combinational circuit 2k × n ROM Q0 Qn-1 A0 enable Ak-1 External view

177 programmable connection
Example: 8 x 4 ROM Horizontal lines = words Vertical lines = data Lines connected only at circles Decoder sets word 2’s line to 1 if address input is 010 Data lines Q3 and Q1 are set to 1 because there is a “programmed” connection with word 2’s line Word 2 is not connected with data lines Q2 and Q0 Output is 1010 8 × 4 ROM 3×8 decoder Q0 Q3 A0 enable A2 word 0 word 1 A1 Q2 Q1 programmable connection wired-OR word line data line word 2 Internal view

178 Implementing combinational function
Any combinational circuit of n functions of same k variables can be done with 2^k x n ROM Truth table Inputs (address) Outputs a b c y z z y c enable a b 8×2 ROM word 0 word 1 word 7

179 Mask-programmed ROM Connections “programmed” at fabrication
set of masks Lowest write ability only once Highest storage permanence bits never change unless damaged Typically used for final design of high-volume systems spread out NRE cost for a low unit cost

180 OTP ROM: One-time programmable ROM
Connections “programmed” after manufacture by user user provides file of desired contents of ROM file input to machine called ROM programmer each programmable connection is a fuse ROM programmer blows fuses where connections should not exist Very low write ability typically written only once and requires ROM programmer device Very high storage permanence bits don’t change unless reconnected to programmer and more fuses blown Commonly used in final products cheaper, harder to inadvertently modify

181 EPROM: Erasable programmable ROM
Programmable component is a MOS transistor Transistor has “floating” gate surrounded by an insulator (a) Negative charges form a channel between source and drain storing a logic 1 (b) Large positive voltage at gate causes negative charges to move out of channel and get trapped in floating gate storing a logic 0 (c) (Erase) Shining UV rays on surface of floating-gate causes negative charges to return to channel from floating gate restoring the logic 1 (d) An EPROM package showing quartz window through which UV light can pass Better write ability can be erased and reprogrammed thousands of times Reduced storage permanence program lasts about 10 years but is susceptible to radiation and electric noise Typically used during design development (d) (a) (b) source drain +15V 0V (c) floating gate 5-30 min .

182 EEPROM: Electrically erasable programmable ROM
Programmed and erased electronically typically by using higher than normal voltage can program and erase individual words Better write ability can be in-system programmable with built-in circuit to provide higher than normal voltage built-in memory controller commonly used to hide details from memory user writes very slow due to erasing and programming “busy” pin indicates to processor EEPROM still writing can be erased and programmed tens of thousands of times Similar storage permanence to EPROM (about 10 years) Far more convenient than EPROMs, but more expensive

183 Flash Memory Extension of EEPROM Fast erase
Same floating gate principle Same write ability and storage permanence Fast erase Large blocks of memory erased at once, rather than one word at a time Blocks typically several thousand bytes large Writes to single words may be slower Entire block must be read, word updated, then entire block written back Used with embedded systems storing large data items in nonvolatile memory e.g., digital cameras, TV set-top boxes, cell phones

184 RAM: “Random-access” memory
Typically volatile memory bits are not held without power supply Read and written to easily by embedded system during execution Internal structure more complex than ROM a word consists of several memory cells, each storing 1 bit each input and output data line connects to each cell in its column rd/wr connected to every cell when row is enabled by decoder, each cell has logic that stores input data bit when rd/wr indicates write or outputs stored bit when rd/wr indicates read enable 2k × n read and write memory A0 r/w Q0 Qn-1 Ak-1 external view 4×4 RAM 2×4 decoder Q0 Q3 A0 enable A1 Q2 Q1 Memory cell I0 I3 I2 I1 rd/wr To every cell internal view

185 Basic types of RAM SRAM: Static RAM DRAM: Dynamic RAM
Memory cell uses flip-flop to store bit Requires 6 transistors Holds data as long as power supplied DRAM: Dynamic RAM Memory cell uses MOS transistor and capacitor to store bit More compact than SRAM “Refresh” required due to capacitor leak word’s cells refreshed when read Typical refresh rate microsec. Slower to access than SRAM memory cell internals Data W Data' SRAM Data W DRAM

186 Ram variations PSRAM: Pseudo-static RAM NVRAM: Nonvolatile RAM
DRAM with built-in memory refresh controller Popular low-cost high-density alternative to SRAM NVRAM: Nonvolatile RAM Holds data after external power removed Battery-backed RAM SRAM with own permanently connected battery writes as fast as reads no limit on number of writes unlike nonvolatile ROM-based memory SRAM with EEPROM or flash stores complete RAM contents on EEPROM or flash before power turned off

187 Example: HM6264 & 27C256 RAM/ROM devices
Low-cost low-capacity memory devices Commonly used in 8-bit microcontroller-based embedded systems First two numeric digits indicate device type RAM: 62 ROM: 27 Subsequent digits indicate capacity in kilobits Device Access Time (ns) Standby Pwr. (mW) Active Pwr. (mW) Vcc Voltage (V) HM 27C 22 20 data<7…0> addr<15...0> /OE /WE /CS1 CS2 HM6264 11-13, 15-19 2,23,21,24, 25, 3-10 27 26 /CS 27C256 27,26,2,23,21, 24,25, 3-10 block diagrams device characteristics timing diagrams data addr OE Read operation WE Write operation

188 Example: TC55V2325FF-100 memory device
2-megabit synchronous pipelined burst SRAM memory device Designed to be interfaced with 32-bit processors Capable of fast sequential reads and writes as well as single byte I/O timing diagram block diagram device characteristics data<31…0> addr<15…0> addr<10...0> /CS1 /CS2 CS3 /WE /OE MODE /ADSP /ADSC /ADV CLK TC55V2325FF-100 Device Access Time (ns) Standby Pwr. (mW) Active Pwr. (mW) Vcc Voltage (V) TC55V na 25FF-100 A single read operation addr <15…0> /CS1 and /CS2

189 Composing memory … 1 × 2 decoder enable A … enable outputs
Memory size needed often differs from size of readily available memories When available memory is larger, simply ignore unneeded high-order address bits and higher data lines When available memory is smaller, compose several smaller memories into one larger memory Connect side-by-side to increase width of words Connect top to bottom to increase number of words added high-order address line selects smaller memory containing desired word using a decoder Combine techniques to increase number and width of words 2m+1 × n ROM 2m × n ROM A0 enable Am-1 Am 1 × 2 decoder Qn-1 Q0 Increase number of words 2m × 3n ROM 2m × n ROM A0 enable Q3n-1 Q2n-1 Q0 Am Increase width of words A enable outputs Increase number and width of words

190 Memory hierarchy Want inexpensive, fast memory Main memory Processor
Large, inexpensive, slow memory stores entire program and data Cache Small, expensive, fast memory stores copy of likely accessed parts of larger memory Can be multiple levels of cache Processor Cache Main memory Disk Tape Registers

191 Cache Usually designed with SRAM Usually on same chip as processor
faster but more expensive than DRAM Usually on same chip as processor space limited, so much smaller than off-chip main memory faster access ( 1 cycle vs. several cycles for main memory) Cache operation: Request for main memory access (read or write) First, check cache for copy cache hit copy is in cache, quick access cache miss copy not in cache, read address and possibly its neighbors into cache Several cache design choices cache mapping, replacement policies, and write techniques

192 Cache mapping Far fewer number of available cache addresses
Are address’ contents in cache? Cache mapping used to assign main memory address to cache address and determine hit or miss Three basic techniques: Direct mapping Fully associative mapping Set-associative mapping Caches partitioned into indivisible blocks or lines of adjacent memory addresses usually 4 or 8 addresses per line

193 Direct mapping Data Valid Tag Index Offset = V T D
Main memory address divided into 2 fields Index cache address number of bits determined by cache size Tag compared with tag stored in cache at address indicated by index if tags match, check valid bit Valid bit indicates whether data in slot has been loaded from memory Offset used to find particular word in cache line Data Valid Tag Index Offset = V T D

194 Fully associative mapping
Complete main memory address stored in each cache address All addresses stored in cache simultaneously compared with desired address Valid bit and offset same as direct mapping Tag Offset = V T D Valid Data

195 Set-associative mapping
Compromise between direct mapping and fully associative mapping Index same as in direct mapping But, each cache address contains content and tags of 2 or more memory address locations Tags of that set simultaneously compared as in fully associative mapping Cache with set size N called N-way set-associative 2-way, 4-way, 8-way are common Tag Index Offset = V T D Data Valid

196 Cache-replacement policy
Technique for choosing which block to replace when fully associative cache is full when set-associative cache’s line is full Direct mapped cache has no choice Random replace block chosen at random LRU: least-recently used replace block not accessed for longest time FIFO: first-in-first-out push block onto queue when accessed choose block to replace by popping queue

197 Cache write techniques
When written, data cache must update main memory Write-through write to main memory whenever cache is written to easiest to implement processor must wait for slower main memory write potential for unnecessary writes Write-back main memory only written when “dirty” block replaced extra dirty bit for each block set when cache block written to reduces number of slow main memory writes

198 Cache impact on system performance
Most important parameters in terms of performance: Total size of cache total number of data bytes cache can hold tag, valid and other house keeping bits not included in total Degree of associativity Data block size Larger caches achieve lower miss rates but higher access cost e.g., 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost will not change avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = cycles (improvement) 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost will not change avg. cost of memory access = ( * 4) + ( * 20) = cycles (worse)

199 Cache performance trade-offs
Improving cache hit rate without increasing size Increase line size Change set-associativity 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 1 Kb 2 Kb 4 Kb 8 Kb 16 Kb 32 Kb 64 Kb 128 Kb 1 way 2 way 4 way 8 way % cache miss cache size

200 Advanced RAM DRAMs commonly used as main memory in processor based embedded systems high capacity, low cost Many variations of DRAMs proposed need to keep pace with processor speeds FPM DRAM: fast page mode DRAM EDO DRAM: extended data out DRAM SDRAM/ESDRAM: synchronous and enhanced synchronous DRAM RDRAM: rambus DRAM

201 Basic DRAM Data In Buffer Out Buffer Row Addr. Buffer Col Addr
Address bus multiplexed between row and column components Row and column addresses are latched in, sequentially, by strobing ras and cas signals, respectively Refresh circuitry can be external or internal to DRAM device strobes consecutive memory address periodically causing memory content to be refreshed Refresh circuitry disabled during read or write operation Data In Buffer Out Buffer rd/ wr data Row Addr. Buffer Col Addr . Buffer address ras cas Bit storage array Row Decoder Col Decoder Refresh Circuit cas, ras, clock Sense Amplifiers

202 Fast Page Mode DRAM (FPM DRAM)
Each row of memory bit array is viewed as a page Page contains multiple words Individual words addressed by column address Timing diagram: row (page) address sent 3 words read consecutively by sending column address for each Extra cycle eliminated on each read/write of words from same page row col data ras cas address

203 Extended data out DRAM (EDO DRAM)
Improvement of FPM DRAM Extra latch before output buffer allows strobing of cas before data read operation completed Reduces read/write latency by additional cycle row col data Speedup through overlap ras cas address

204 (S)ynchronous and Enhanced Synchronous (ES) DRAM
SDRAM latches data on active edge of clock Eliminates time to detect ras/cas and rd/wr signals A counter is initialized to column address then incremented on active edge of clock to access consecutive memory locations ESDRAM improves SDRAM added buffers enable overlapping of column addressing faster clocking and lower read/write latency possible clock ras cas address data row col

205 Rambus DRAM (RDRAM) More of a bus interface architecture than DRAM architecture Data is latched on both rising and falling edge of clock Broken into 4 banks each with own row decoder can have 4 pages open at a time Capable of very high throughput

206 DRAM integration problem
SRAM easily integrated on same chip as processor DRAM more difficult Different chip making process between DRAM and conventional logic Goal of conventional logic (IC) designers: minimize parasitic capacitance to reduce signal propagation delays and power consumption Goal of DRAM designers: create capacitor cells to retain stored information Integration processes beginning to appear

207 Memory Management Unit (MMU)
Duties of MMU Handles DRAM refresh, bus interface and arbitration Takes care of memory sharing among multiple processors Translates logic memory addresses from processor to physical memory addresses of DRAM Modern CPUs often come with MMU built-in Single-purpose processors can be used


Download ppt "Chapter 1: Introduction"

Similar presentations


Ads by Google