Lecture 11: Interfaces, I/O and Configurable Processors. Professor Kurt Keutzer, Computer Science 252, Spring 2000.

1 1 Kurt Keutzer Lecture 11: Interfaces, I/O and Configurable Processors. Professor Kurt Keutzer, Computer Science 252, Spring 2000. With contributions from Prof. David Patterson, Niraj Shah, Scott Weber.

2 2 Kurt Keutzer Embedded Systems vs. General Purpose Computing - 1. Embedded system: runs a few applications, often known at design time; not end-user programmable; operates in fixed run-time constraints, additional performance may not be useful/valuable. General purpose computing: intended to run a fully general set of applications; end-user programmable; faster is always better.

3 3 Kurt Keutzer Embedded Systems vs. General Purpose Computing - 2. Embedded system, differentiating features: power; cost; speed (must be predictable). General purpose computing, differentiating features: speed (need not be fully predictable); speed; did we mention speed?; cost (largest component power).

4 4 Kurt Keutzer Configurability and Embedded Systems. Advantages of configuration: pay (in power, design time, area) only for what you use; gain additional performance by adding features tailored to your application. Particularly for embedded systems: principally in embedded controller microprocessor applications; some use in DSP.

5 5 Kurt Keutzer What to Configure? What parts of the microcontroller/microprocessor system to configure? Easy answers: memory and cache sizes (get precisely the sizes your application needs); register file sizes; interrupt handling and addresses. Harder answers: peripherals; instructions. But first we need more context.

6 6 Kurt Keutzer I/O Interrupts. An I/O interrupt is just like an exception, except: an I/O interrupt is asynchronous; further information needs to be conveyed. An I/O interrupt is asynchronous with respect to instruction execution: it is not associated with any instruction; it does not prevent any instruction from completing; you can pick your own convenient point to take an interrupt. An I/O interrupt is more complicated than an exception: it needs to convey the identity of the device generating the interrupt; interrupt requests can have different urgencies, so they need to be prioritized.
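
The two extra requirements on this slide (convey the device identity, prioritize requests) can be sketched as a small arbitration function. This is only an illustrative sketch: the `InterruptRequest` type, the "larger number is more urgent" convention, and the tie-break on lowest device id are assumptions, not something the lecture specifies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InterruptRequest:
    device_id: int  # identity of the requesting device (hypothetical numbering)
    priority: int   # larger number = more urgent (assumed convention)


def select_interrupt(pending: List[InterruptRequest],
                     current_priority: int) -> Optional[InterruptRequest]:
    """Pick the most urgent pending request above the CPU's current priority."""
    eligible = [r for r in pending if r.priority > current_priority]
    if not eligible:
        return None  # nothing urgent enough; the program keeps running
    # Highest priority wins; ties go to the lowest device id (arbitrary choice).
    return max(eligible, key=lambda r: (r.priority, -r.device_id))


pending = [InterruptRequest(3, 1), InterruptRequest(7, 5), InterruptRequest(2, 5)]
chosen = select_interrupt(pending, current_priority=0)
```

Because the request carries `device_id`, the handler knows which device to service; raising `current_priority` while a handler runs masks requests of equal or lower urgency.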

7 7 Kurt Keutzer Example: Device Interrupt. User program: add $r1,$r2,$r3; subi $r4,$r1,#4; slli $r4,$r4,#2; Hiccup(!) (external interrupt arrives here); lw $r2,0($r4); lw $r3,4($r4); add $r2,$r2,$r3; sw 8($r4),$r2. On the external interrupt: PC saved; Disable All Ints; Supervisor Mode. "Interrupt Handler": Raise priority; Reenable All Ints; Save registers; lw $r1,20($r0); lw $r2,0($r1); addi $r3,$r0,#5; sw $r3,0($r1); Restore registers; Clear current Int; Disable All Ints; Restore priority; RTI. On return: Restore PC; User Mode. Advantage: user program progress is only halted during actual transfer. Disadvantage: special hardware is needed to cause an interrupt (I/O device), detect an interrupt (processor), and save the proper states to resume after the interrupt (processor).

8 8 Kurt Keutzer Interrupt Driven Data Transfer. [Figure: CPU, memory, IOC, and device. Memory holds the user program (add, sub, and, or, nop) and the interrupt service routine (read, store, ..., rti). Numbered steps: (1) I/O interrupt, (2) save PC, (3) interrupt service addr; step (4) is marked at the interrupt service routine.] Device xfer rate = 10 MBytes/sec => 0.1 x 10^-6 sec/byte => 0.1 µsec/byte => 1000 bytes = 100 µsec. 1000 transfers x 100 µsecs = 100 ms = 0.1 CPU seconds. User program progress only halted during actual transfer. 1000 transfers at 1 ms each: 1000 interrupts @ 2 µsec per interrupt + 1000 interrupt services @ 98 µsec each = 0.1 CPU seconds. Still far from device transfer rate! 1/2 in interrupt overhead.
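
The arithmetic on this slide can be checked with a few lines of integer math. The 98 µsec service time and the 0.1 CPU seconds total come from the slide; the 2 µsec per-interrupt cost is an assumption, chosen because it is the value that makes the slide's total come out.

```python
# Device side: 10 MBytes/sec, as stated on the slide.
device_rate_bytes_per_s = 10_000_000
ns_per_byte = 1_000_000_000 // device_rate_bytes_per_s  # 100 ns = 0.1 usec/byte
block_us = (1000 * ns_per_byte) // 1000                 # 1000 bytes -> 100 usec

# CPU side: one interrupt plus one service routine per transfer.
transfers = 1000
interrupt_us = 2   # ASSUMPTION: value that reproduces the slide's total
service_us = 98    # from the slide
cpu_us = transfers * (interrupt_us + service_us)
cpu_seconds = cpu_us / 1_000_000

print(f"{ns_per_byte} ns/byte, {block_us} usec per 1000 bytes")
print(f"CPU time for {transfers} interrupt-driven transfers: {cpu_seconds} s")
```

Under these assumptions each transfer costs the CPU 100 µsec, so 1000 transfers consume 0.1 CPU seconds regardless of how fast the device itself is.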

9 9 Kurt Keutzer Better Way to Handle Interrupts? Handling all interrupts with the CPU could bring it to a halt in a real-time system. Isn't there a better way? Hint: remember the trickle-down theory of embedded processor architecture.

10 10 Kurt Keutzer Trickle Down Theory of Embedded Architectures. Mainframes/supercomputers; high-end servers/workstations; high-end personal computers; personal computers; laptops/palmtops; gadgets; watches... Features tend to trickle down: #bits (4->8->16->32->64); ISAs; floating point support; dynamic scheduling; caches; I/O controllers/processors; LIW/VLIW; superscalar.

11 11 Kurt Keutzer I/O Interface. Independent I/O bus: separate I/O instructions (in, out). [Figure: CPU with a memory bus to memory and an independent I/O bus to interfaces and peripherals.] Common memory & I/O bus: lines distinguish between I/O and memory transfers. [Figure: CPU, memory, and interfaces with peripherals on one bus.] Examples: VME bus, Multibus-II, Nubus. 40 Mbytes/sec optimistically; a 10 MIPS processor completely saturates the bus!
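
One way to read the saturation claim is as instruction-fetch traffic alone filling the bus. The 40 Mbytes/sec and 10 MIPS figures are from the slide; the 4 bytes fetched per instruction (a 32-bit instruction word and no cache) is an assumption made here to show how the two numbers can meet.

```python
bus_bandwidth_bytes_per_s = 40_000_000  # "40 Mbytes/sec optimistically"
instructions_per_s = 10_000_000         # 10 MIPS processor
bytes_per_instruction = 4               # ASSUMPTION: 32-bit fetch, no cache

fetch_traffic = instructions_per_s * bytes_per_instruction
utilization = fetch_traffic / bus_bandwidth_bytes_per_s

print(f"instruction fetch alone uses {utilization:.0%} of the bus")
```

With those assumptions no bandwidth is left for data or I/O transfers, which is the motivation for moving I/O onto its own bus or controller.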

12 12 Kurt Keutzer Delegating I/O Responsibility from the CPU: IOP. [Figure: CPU and IOP on the main memory bus with Mem; the IOP drives an I/O bus to devices D1, D2, ..., Dn.] (1) CPU issues instruction to IOP, with fields OP, Device, Address: target device and where cmnds are. (2) IOP looks in memory for commands, with fields OP, Addr, Cnt, Other: what to do, where to put data, how much, special requests. (3) Device to/from memory transfers are controlled by the IOP directly; IOP steals memory cycles. (4) IOP interrupts CPU when done.
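
The four numbered steps can be sketched as a toy simulation. Everything named here (`Command`, `IOP`, the dictionary used as memory, the list used as a device) is hypothetical scaffolding for illustration; only the step sequence and the OP/Addr/Cnt fields come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Command:
    """Command block the IOP finds in memory (field names are hypothetical)."""
    op: str     # what to do: "read" (device to memory) or "write" (memory to device)
    addr: int   # where to put (or fetch) the data in memory
    count: int  # how much data to move


class IOP:
    def __init__(self, memory: Dict[int, object], devices: Dict[int, List[int]]):
        self.memory = memory
        self.devices = devices
        self.interrupts: List[int] = []  # device ids whose transfers completed

    def start(self, device_id: int, command_addr: int) -> None:
        # (1) The CPU has issued: target device + where the commands are.
        # (2) The IOP looks in memory for the command block.
        cmd = self.memory[command_addr]
        device = self.devices[device_id]
        # (3) The IOP moves the data itself, without the CPU.
        for i in range(cmd.count):
            if cmd.op == "read":
                self.memory[cmd.addr + i] = device.pop(0)
            elif cmd.op == "write":
                device.append(self.memory[cmd.addr + i])
            else:
                raise ValueError(f"unknown op: {cmd.op}")
        # (4) The IOP interrupts the CPU when done.
        self.interrupts.append(device_id)


memory = {0x100: Command(op="read", addr=0x200, count=3)}
devices = {1: [10, 20, 30]}
iop = IOP(memory, devices)
iop.start(device_id=1, command_addr=0x100)
```

The CPU's only work is writing the command block and issuing `start`; it hears nothing more until the completion interrupt.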

13 13 Kurt Keutzer Memory Mapped I/O. Single memory & I/O bus; no separate I/O instructions. [Figures: CPU, memory (ROM, RAM), and interfaces with peripherals (I/O) on one bus; CPU with $ and L2 $ on a memory bus, connected through a bus adaptor to an I/O bus.]
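
With memory-mapped I/O, ordinary loads and stores reach a device simply because of the address they use. A minimal sketch of that address decoding follows; the `Bus` class, the region layout, and the UART-like register at 0x8000 are all hypothetical.

```python
class Bus:
    """Single address space shared by memory and device registers."""

    def __init__(self):
        self.regions = []  # list of (base, size, read_fn, write_fn)

    def map_region(self, base, size, read_fn, write_fn):
        self.regions.append((base, size, read_fn, write_fn))

    def _find(self, addr):
        for base, size, read_fn, write_fn in self.regions:
            if base <= addr < base + size:
                return addr - base, read_fn, write_fn
        raise ValueError(f"bus error: nothing mapped at {addr:#x}")

    def load(self, addr):
        offset, read_fn, _ = self._find(addr)
        return read_fn(offset)

    def store(self, addr, value):
        offset, _, write_fn = self._find(addr)
        write_fn(offset, value)


ram = [0] * 256
uart_output = []  # bytes "transmitted" by the hypothetical device


def ram_read(offset):
    return ram[offset]


def ram_write(offset, value):
    ram[offset] = value


def uart_read(offset):
    return 0  # the sketch's device has nothing to report


def uart_write(offset, value):
    uart_output.append(value)


bus = Bus()
bus.map_region(0x0000, 256, ram_read, ram_write)  # RAM
bus.map_region(0x8000, 1, uart_read, uart_write)  # device data register
bus.store(0x0010, 42)        # an ordinary memory store
bus.store(0x8000, ord("A"))  # the same operation, but it drives the device
```

The same `store` operation serves both cases, which is why no separate I/O instructions are needed.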

14 14 Kurt Keutzer Delegating I/O Responsibility from the CPU: DMA. Direct Memory Access (DMA): external to the CPU; acts as a master on the bus; transfers blocks of data to or from memory without CPU intervention. [Figure: CPU, memory, DMAC, IOC, device.] CPU sends a starting address, direction, and length count to DMAC, then issues "start". DMAC provides handshake signals for the peripheral controller, and memory addresses and handshake signals for memory.
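
The programming sequence described here (starting address, direction, length count, then "start") can be sketched as follows. The `DMAC` class, its method names, and the direction strings are hypothetical; a real controller exposes these as registers and moves data with bus handshakes rather than a Python loop.

```python
from typing import List


class DMAC:
    def __init__(self, memory: List[int], device: List[int]):
        self.memory = memory
        self.device = device
        self.start_address = 0
        self.direction = "to_memory"
        self.length = 0
        self.interrupt_pending = False

    def program(self, start_address: int, direction: str, length: int) -> None:
        """CPU side: send starting address, direction, and length count."""
        if direction not in ("to_memory", "from_memory"):
            raise ValueError(f"unknown direction: {direction}")
        self.start_address = start_address
        self.direction = direction
        self.length = length

    def start(self) -> None:
        """DMAC side: move the whole block, then signal completion."""
        for i in range(self.length):
            if self.direction == "to_memory":
                self.memory[self.start_address + i] = self.device.pop(0)
            else:
                self.device.append(self.memory[self.start_address + i])
        self.interrupt_pending = True  # one interrupt for the whole block


memory = [0] * 16
device = [1, 2, 3, 4]
dmac = DMAC(memory, device)
dmac.program(start_address=4, direction="to_memory", length=4)
dmac.start()
```

The CPU is involved once at set-up and once at the completion interrupt, rather than once per transfer.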

15 15 Kurt Keutzer Direct Memory Access. [Figure: CPU, memory, DMAC, IOC, device.] Time to do 1000 xfers at 1 msec each: 1 DMA set-up sequence @ 50 µsec + 1 interrupt @ 2 µsec + 1 interrupt service sequence @ 48 µsec = .0001 second of CPU time. CPU sends a starting address, direction, and length count to DMAC, then issues "start". DMAC provides handshake signals for the peripheral controller, and memory addresses and handshake signals for memory. [Figure: memory-mapped I/O address space from 0 to n, holding ROM, RAM, peripherals, and DMAC.]
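
The DMA cost can be set against the 0.1 CPU seconds quoted earlier for interrupt-driven transfer. The 50 µsec and 48 µsec figures and the .0001 second total are from the slide; reading the garbled "1 2 µsec" as one interrupt at 2 µsec is an assumption, although it is the value that makes the total add up.

```python
dma_setup_us = 50           # from the slide
interrupt_us = 2            # ASSUMPTION: reading of the garbled "1 2 µsec"
interrupt_service_us = 48   # from the slide

dma_cpu_us = dma_setup_us + interrupt_us + interrupt_service_us
dma_cpu_seconds = dma_cpu_us / 1_000_000

interrupt_driven_cpu_us = 100_000  # 0.1 CPU seconds, interrupt-driven slide
reduction = interrupt_driven_cpu_us // dma_cpu_us

print(f"DMA: {dma_cpu_seconds} s of CPU time, {reduction}x less than interrupts")
```

The saving comes from paying the set-up and interrupt cost once per block instead of once per transfer.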

16 16 Kurt Keutzer Family 68K was the most successful embedded controller in history. CISC instruction set: good code density. Table lookup for compressed tables. Time processing unit: breakthrough in modular peripheral handling!

17 17 Kurt Keutzer MC Top level. [Figure: inter module bus (IMB) connecting CPU32, RAM, serial I/O, IMB control, and the time processing unit (TPU) with I/O channels 0 to 15.] Designed for automotive applications with a mixture of computation-intensive tasks and complex I/O functions. Idea: off-load the CPU from frequent I/O interactions to make use of its computation performance.

18 18 Kurt Keutzer CPU Block Diagram

19 19 Kurt Keutzer Addressing Modes. Seven modes: register direct; register indirect; register indirect with index; program counter indirect with displacement; program counter indirect with index; absolute; immediate. Why so many modes? Antiquated architectural feature?

20 20 Kurt Keutzer Addressing Modes. Seven modes: register direct; register indirect; register indirect with index; program counter indirect with displacement; program counter indirect with index; absolute; immediate. Complex addressing modes allow for more dense code ... but ... MCore, Motorola's embedded microcontroller rewrite, uses simple DLX-like load/store instructions: code size impact?
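
To make the seven modes concrete, here is a simplified sketch of how each one locates its operand. The mode names, the `fetch_operand` signature, and the dictionary-based registers and memory are hypothetical, and real encodings (operand sizes, index scaling, sign extension) are ignored.

```python
def fetch_operand(mode, regs, memory, pc=0, reg=None, index=None, disp=0, value=None):
    """Return the operand selected by a simplified addressing mode."""
    if mode == "register_direct":
        return regs[reg]                   # operand is the register itself
    if mode == "immediate":
        return value                       # operand is encoded in the instruction
    if mode == "register_indirect":
        ea = regs[reg]
    elif mode == "register_indirect_index":
        ea = regs[reg] + regs[index] + disp
    elif mode == "pc_displacement":
        ea = pc + disp
    elif mode == "pc_index":
        ea = pc + regs[index] + disp
    elif mode == "absolute":
        ea = value                         # address is encoded in the instruction
    else:
        raise ValueError(f"unknown mode: {mode}")
    return memory[ea]                      # all other modes read from memory


regs = {"a0": 100, "d1": 4}
memory = {100: 11, 108: 22, 52: 33, 60: 44, 200: 55}
indexed = fetch_operand("register_indirect_index", regs, memory,
                        reg="a0", index="d1", disp=4)
```

A load/store machine would need separate add instructions to form `a0 + d1 + 4` before the load, which is the code-density trade-off the slide raises.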

21 21 Kurt Keutzer MC68332 Time Processing Unit. [Figure: block diagram. Labels: IMB; Data; Control; Service Requests; Host Interface; System Configuration; Channel Control; Parameter RAM; Development Support and Test; Scheduler; Microengine; Control Store; Execution Unit; Timer Channels; Channel 0, Channel 1, ..., Channel 15; Pins; Control and Data; time base.] TPU: time processing unit: peripheral coprocessor; independent programmable timer channels: single-shot "capture & compare"; channel coupling and sequence control with control processor pin.

22 22 Kurt Keutzer Time Processing Unit

23 23 Kurt Keutzer Time Processing Unit: semi-autonomous microcontroller; operates concurrently with CPU; schedules tasks; processes ROM instructions; accesses shared data with CPU; performs input/output.

24 24 Kurt Keutzer Uses of Time Processing Unit. Programmable series of two operations: match; capture. Each operation is called an "event". A pre-programmed series of events is called a "function". Pre-programmed functions: input capture/input transition counter; output compare; period measurement with additional/missing transition detect; position-synchronized pulse generator; period/pulse-width accumulator.

25 25 Kurt Keutzer Time Bases. Two sixteen-bit counters provide time bases for all. Pre-scalers controlled by CPU via bit-fields in TPU module configuration register TPUMCR. Current values accessible via TCR1 and TCR2 registers. TCR1, TCR2 can be read/written by TPU microcode; not available to CPU. TCR1 qualified by system clock. TCR2 qualified by system clock or external clock.

26 26 Kurt Keutzer Timer Channels. Sixteen channels, each one connected to an MCU pin. Each channel has symmetric hardware: event register; 16-bit capture register; 16-bit compare/match register; 16-bit comparator; pin control logic (pin direction determined by TPU microengine).
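
The capture and match hardware can be illustrated with a small model of one channel driven by a free-running 16-bit time base. The `TimerChannel` class and its method names are hypothetical; the equality comparator is a simplification, and the wrap-around subtraction in `period` is a standard technique for free-running counters rather than something stated on the slide.

```python
MASK16 = 0xFFFF  # registers and time base are 16 bits wide


class TimerChannel:
    def __init__(self):
        self.capture = 0   # 16-bit capture register
        self.match = None  # 16-bit compare/match register (None = not armed)
        self.events = []   # events raised toward the scheduler

    def pin_transition(self, tcr):
        """Capture event: latch the time base when the pin changes."""
        self.capture = tcr & MASK16
        self.events.append(("capture", self.capture))

    def tick(self, tcr):
        """Match event: comparator fires when the time base reaches the match value."""
        if self.match is not None and (tcr & MASK16) == self.match:
            self.events.append(("match", self.match))
            self.match = None  # single-shot: disarm after firing


def period(first_capture, second_capture):
    """Elapsed counts between two captures, correct across one counter wrap."""
    return (second_capture - first_capture) & MASK16


channel = TimerChannel()
channel.match = 5
for t in range(10):
    channel.tick(t)
channel.pin_transition(0x10003)  # time base value is truncated to 16 bits
```

For example, `period(0xFFF0, 0x0010)` yields 32 counts even though the counter wrapped between the two captures.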

27 27 Kurt Keutzer Scheduler. Determines which of sixteen channels is serviced by the microengine. A channel can request service for one of four reasons: host service; link to another channel; match event; capture event. Host system assigns to each channel a priority: high; middle; low.
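
A minimal sketch of the selection step, assuming strict priority order with ties broken by the lowest channel number. Both of those rules are assumptions made for illustration; real hardware may interleave priority levels so that low-priority channels are not starved, which this sketch does not model.

```python
PRIORITY_RANK = {"high": 0, "middle": 1, "low": 2}


def next_channel(requests, priorities):
    """Pick the channel to service next.

    requests:   set of channel numbers (0..15) currently requesting service
    priorities: dict mapping channel number to "high", "middle", or "low"
    """
    if not requests:
        return None
    # Strict priority, then lowest channel number (both are assumed rules).
    return min(requests, key=lambda ch: (PRIORITY_RANK[priorities[ch]], ch))


priorities = {0: "low", 3: "high", 7: "middle", 9: "high"}
first = next_channel({0, 7, 9, 3}, priorities)
```

Once channel 3 has been serviced and removed from the request set, calling `next_channel` again would select channel 9, then 7, then 0.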

28 28 Kurt Keutzer Microengine. Determines which of sixteen channels is serviced by the microengine. A channel can request service for one of four reasons: host service; link to another channel; match event; capture event. Host system assigns to each channel a priority: high; middle; low.

29 29 Kurt Keutzer Another Motorola Microprocessor

30 30 Kurt Keutzer Concepts so far... Interrupts; memory mapping of I/O; Time Processing Unit / peripheral processor. Other configurable elements: peripherals; instructions.

31 31 Kurt Keutzer Configurability in ARM Processor. ARM allows for configurability via the AMBA bus. Offers "PrimeCell" peripherals which hook into the AMBA Advanced Peripheral Bus (APB): UART; real time clock; audio codec interface; keyboard and mouse interface; general purpose I/O; smart card interface; generic IR interface.

32 32 Kurt Keutzer ARM7 core

33 33 Kurt Keutzer ARM's AMBA open standard. Advanced System Bus (ASB): high performance; CPU, DMA, external. Advanced Peripheral Bus (APB): low speed, low power; parallel I/O, UARTs. External interface.

34 34 Kurt Keutzer Ex1: ARM Infrared (IR) Interface

35 35 Kurt Keutzer Ex 2: ARM Smart Card Interface

36 36 Kurt Keutzer Ex 3: Audio Codec

37 37 Kurt Keutzer Another Kind of Configurability. [Figure: synthesis flow: HDL, RTL synthesis, netlist, logic optimization (with library), netlist, physical design, layout.] Synthesis of a processor core from an RTL description allows for: full range of other types of configurability; additional degrees of freedom in quality of implementation. Examples: ARM7; Motorola ColdFire; Tensilica Xtensa.

38 38 Kurt Keutzer Quality of Results Tradeoffs. [Figure: delay vs. area tradeoff.] Synthesizable implementation allows for exploration of a wide range of implementations.

39 39 Kurt Keutzer ARM Core7 Thumb Embedded

40 40 Kurt Keutzer Ultimate configurability: the Tensilica solution

41 41 Kurt Keutzer Tensilica Viterbi Implementation. Niraj Shah, Scott Weber. 290A Final Presentation.

42 42 Kurt Keutzer Tensilica Flow. [Figure: tool flow diagram. Labels: .c; .o; xt-run; xt-gcc; gen; uArch; Designer; TIE; Tensilica Processor Generator.]

43 43 Kurt Keutzer Xtensa Architecture. [Figure: Xtensa Core connected to the TIE unit through Rs, Rt, Rr, and I.] TIE extensions: single cycle; state free; no new exceptions; no stalls; typeless data. Rs, Rt, Rr are 32-bit regs. I is the instruction controlling the TIE unit. Xtensa Core is a 32-bit configurable RISC processor.

44 44 Kurt Keutzer Viterbi Architecture. [Figure: block diagram. Labels: ACS; TraceBack; RAM; Init; ADC; I/O Device; "Measured Performance Here".]

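45 45 Kurt Keutzer TIE SetupBMreg (ACS). [Figure: datapath diagram. Recoverable labels: inputs Rs (I) and Rt (Q); result Rr; branch metrics bm3, bm2, bm1, bm0 in bit fields 31:24, 23:16, 15:8, 7:0; constant 0x7F; control instruction.]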

46 46 Kurt Keutzer ACS TIE Extension (ACS). [Figure: datapath diagram. Recoverable labels: branch metrics bm3, bm2, bm1, bm0 in bit fields 31:24, 23:16, 15:8, 7:0; path metric pm; adders; decision bit; msb; operands Rs, Rt and result Rr; instructions ACS03 || ACS12 || ACS30 || ACS21.]
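
For readers unfamiliar with the operation this extension accelerates, here is a generic add-compare-select step from textbook Viterbi decoding. It is not a reconstruction of the TIE datapath on the slide: the function names, the "smaller metric survives" convention, and the meaning of the decision bit are assumptions. Only the packing of four branch metrics into byte fields of a 32-bit word follows the figure labels.

```python
def unpack_branch_metrics(word):
    """Split a 32-bit word into (bm3, bm2, bm1, bm0), one byte each."""
    return ((word >> 24) & 0xFF, (word >> 16) & 0xFF,
            (word >> 8) & 0xFF, word & 0xFF)


def acs(pm_a, bm_a, pm_b, bm_b):
    """Add-compare-select for one trellis state.

    Adds each predecessor's path metric to its branch metric, keeps the
    smaller sum, and returns (new_path_metric, decision_bit), where the
    decision bit records which predecessor survived (0 = a, 1 = b).
    """
    candidate_a = pm_a + bm_a
    candidate_b = pm_b + bm_b
    if candidate_a <= candidate_b:
        return candidate_a, 0
    return candidate_b, 1


bm3, bm2, bm1, bm0 = unpack_branch_metrics(0x04030201)
new_pm, decision = acs(pm_a=10, bm_a=bm0, pm_b=8, bm_b=bm3)
```

In software this is several instructions per state; folding the adds, compare, and select into one custom instruction is the kind of gain instruction-set configurability aims for.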

47 47 Kurt Keutzer ACS TIE Extension with State (ACS). [Figure: datapath diagram. Recoverable labels: branch metrics bm3, bm2, bm1, bm0 in bit fields 31:24, 23:16, 15:8, 7:0; path metrics pm; msb of Rs and Rt; decision bits; result Rr; control instruction.]

48 48 Kurt Keutzer TIE Zmask (TraceBack). [Figure: datapath diagram. Recoverable labels: operands Rs, Rt and result Rr; operations &, |, <<1; constants 0x7F and 0x3F; control instruction.]

49 49 Kurt Keutzer Designs. All designs had a BER of after 10 million iterations. Design MHz, 48 mW, 1K DCache, 1K ICache, TIE. Design MHz, 144 mW, 1K DCache, 1K ICache, TIE. Design MHz, 69 mW, 16K DCache, 16K ICache, TIE. Design MHz, 191 mW, 16K DCache, 16K ICache, TIE. Design MHz, 191 mW, 16K DCache, 16K ICache, TIE with state.

50 50 Kurt Keutzer Performance (Kb/s)

51 51 Kurt Keutzer Energy Dissipation (uJ/bit)

52 52 Kurt Keutzer n(s*J)/Bit

53 53 Kurt Keutzer Die Area (mm^2)

54 54 Kurt Keutzer Summary: Levels of Configurability. Configurability is highly desirable in embedded applications. There are many levels of configuration: memory and cache sizes (get precisely the sizes your application needs); register file sizes; interrupt handling and addresses; peripherals; instructions; physical implementation.

