Motivation: Mobile embedded systems are present in cell phones, PDAs, MP3 players, and GPS units.


2 Motivation
Mobile embedded systems are present in:
–Cell phones
–PDAs
–MP3 players
–GPS units

3 Mobile Computing Design Considerations
Low power
Real-time data processing
Small size
Low cost
Quick time to market

4 Metric Introduction
Processor specialization
Instruction set
Interconnect
Memory specialization
Functional & data path units
Power specialization

5 Metric: Processor Specialization
Central controlling point of embedded system
Examples:
–VLIW to perform multiple instructions in parallel
–RISC architecture

6 Metric: Instruction Set Specialization
Introduction of new instructions to extract optimal performance from the processor
Examples:
–Multiply-accumulate
–Vector operations
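As an illustration of the multiply-accumulate example, the sketch below shows the operation that a MAC instruction fuses into a single step, and how a dot product is built from it. The names `mac` and `dot_product` are hypothetical, chosen here for illustration; they do not come from any of the architectures discussed.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: returns acc + a * b."""
    return acc + a * b


def dot_product(xs, ys):
    """Dot product expressed as a chain of multiply-accumulate steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)  # one MAC per pair of elements
    return acc


print(dot_product([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

On a processor without a MAC instruction, each loop iteration would need a separate multiply and add; a dedicated instruction removes one of them from the inner loop of filters and transforms.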

7 Metric: Interconnect
Provides means for different modules to communicate
Optimizations can lead to reduced complexity, cost, and power consumption

8 Metric: Memory Specialization
Specialization is achieved through optimization of the number and size of memory banks and the number and size of access ports
Optimizations can improve performance, power consumption, and chip area

9 Metric: Functional & Data Path Units
Functional units are often specialized hardware units implementing a frequently used software algorithm
Examples:
–DSP co-processors, interrupt priority co-processors, memory access modules, and timer modules

10 Metric: Power Specialization
Major concern in mobile systems
Kept under control by:
–Using low voltage
–Slow clock speed
–Custom circuit solutions

11 Architectures to be discussed
M*CORE
D30V/MPEG
SuperENC
1.3-GOPS Parallel DSP
IA-32 w/ Enhanced Data Streaming

12 M*CORE
Low power embedded applications
Wireless mobile devices
Cellular phones

13 M*CORE Processor Specialization
Simple RISC architecture
4 stage pipeline
16-bit instruction word length
Compiler designed in parallel with architecture
Barrel shifter built into ALU

14 M*CORE Instruction Set Specialization
Multimedia instructions
–Multiple data transfers from memory to register and register to memory
–Fast register saves
FF1 – Find First 1
–Finding highest priority interrupt in hardware
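To make the FF1 bullet concrete, here is a minimal software sketch of a find-first-one operation and its use for picking the highest priority pending interrupt. The slides do not give the instruction's exact semantics, so two things are assumed here: the scan starts at the most significant bit of a 32-bit word, and a higher bit number means a higher priority. The function names are hypothetical.

```python
WORD_BITS = 32  # assumption: 32-bit register


def ff1(word):
    """Count the zero bits above the first set bit, scanning from the MSB.

    Returns WORD_BITS when no bit is set (assumed convention).
    """
    word &= (1 << WORD_BITS) - 1  # keep only the low 32 bits
    if word == 0:
        return WORD_BITS
    return WORD_BITS - word.bit_length()


def highest_priority_pending(pending):
    """Return the bit number of the highest set bit, or None if nothing is pending."""
    position = ff1(pending)
    if position == WORD_BITS:
        return None
    return WORD_BITS - 1 - position


print(highest_priority_pending(0b1010))  # bits 1 and 3 pending, so 3 is selected
```

In software this scan is a loop or a library call; doing it in one instruction shortens interrupt dispatch, which is the point of the slide.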

15 M*CORE Interconnect Specialization
16-bit data bus to match 16-bit word length
–Reduces memory bandwidth, complexity, chip area layout, and power consumption
MDI – MCU-to-DSP Interface
–Dual access memory messaging unit
General I/O bus for peripherals

16 M*CORE Memory Specialization
Alternate register bank
–Fast register saves for context switches

17 M*CORE Functional & Data Path Units
32 channel programmable interrupt controller
Protocol timer
DSP core

18 M*CORE Power Specialization
1.8 Volts
Uses 0.5 Watts
Power aware pipeline
Programmable power states
–Stop
–Wait
–Doze
–Normal

19 M*CORE Summary
Low power and programmable power states make it ideal for mobile devices
Interface to built-in DSP core makes it ideal for cell phone applications

20 650 MHz IA-32
Microprocessor designed to accelerate data-streaming applications
Three-dimensional graphics
Video encode/decode

21 650 MHz IA-32 Processor Specialization
IA-32 architecture
70 new instructions
SIMD floating point data type
Improvements in regard to circuit implementation

22 650 MHz IA-32 Instruction Set Specialization
70 new instructions
–SIMD FP operations
–Control for new 8-entry register file
–Multimedia extension: 12 new integer instructions

23 650 MHz IA-32 Interconnect Specialization
Front Side Bus of 66, 100, 133 MHz
Back Side Bus
–Half the clock frequency for mobile and desktop applications
–Full clock frequency for server/workstation applications

24 650 MHz IA-32 Memory Specialization
3 new non-temporal store instructions with write combining buffers
–Burst write protocol
–Write data throughput of 1.066 Gbytes/sec on a 133 MHz bus
4 new data pre-fetch instructions
–Overlap, reduces cache miss penalties
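The throughput figure can be sanity-checked with a short calculation. The sketch below assumes a 64-bit data bus and one full-width transfer per bus clock; neither is stated in the slides, and "133 MHz" is taken as the nominal 133.33 MHz. The function name is hypothetical.

```python
def peak_write_bandwidth(bus_hz, bus_width_bits):
    """Peak bytes per second if one full-width transfer completes every bus clock."""
    return bus_hz * bus_width_bits / 8


BUS_HZ = 400e6 / 3     # assumption: "133 MHz" is nominally 133.33 MHz
BUS_WIDTH_BITS = 64    # assumption: 64-bit data bus, not stated in the slides

gbytes_per_sec = peak_write_bandwidth(BUS_HZ, BUS_WIDTH_BITS) / 1e9
print(f"{gbytes_per_sec:.4f} Gbytes/sec")
```

Under these assumptions the result is about 1.0667 Gbytes/sec, which agrees with the 1.066 figure on the slide. That suggests the quoted number is the bus peak, reached only when the write combining buffers keep the bus busy on every clock.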

25 650 MHz IA-32 Functional Specialization
8 entry register file
–Reduces register starvation for SIMD unit
–128 bits wide: four independent single precision elements packed in parallel
Dedicated table based lookup unit for reciprocal operations
–Completes reciprocal operations in one clock cycle
–Error of 1.5 * 2^-12
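The packed layout of one register can be modeled in a few lines. This is only a software sketch of the data format (four single precision values in 128 bits) with an element-wise add; it says nothing about how the hardware is built, and the helper names `pack4`, `unpack4`, and `simd_add` are hypothetical.

```python
import struct


def pack4(values):
    """Pack four floats into 16 bytes (128 bits) as little-endian single precision."""
    return struct.pack("<4f", *values)


def unpack4(register):
    """Unpack a 16-byte register image back into four Python floats."""
    return struct.unpack("<4f", register)


def simd_add(a, b):
    """Element-wise add of two packed registers; each lane is independent."""
    return pack4(x + y for x, y in zip(unpack4(a), unpack4(b)))


a = pack4([1.0, 2.0, 3.0, 4.0])
b = pack4([0.5, 0.5, 0.5, 0.5])
print(len(a) * 8)                # register width in bits
print(unpack4(simd_add(a, b)))   # one add per lane
```

Python computes each lane in double precision and rounds to single precision when packing, so this models the format rather than the exact arithmetic of a SIMD unit.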

26 650 MHz IA-32 Low Power Usage
1.4 V ~ 2.2 V at 650 MHz close to room temperature

27 650 MHz IA-32 Performance
1.5X to 2.0X performance boost for 3-D transform and lighting kernels
Real-time MPEG-2 video/audio encoding at 30 frames per second
–Achieved through improvement to SIMD unit, at a cost of only 2% increase of unit area size

28 D30V/MPEG
Multimedia applications
–Decoding MPEG-2

29 D30V/MPEG Processor Specialization
2 way VLIW
Dual issue RISC pipeline
2 way assigned SIMD module
Pipeline has ability to re-route data through execution path

30 D30V/MPEG Instruction Set Specialization
Saturate and Add
DSP instructions built in
–Modular addressing
–Block repeat
–Multiply accumulate
Half word instructions
–Effectively double number of usable registers
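A saturating add clamps the result to the representable range instead of wrapping around, which matters for pixel and audio samples. The sketch below assumes signed 16-bit operands; the slides do not state the operand width of the D30V/MPEG instruction, and `sat_add16` is a hypothetical name.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def sat_add16(a, b):
    """Add two values and clamp the sum to the signed 16-bit range (assumed width)."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


print(sat_add16(30000, 10000))    # clamps to 32767 instead of wrapping negative
print(sat_add16(-30000, -10000))  # clamps to -32768
print(sat_add16(1, 2))            # in range, so the plain sum 3
```

Without saturation, a 16-bit wraparound of 30000 + 10000 would give a large negative value, which shows up as a visible or audible glitch in media data.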

31 D30V/MPEG Interconnect Specialization
Chip layout specialized for decoding streaming MPEG data

32 D30V/MPEG Memory Specialization
32 Kbyte data RAM
64 Kbyte instruction RAM
4 Kbyte RAM for Variable Length Coder/Decoder (VLC/VLD) tables
Special registers
–MOD_S & MOD_E for modulo addressing
–RPT_S, RPT_E, and RPT_C for looping
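Modulo addressing lets a pointer step through a circular buffer without an explicit bounds check. The sketch below models that wraparound in software. It assumes MOD_S holds the first address of the buffer and MOD_E the last address, inclusive; the slides name the registers but do not define them, so this interpretation and the function name `next_address` are assumptions.

```python
def next_address(addr, step, mod_s, mod_e):
    """Advance addr by step, wrapping inside the window [mod_s, mod_e].

    Assumes mod_e is the last valid address (inclusive).
    """
    size = mod_e - mod_s + 1
    return mod_s + (addr - mod_s + step) % size


MOD_S, MOD_E = 0x100, 0x103  # hypothetical 4-entry circular buffer

addr = MOD_S
visited = []
for _ in range(6):
    visited.append(addr)
    addr = next_address(addr, 1, MOD_S, MOD_E)

print([hex(a) for a in visited])  # wraps back to 0x100 after 0x103
```

With hardware support the wrap costs no extra instructions, which is why it is usually paired with block repeat (the RPT registers) in DSP inner loops.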

33 D30V/MPEG Functional Specialization
VLC/VLD: Variable Length Coding/Decoding units

34 D30V/MPEG Low Power Usage
2.5 Volts at 243 MHz
Uses 2.0 Watts

35 D30V/MPEG Performance
12% speedup from inter-pipe bypasses
Special VLC/VLD functional blocks speed up MPEG decoding

36 1.3 GOPS Parallel DSP
Achieve real-time image processing capability
Employ data parallelism to achieve goal
–High level algorithms, non-parallelizable: arithmetic encoding
–Medium level algorithms, medium parallelizable: contour tracking of binary images
–Low level algorithms, highly parallelizable: filters and transforms; data independent control and data flow; 80% of MPEG-2, 60% of MPEG-4

37 1.3 GOPS Parallel DSP Processor Specialization
Central control unit
–RISC based
–Controls multiple SIMD units

38 1.3 GOPS Parallel DSP Instruction Set Specialization
VLIW instructions
–3 instructions per issue: 1 load/store of 16 bit data, 2 arithmetic operations on 16/32 bit data

39 1.3 GOPS Parallel DSP Interconnect Specialization
DMA/MCU (Direct Memory Access/Memory Control Unit)
–Handles cache misses
–Performs prefetch operations from matrix memory
–Interfaces with external 64 bit data bus and 32 bit address bus for SRAM and DRAM modules

40 1.3 GOPS Parallel DSP Memory Specialization
Memory tailored to image processing needs
–Provides parallel high bandwidth access to shared data with matrix shaped access patterns
Individual cache memory
–Services irregular memory requests

41 1.3 GOPS Parallel DSP Functional Specialization
Multiple SIMD units
–Currently 4 units for prototype
–16 units planned for future versions
–SIMD approach has been extended with ASIMD, an autonomous instruction selection capability that improves handling of conditional branches

42 1.3 GOPS Parallel DSP Low Power Usage
3.3 Volts
Using 650 milliwatts

43 1.3 GOPS Summary
Sustained performance 380 MIPS
–Around 90% utilization

44 SuperENC
MPEG-2 video encoder

45 SuperENC Processor Specialization
Software implemented RISC architecture
–5 stage pipeline
–81 MHz, 32 bit wide data/instruction path
Software implemented SIMD/SDIF (SDRAM Interface) modules

46 SuperENC Instruction Set Specialization
There is no instruction set specialization mentioned in the paper.

47 SuperENC Interconnect Specialization
SDIF
–All memory access goes through SDIF
–Relays data without going to external memory, which reduces memory bandwidth and power consumption

48 SuperENC Memory Specialization
Uses external RAM
–Can access two 16 Mbit SDRAMs or one 64 Mbit SDRAM

49 SuperENC Functional Specialization
MPEG algorithm is broken up into hardware functional blocks
Examples:
–DCT, Discrete Cosine Transform
–IDCT, Inverse Discrete Cosine Transform
–ME, Motion Estimation
–MC, Motion Compensation
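To show what the DCT and IDCT blocks compute, here is a minimal one-dimensional sketch using the unnormalized DCT-II and its matching inverse. MPEG-2 applies the transform in two dimensions on 8x8 blocks, and the slides do not describe how SuperENC implements it in hardware, so this is only a reference model of the math; the function names are hypothetical.

```python
import math


def dct(x):
    """Unnormalized 1-D DCT-II: X[k] = sum over i of x[i] * cos(pi/N * (i + 0.5) * k)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]


def idct(X):
    """Inverse of dct() above: a scaled DCT-III that recovers the original samples."""
    n = len(X)
    return [(X[0] + 2 * sum(X[k] * math.cos(math.pi / n * (i + 0.5) * k)
                            for k in range(1, n))) / n
            for i in range(n)]


samples = [8.0, 16.0, 24.0, 32.0, 40.0, 48.0, 56.0, 64.0]
coeffs = dct(samples)
restored = idct(coeffs)
print([round(c, 3) for c in coeffs])
print([round(s, 3) for s in restored])
```

This direct form costs N multiplications per coefficient. A dedicated hardware block replaces these loops with fixed multipliers and adders, which is why the transform is a natural candidate for a functional unit.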

50 SuperENC Low Power Usage
2.5 Volts internal
3.3 Volts I/O
1.5 Watts

51 SuperENC Summary
SuperENC makes use of many hardware functional blocks to implement the MPEG encoding algorithm

52 Metric Results
D30V/MPEG highest rated

