Advanced Processor Architectures for Embedded Systems Witawas Srisa-an CSCE 496: Embedded Systems Design and Implementation.

1 Advanced Processor Architectures for Embedded Systems Witawas Srisa-an CSCE 496: Embedded Systems Design and Implementation

2 Objectives Discuss ASIC, FPGA-based systems, and general purpose processors Analyze the operating requirements for today’s embedded processors Observe the architectural differences between state-of-the-art processors for embedded systems and high-performance general purpose processors –Tensilica Xtensa –Stretch S5000

3 Embedded Processor Requirements operate in memory-constrained environments must be energy efficient must be low cost may have to be good at a common set of tasks –matrix multiplication, –encryption, –filtering (FIR), –network packet processing, etc.
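As an illustration of one of the common tasks listed above, here is a minimal sketch of a direct-form FIR filter. The function name and the tap values are assumptions chosen for the example, not taken from the slides. The inner loop is the multiply-accumulate pattern that built-in DSP units are designed to speed up.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]  # one multiply-accumulate (MAC) step
        output.append(acc)
    return output


# Illustrative 4-tap filter with all coefficients equal to 1 (a running sum).
print(fir_filter([1, 2, 3, 4, 5], [1, 1, 1, 1]))
```

Each output sample costs one MAC per tap, so the work grows with the number of taps times the number of samples.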

4 Implications low memory footprint –simplified instruction set 16-bit, 24-bit –may not need support for VM may lack hardware MMUs energy efficient –less complex (smaller number of transistors) –simple pipeline stages –less cache memory on chips –simple floating point units –larger transistors and slower clocks –integrated function specific components for common tasks

5 Implications (cont.) low cost –share IP cores to reduce development cost ARM, MIPS, etc. –use older semiconductor process technologies (e.g. 250 nm instead of 90 nm) task specific –built-in DSP unit –wide data bus (more data per movement) –may need support for adding functions to the cores –may need field-reconfigurability

6 Rationales from “The Death of Micro-Processors”, Nick Tredennick and Brion Shimamoto, Embedded Systems Programming,

7 Rationales (cont.) from “The Death of Micro-Processors”, Nick Tredennick and Brion Shimamoto, Embedded Systems Programming,

8 Rationales (cont.) “Studies have shown that custom hardware components often require much less energy to complete their tasks than the same tasks running on general purpose processors.” [1] “An ASIC is custom logic for a particular application. Custom logic can be orders of magnitude more efficient than microprocessor-based solutions.” [2] [1] Lach et al., “Power-Efficient Adaptable Wireless Sensor Networks”, Proceedings of International Conference on Military and Aerospace Programmable Logic Devices (MAPLD), September 2003. [2] Tredennick and Shimamoto, “The Death of Micro-Processors”, Embedded Systems Programming,

9 Application Specific ICs (ASICs) provide custom design solutions for particular problems –fixed solutions that require public acceptance to reduce cost –require extensive knowledge of hardware design –not field-reconfigurable –can have large non-recurring engineering (NRE) cost

10 ASICs (cont.)
Technology   Mask cost
90 nm        $1,000,000
180 nm       $250,000
250 nm       $120,000
350 nm       $60,000
Wayne Wolf, FPGA-Based System Designs, Prentice Hall, 2004
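To make the mask costs on this slide concrete, the sketch below spreads each one-time cost over a production run. The dictionary values come from the slide; the function name and the 10,000-unit volume are assumptions for illustration only.

```python
# Mask cost per process node, in US dollars (figures from the slide).
MASK_COST_USD = {90: 1_000_000, 180: 250_000, 250: 120_000, 350: 60_000}


def mask_cost_per_unit(node_nm, volume):
    """Share of the one-time mask cost carried by each chip in a run of `volume` units."""
    return MASK_COST_USD[node_nm] / volume


# Hypothetical production run of 10,000 chips.
for node_nm in sorted(MASK_COST_USD):
    print(f"{node_nm} nm: ${mask_cost_per_unit(node_nm, 10_000):.2f} per chip")
```

At low volumes the mask cost dominates the per-chip price, which is one reason low-cost embedded parts often stay on older process nodes.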

11 FPGA-Based Systems Field-programmable gate arrays (FPGAs) –are slower and require more power than custom designs –are more expensive –but involve no wait between completing a design and making a chip (great for prototyping) –are also reusable

12 FPGAs SRAM based--volatile –Altera Flex, Stratix, Cyclone, Apex Antifuse--one-time programmable –Actel EEPROM--non-volatile –Altera Max

13 ASIC Design Approaches Custom VLSI designs –are fabricated on a manufacturing line, which takes months; mask cost is also high –operate much faster and consume less power than FPGA equivalents –can be cheaper if manufactured in large volume
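The claim that custom designs become cheaper in large volume can be sketched as a break-even calculation. This is a simplified model under stated assumptions: the ASIC pays a one-time NRE cost plus a per-chip cost, the FPGA pays only a per-chip cost, and the per-chip prices below are hypothetical. Only the $1,000,000 figure (the 90 nm mask cost on slide 10) comes from the slides.

```python
def break_even_volume(asic_nre, fpga_unit_cost, asic_unit_cost):
    """Volume at which total ASIC cost equals total FPGA cost.

    ASIC total = asic_nre + volume * asic_unit_cost
    FPGA total = volume * fpga_unit_cost
    """
    if fpga_unit_cost <= asic_unit_cost:
        raise ValueError("the ASIC never breaks even unless its unit cost is lower")
    return asic_nre / (fpga_unit_cost - asic_unit_cost)


# Hypothetical unit prices: $50 per FPGA, $10 per ASIC.
print(break_even_volume(1_000_000, 50, 10))
```

Below the break-even volume the FPGA is the cheaper choice in this model; above it, the ASIC wins.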

14 ASIC Design Approaches (cont.) Structured ASIC –is based on pre-designed logic fabric structurally embedded in the platform –fills the market gap between high-density FPGAs and standard cell ASICs can greatly reduce development time and cost reduce non-recurring engineering (NRE) cost

15 Structured ASICs View Altera demo

16 Integrating ASICs with GPPs Today’s embedded systems can have complex software layers –OS –Virtual Machine –Applications It is often preferable to pair GPPs with ASICs as co-processors

17 Integrating ASICs with GPPs (cont.) So, we can have GPPs perform basic tasks and ASICs (co-processors) speed up compute-intensive functions –sounds simple, but in reality it is quite complex –basic hand-shaking is needed between the ASICs and the main processors data exchange –shared memory –requires OS and architecture support synchronous or asynchronous calls cache coherency issue
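The hand-shaking and shared-memory data exchange mentioned on this slide can be sketched in software. Everything here is a hypothetical stand-in: a Python thread plays the co-processor, the `Mailbox` class plays a shared-memory region, and two events play the request and completion signals. Real hardware would use registers, interrupts, and OS support instead.

```python
import threading


class Mailbox:
    """Hypothetical shared-memory mailbox with one request slot and one response slot."""

    def __init__(self):
        self.request = None
        self.response = None
        self.request_ready = threading.Event()   # signal from CPU to co-processor
        self.response_ready = threading.Event()  # completion signal back to the CPU


def coprocessor(mailbox, function):
    """Simulated co-processor: wait for a request, compute, then signal completion."""
    mailbox.request_ready.wait()
    mailbox.response = function(mailbox.request)
    mailbox.response_ready.set()


def offload(function, data):
    """Main-processor side of the handshake."""
    mailbox = Mailbox()
    worker = threading.Thread(target=coprocessor, args=(mailbox, function))
    worker.start()
    mailbox.request = data            # 1. place the data in shared memory
    mailbox.request_ready.set()       # 2. notify the co-processor
    mailbox.response_ready.wait()     # 3. wait for the completion signal
    worker.join()
    return mailbox.response           # 4. read the result from shared memory


print(offload(sum, [1, 2, 3]))
```

The sketch omits error handling: if the offloaded function failed, the completion signal would never be set and the caller would wait forever, which is one reason real handshakes need timeouts.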

18 ASICs and GPPs (cont.) An example is to use a hardware co-processor for cryptography –should the co-processor calls be synchronous main processor blocked on calls and wait for response –or asynchronous calling process blocked and swapped out need interrupt support need to maintain context
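The difference between the two calling styles can be sketched with a thread pool standing in for the co-processor. All names are hypothetical, and the XOR routine is only a placeholder for real cryptography. The completion callback plays the role that an interrupt handler would play in hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def crypto_coprocessor(block):
    """Placeholder for the hardware: XOR each byte with a fixed key (not real cryptography)."""
    return bytes(b ^ 0x5A for b in block)


def encrypt_sync(block):
    """Synchronous call: the caller waits here until the result is ready."""
    return crypto_coprocessor(block)


def encrypt_async(executor, block, on_done):
    """Asynchronous call: submit the job and return immediately.

    `on_done` is invoked with the result on completion, like an interrupt handler.
    """
    future = executor.submit(crypto_coprocessor, block)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future


results = []
with ThreadPoolExecutor(max_workers=1) as executor:
    pending = encrypt_async(executor, b"abc", results.append)
    other_work = sum(range(10))  # the caller keeps running while the job is in flight
    pending.result()             # wait for completion before shutting down

print(results[0] == encrypt_sync(b"abc"))
```

The asynchronous version has to carry extra state (the future and the callback), which mirrors the slide's point about needing interrupt support and maintaining context.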

19 ASICs and GPPs (cont.) Co-processor –shares bus with the main CPU is a source of bus contention –can cause cache coherency issue data in the main CPU cache may have been updated by the co-processor –flush the cache accordingly –should be equipped with DMA to relieve the main CPU from copying data
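The cache coherency issue described above can be illustrated with a toy model. The classes below are assumptions made for this sketch: a read-only cache with no hardware coherency, and a DMA write that goes straight to memory. After the co-processor updates memory, the CPU keeps reading the old cached value until the line is invalidated.

```python
class Memory:
    """Main memory shared by the CPU and the co-processor."""

    def __init__(self, size):
        self.cells = [0] * size


class CpuCache:
    """Toy read cache with no hardware coherency (an assumption for illustration)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from main memory
            self.lines[addr] = self.memory.cells[addr]
        return self.lines[addr]

    def invalidate(self, addr):
        self.lines.pop(addr, None)  # drop the cached copy, if any


def dma_write(memory, addr, value):
    """Co-processor writes its result to memory, bypassing the CPU cache."""
    memory.cells[addr] = value


memory = Memory(16)
cache = CpuCache(memory)
cache.read(4)                # CPU caches the initial value
dma_write(memory, 4, 99)     # co-processor updates memory behind the cache
stale = cache.read(4)        # still served from the cache
cache.invalidate(4)
fresh = cache.read(4)        # refetched from memory after invalidation
print(stale, fresh)
```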

20 Extending GPPs Tensilica Xtensa –reconfigurable processor cores support native 16-bit and 24-bit instructions for higher code density users can add/subtract components (MMU, Multipliers, FPUs) users can reconfigure cache organization users can select bus width (32, 64, or 128 bits) –user-defined instruction extension language users can create custom instructions to speed up commonly used functions users can instantiate custom registers of different sizes
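The code-density benefit of 16-bit and 24-bit instructions can be estimated with simple arithmetic. The instruction mix below is a made-up example, and the fixed 32-bit encoding is a generic point of comparison rather than any particular processor.

```python
def mixed_code_size_bytes(count_16bit, count_24bit):
    """Program size when instructions are encoded in 2 or 3 bytes."""
    return 2 * count_16bit + 3 * count_24bit


def fixed32_code_size_bytes(instruction_count):
    """Program size when every instruction takes 4 bytes."""
    return 4 * instruction_count


# Hypothetical program: 1,000 instructions, 400 of which fit the 16-bit format.
mixed = mixed_code_size_bytes(400, 600)
fixed = fixed32_code_size_bytes(1000)
print(mixed, fixed)
```

Under these assumptions the mixed encoding needs 2,600 bytes against 4,000 for the fixed 32-bit encoding. The real saving depends on the instruction mix and on how many instructions each encoding needs for the same program.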

21 Tensilica Xtensa from

22 Tensilica Xtensa (cont.) We will not go into great detail about the Xtensa. However, we will study Stretch S5000 engine which is based on the Xtensa core.

23 Design-Time Solutions Up to now, we have only talked about design-time solutions! –logic designs are done in house –not very reconfigurable after the chip is made –even with FPGAs, someone has to come up with a new hardware design for it to change –the Xtensa needs about 1 hour to synthesize the instruction extension What if we want to configure on the fly? –each application brings in CPU intensive functions these functions are not known in advance –Can we leave it up to the software developers to design fast co-processors?

24 Run-Time Configuration

25 (R)evolution of Processors Rock Hard Ice Hard Playdough Hard

26 (R)evolution of Processors Rock Hard Ice Hard Playdough Hard Hardwire, GPP Perform well in most conditions but not extreme conditions

27 (R)evolution of Processors Rock Hard Ice Hard Playdough Hard GPP with FPGAs Custom designs perform well in some extreme conditions. Requires extensive knowledge of hardware design

28 (R)evolution of Processors Rock Hard Ice Hard Playdough Hard GPP with embedded programmable logics Reconfiguration triggered by software

29 (R)evolution of Processors Ice Hard –Contains ASIC (Application Specific IC) designs Increases time-to-market Takes time to reconfigure

30 Software Hotspots In DSP –80% of the processing load is spent on 20% of the code Hand-tuned assembly that can take thousands of cycles to execute. Less portable –The remaining 80% of the code has complex system functions Runs well on most GPPs
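The 80/20 split above suggests how much an accelerator can help overall. The sketch below applies Amdahl's law, which the slides do not name but which fits the situation: if a fraction of the load is sped up by some factor, the rest still runs at the original speed. The speedup factors tried are arbitrary examples.

```python
def overall_speedup(hot_fraction, hot_speedup):
    """Amdahl's law: whole-program speedup when only `hot_fraction` of the load is accelerated."""
    remaining = 1.0 - hot_fraction
    return 1.0 / (remaining + hot_fraction / hot_speedup)


# 80% of the processing load is in the hotspot; try a few hypothetical accelerator speedups.
for factor in (2, 10, 100):
    print(factor, round(overall_speedup(0.8, factor), 2))
```

However fast the hotspot becomes, the overall gain in this model is bounded by 1 / 0.2 = 5, because the remaining 20% of the load is untouched.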

31 Software Hotspots Example: when a 16 QuadAM modem (19.2 Kbaud) is implemented entirely in software, it –takes 177,000 instruction cycles to execute on a TI C6711 –versus a few cycles on an FPGA co-processor


33 Solving Hotspots [Figure: chart with axes PERFORMANCE and FLEXIBILITY & TTM, positioning ASIC, FPGA, DSP, CPU, and SCP] SCP = Software Configurable Processor
