
Renesas Electronics America Inc. © 2010 Renesas Electronics America Inc. All rights reserved. ID 130L: Optimizing your SH2A Application Kevin P King Senior Staff Applications Engineer 14 October 2010 Version 1.2

2 Kevin P King
Senior Staff Application Engineer
- Primary tech support for SH2A
- Focusing on the medical segment and the SH family
Education: Electrical Engineering, University of Lowell (Edward B Van Dusen Award for Academic Achievement)
- Thirty years of embedded design experience (x86, HC05, HC11, 8051, Philips XA, Atmel AVR, Hitachi, Mitsubishi, etc.)
- Five years of emulator design for MetaLink: COP8, 68HC05, 68HC11, 8051 (multi-vendor), National CR16, Hitachi H8/500, etc.
- Multiple quality awards for embedded software and hardware development
- Specialty is embedded system design: MCU firmware and hardware

3 Renesas Technology and Solution Portfolio
Solutions for Innovation
- Microcontrollers & Microprocessors: #1 market share worldwide*
- Analog and Power Devices: #1 market share in low-voltage MOSFET**
- ASIC, ASSP & Memory: advanced and proven technologies
* MCU: 31% revenue basis from Gartner "Semiconductor Applications Worldwide Annual Market Share: Database", 25 March 2010
** Power MOSFET: 17.1% on unit basis from Marketing Eye 2009



6 Microcontroller and Microprocessor Line-up: SuperH
Category labels: Superscalar, MMU, Multimedia; High Performance CPU, FPU, DSC; Embedded Security; High Performance CPU, Low Power; Ultra Low Power; General Purpose; Legacy Cores
- Up to 1200 DMIPS, 45, 65 & 90nm process; video and audio processing on Linux; Server, Industrial & Automotive
- Up to 500 DMIPS, 150 & 90nm process; 600uA/MHz, 1.5uA standby; Medical, Automotive & Industrial
- Legacy cores: next-generation migration to RX
- Up to 10 DMIPS, 130nm process; 350uA/MHz, 1uA standby; capacitive touch
- Up to 25 DMIPS, 150nm process; 190uA/MHz, 0.3uA standby; application-specific integration
- Up to 25 DMIPS, 180, 90nm process; 1mA/MHz, 100uA standby; crypto engine, hardware security
- Up to 165 DMIPS, 90nm process; 500uA/MHz, 2.5uA standby; Ethernet, CAN, USB, Motor Control, TFT Display

7 Innovation
Engine Control Unit
What used to take multiple MCUs or an MCU + DSP can now be done by a single MCU!

8 Position
Renesas provides the tools that allow you to use the superscalar architecture to realize system performance that in the past required dual-processor designs containing both an MCU and a DSP.

9 Agenda
- Short architecture review
- Default optimization choices (what do they do)
- Delayed branching and delay slot usage
- FPU code controls
- In-lining code
- Misc: TBR usage, section control
- Inline assembly code (optional)

10 SH2/SH2A Architecture*
* SH Core training is available on RenesasInteractive

11 Super Scalar versus Dual Core
- Scalar: one thread, one instruction at a time. Single instruction stream, single pipeline (fetch, decode, execute).
- Super Scalar: one thread, multiple instructions at a time. For SH2A: 2 fetch, 2 decode, 2 execute.
- Dual Core: 4 instructions at a time, 2 independent threads.

12 SH-2A Features: Superscalar Pipeline / Floating Point Unit
[Figure: 5-stage superscalar pipeline. Labels: SH-2A-FPU, CPU core only, CPU, FPU]

13 Register Set: SH2, SH2A

14 SH-2A Register Banks
Regbank settings:
- Disabled
- All Ints Banked
- Banked by Priority
Two new interrupts:
- Bank Overflow
- Bank Underflow
New HEW window: CPU -> Register Banks


16 SH-2A Fast Interrupt Response
- Typical MCUs: INT trigger, CPU latency, save context (by compiler), user code, restore context.
- SH-2A MCU: INT trigger, CPU latency + save context in 9 cycles, user code, restore context.
- Hardware saves the context in a register bank LIFO: one primary register bank + 15 register banks.

17 QUESTION?
Does register banking simplify and speed up my ISR context switch when using the FPU? (Be careful with your answer.)
Answer: Yes, register banking always helps your context switch. However, FPU registers are not banked and thus must be saved on the stack if they are used in the ISR.

18 FPU review*
* Full SH2A-FPU training is available at RenesasInteractive

19 FPU Registers
- Load/store integer through the FPUL register
- 16 single-precision registers: FPR0-FPR15
- 8 double-precision registers: DR0-DR14 (use even numbers)
  - Created by concatenating 2 FP registers
  - Configured in software by MCU
- FPSCR.SZ controls transfer size
- FPSCR.PR controls precision
[Figure: FPUL and FPSCR registers, bits 31 to 0]
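The two FPSCR controls named on this slide can be pictured as single-bit masks. The following is a minimal sketch in portable C that works on a plain 32-bit copy of the register, not on the register itself. The bit positions (PR at bit 19, SZ at bit 20) are an assumption to be checked against the SH2A-FPU hardware manual, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; verify against the SH2A-FPU hardware manual. */
#define FPSCR_PR (UINT32_C(1) << 19) /* precision: 0 = single, 1 = double   */
#define FPSCR_SZ (UINT32_C(1) << 20) /* transfer size: 0 = 32-bit, 1 = pair */

/* Return a copy of an FPSCR image with 32-bit or 64-bit precision selected. */
static uint32_t fpscr_with_precision(uint32_t fpscr, int bits)
{
    assert(bits == 32 || bits == 64);
    return (bits == 64) ? (fpscr | FPSCR_PR) : (fpscr & ~FPSCR_PR);
}

/* True when the image selects double-precision arithmetic. */
static bool fpscr_is_double(uint32_t fpscr)
{
    return (fpscr & FPSCR_PR) != 0;
}

/* True when the image selects register-pair (64-bit) transfers. */
static bool fpscr_is_pair_transfer(uint32_t fpscr)
{
    return (fpscr & FPSCR_SZ) != 0;
}
```

Every switch between precisions is a read-modify-write of this register, which is why code that mixes precisions pays for it at run time.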

20 Pop Quiz
The SH2A-FPU core can handle (choose the BEST answer):
a) Single precision
b) Double precision
c) Both single and double
d) Both single and double, but requires run-time configuration changes if using mixed precisions in your code
e) None of the above
Answer: d. The FPU can handle both, but it must switch between modes if your code contains "mixed" arithmetic. We will examine this in the lab so you can get optimal code when doing floating point.
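"Mixed" arithmetic often enters C code by accident through an unsuffixed constant. The sketch below is an illustration in standard C, not code from the lab; the function names are hypothetical. It shows how one missing `f` suffix turns a single-precision multiply into a double-precision one.

```c
#include <assert.h>

/* Single precision throughout: the 'f' suffix keeps the constant a float. */
static float scale_single(float x)
{
    return x * 1.5f;
}

/* Mixed precision: 1.5 is a double, so x is promoted, the multiply is done
   in double precision, and the result is converted back to float. */
static float scale_mixed(float x)
{
    return (float)(x * 1.5);
}

/* The type of each expression shows which precision the multiply uses. */
static void check_expression_types(float x)
{
    assert(sizeof(x * 1.5f) == sizeof(float));
    assert(sizeof(x * 1.5) == sizeof(double));
}
```

Both functions return the same value for typical inputs, but on a core that has to switch FPU modes the second form carries the extra mode-change cost.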

21 QUESTION
The SH2A-FPU core is a load/store architecture. In order to get information into and out of the floating-point registers you must go through FPUL (the floating-point communication register). TRUE or FALSE?
Answer: FALSE. You only need FPUL to communicate between the integer registers and the floating-point registers, so you are only "penalized" if you do a lot of integer/float transfers. Floating-point data may be moved directly between memory and FPU registers.
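One practical consequence is that data kept as `float` in memory avoids the integer-to-float path altogether. The following sketch uses hypothetical function names and assumes the compiler maps each `(float)` cast of an `int` to a transfer through FPUL. It contrasts the two cases.

```c
#include <assert.h>
#include <stddef.h>

/* Each element is converted from int to float inside the loop; on the
   SH2A-FPU every such conversion passes through FPUL. */
static float sum_from_ints(const int *samples, size_t n)
{
    assert(samples != NULL || n == 0);
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        total += (float)samples[i];
    }
    return total;
}

/* Data already stored as float can be loaded straight from memory into
   FPU registers, so no conversion is needed in the loop. */
static float sum_from_floats(const float *samples, size_t n)
{
    assert(samples != NULL || n == 0);
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        total += samples[i];
    }
    return total;
}
```

If the samples arrive as integers, for example from an ADC, converting them once when they are stored keeps the conversion out of the inner loop.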

22 SH2A Bus Structure (Harvard Architecture)
[Block diagram. Labels: SH-2A CPU (superscalar) with FPU; F bus (instruction); M bus (data); I bus (internal bus); P bus (peripheral bus); on-chip RAM; on-chip Flash; cache controller; cache memory (instruction/data cache: 8KB/8KB, 4-way set associative, LRU); DMAC/DTC; bus state controller; external bus (SDRAM, SRAM, etc. I/F); bridge; peripherals (timers, ADC, SCI, PORT). Bus speeds shown: 32bit/1cyc (three places) and 16bit/3cyc.]

23 Example: SH2A SRAM Connection Details
- Multiple connections to the I, M and F buses
- Independent read/write ports
- Priority: I, M, then F (in case of multiple accesses to the same page*)
* For example, DMAC + CPU

24 Multi-page RAM Access
- Conflict when accessing the same page
- No conflict when accessing different pages
[Figure: RAM]
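Whether two buffers can collide comes down to whether their address ranges touch a common RAM page. The helper below is a rough sketch for reasoning about buffer placement. The page size is a made-up placeholder (the real value depends on the device and must come from its hardware manual), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page size; the real value depends on the device. */
#define RAM_PAGE_SIZE UINT32_C(0x4000)

/* Index of the RAM page that contains the given address. */
static uint32_t ram_page_of(uint32_t address)
{
    return address / RAM_PAGE_SIZE;
}

/* True when two buffers touch a common page, i.e. when simultaneous
   accesses from different bus masters could contend. */
static bool buffers_share_page(uint32_t a_start, uint32_t a_len,
                               uint32_t b_start, uint32_t b_len)
{
    assert(a_len > 0 && b_len > 0);
    uint32_t a_first = ram_page_of(a_start);
    uint32_t a_last  = ram_page_of(a_start + a_len - 1);
    uint32_t b_first = ram_page_of(b_start);
    uint32_t b_last  = ram_page_of(b_start + b_len - 1);
    return a_first <= b_last && b_first <= a_last;
}
```

A DMA buffer and a buffer the CPU works on at the same time would ideally make this check return false.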

25 Questions before we start the lab?

26 Start the Lab
Please refer to the Lab Handout and let's get started! Keep your die turned to the section of the lab you are on. (Instructions are provided in the lab handout.)

27 Checking Progress
We are using the die to keep track of where everyone is in the lab. Make sure to update it as you change sections. When done with the lab, your die will have the 6 pointing up as shown here.

28 Questions Section 1
1.1 No, Debug is slowest; this surprises most people.
1.2 The Debug setting does NOT use delay slots. This allows for "sequential code execution" for easy debug.
1.3 Speed uses the delay slot after the branch and duplicates the FMAC to cut the loop count in half.
RULE #1: Let the compiler do its job! The SH2A gets maximum performance when the compiler is allowed to reorder code to avoid pipeline stalls and to use the delay slot, which might otherwise be a wasted fetch and decode.

29 Questions Section 1: No optimization
- Delay slot used
- But loop count still 4

30 Questions Section 1: Speed
- Delay slot used
- FMAC duplicated to cut loop count in half
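Duplicating the multiply-accumulate is loop unrolling by two. The sketch below writes the transformation out by hand in C purely as an illustration. The function names are hypothetical and this is not the lab source; in practice Rule #1 applies and the compiler's Speed setting does this for you.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward form: one multiply-accumulate and one branch per element. */
static float mac_simple(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Unrolled by two: two multiply-accumulates per pass, so the loop branch
   is taken half as often. Requires an even element count. */
static float mac_unrolled2(const float *a, const float *b, size_t n)
{
    assert(n % 2 == 0);
    float acc = 0.0f;
    for (size_t i = 0; i < n; i += 2) {
        acc += a[i] * b[i];
        acc += a[i + 1] * b[i + 1];
    }
    return acc;
}
```

With four elements the first form runs the loop four times and the second runs it twice, matching the loop counts on the two slides.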

31 Questions Section 2
2.1 No, it has some extra code:
    result = factorial((unsigned int)3);
    A4  D7A
    A6  2F76  MOV.L
    A8  086A  STS FPSCR,R
    AA  E403  MOV #H'03,R
    AC  28E9  AND R14,R
    AE  B17D
    B0  486A  LDS R8,FPSCR
    B2  7F04  ADD #H'04,R15
Must change the mode of the FPU because you used Mixed.

32 Questions Section 2
2.2 Looks more like what you would expect.
2.3 You should see about 40 ns of savings in "safe mode" when doing function calls.
RULE #2: Decide your math requirements up front! If possible, choose Single or Double precision. If not, use safe mode and take the minimal hit when you do need double precision.

33 Questions Section 2
%, 2 byte, half, single instruction call
2.5 No, because they were already single-instruction calls.
Using TBR:
    HardwareSetup(); // Use Hardware Setup
    JSR/N
Not using TBR:
    HardwareSetup(); // Use Hardware Setup
    D70C
    B
Seems small, but think about how many calls there are in your code!
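The idea behind a TBR-relative call can be modeled in portable C as a call through a table of function addresses. This is a conceptual sketch only. It is not the Renesas compiler's TBR syntax, and the table, the indices and the two sample functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }

/* Stand-in for the table that TBR points at: one address per function. */
static const handler_fn jump_table[] = { add_one, twice };

enum { IDX_ADD_ONE = 0, IDX_TWICE = 1, JUMP_TABLE_LEN = 2 };

/* Call through the table by index, much as a TBR-relative call uses a
   small displacement instead of a full 32-bit address at the call site. */
static int call_via_table(size_t index, int arg)
{
    assert(index < JUMP_TABLE_LEN);
    return jump_table[index](arg);
}
```

The saving comes from the call site: it only has to encode a small table offset, and the full address is stored once in the table.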

34 Questions Section 3
- Performance should get progressively better.
- The number of registers saved decreases with each "optimization".
- When the code is inlined rather than called as a function, the compiler can see which registers it needs to use and thus save.
RULE #3: Reduce your interrupt overhead. Use register banking where possible. Use inline code in your ISR to save the pushing/popping of the FPU register set.

35 Questions Section 4
Nothing in the source indicates in-lined code.

36 Questions Section 4
4.2 You go to the function, but it is really in the same PC range you came from in main.
4.3 Just show_simple. We basically told it that start_timer could not be inlined. See HINT. We still had show_simple and show_addressing inlined. Be careful: when you have selected detailed optimization, it does not change with the "global settings".
4.6 Yes.
RULE #4: In-lining can be used to keep your code "pretty" while making it run faster by reducing function calls. Control implicit and explicit inlining to take advantage of tiny speed improvements. Your code still looks logically like you intended.
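Explicit inlining can be sketched in standard C99 with `static inline`. The toolchain used in the lab has its own inlining controls, which are not shown here. `show_simple` follows the appendix example, while `scaled_product` is a hypothetical caller added for illustration.

```c
/* Explicit request: 'static inline' asks the compiler to expand the body
   at each call site instead of emitting a call. */
static inline float show_simple(int x, int y)
{
    return (float)(x * y);
}

/* The source still reads as a function call; after inlining, the multiply
   and conversion sit directly in the caller's instruction stream. */
static float scaled_product(int x, int y, float gain)
{
    return gain * show_simple(x, y);
}
```

The keyword is only a request, so whether the call actually disappears still depends on the optimization settings, which is what the lab questions probe.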

37 Questions Section 5
5.1 You should have seen about 100 ns of savings.
RULE #5: When possible, take advantage of "free time". By understanding where your high-frequency accesses to buffers may be, simple control of their location gives you performance enhancements without changing code functionality at all.

38 Questions Section 6
6.1 Sort of tongue in cheek; lots of errors, of course.
6.2 On main_variable.
RULE #6: When using assembler code, be aware of the variables you may want to "watch". Some type information is lost by generating a .src file and then the obj. This is probably even worse for complex structures.

39 Lab Summary
- Review default optimization choices
- Delayed branching and delay slot usage
- FPU code controls
- Misc: TBR usage, section control
- In-lining code
- In-line assembly code
* This will be repeated many times in the lab

40 Innovation
Engine Control Unit
What used to take multiple MCUs or an MCU + DSP can now be done by a single MCU!

41 Thank You!

42 Appendix: Additional Information

43 FPU Load/Store operation
    float show_simple(int x, int y)
    {
        return(x*y);
    }

    MOV   R5,R0
    MULR  R0,R4       ; multiply passed parameters
    LDS   R4,FPUL     ; move result to FPU communications register
    RTS
    FLOAT FPUL,FR0    ; convert to float (executes in the RTS delay slot)

44 FPU Load/Store operation - Float
    MOV   R5,R0
    MULR  R0,R4
    LDS   R4,FPUL
    RTS
    FLOAT FPUL,FR0
[Figure: FPUL and FPSCR registers, bits 31 to 0. Values shown: 0x10, 0x e03]

45 FPU Load/Store operation - Double
[Figure: FPUL and FPSCR registers. Values shown: x10, 0x e03]
