OPTIMIZING C CODE FOR THE ARM PROCESSOR

Optimizing code takes time and reduces source code readability. It is usually done for functions that are critical for performance or power consumption and are executed frequently, and usually in combination with profiling.

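LOCAL VARIABLES

ARM registers are 32 bits wide, so 32-bit data types are the most efficient choice for local variables. Use signed and unsigned int, and avoid char and short: the compiler often has to add instructions to keep their values truncated to 8 or 16 bits after arithmetic. The only exception is when you want wraparound to occur. Unsigned int is more efficient than signed int for division, because an unsigned division by a power of two is a single shift, whereas signed division needs extra instructions to round negative values toward zero.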
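A minimal C sketch of the point above, assuming a 64-element input as in the checksum slides. The two checksum functions compute the same sum and differ only in the loop counter type; on ARM a compiler typically has to mask an 8-bit counter after each increment (for example with an AND of 0xff) to reproduce 8-bit wraparound, while the 32-bit counter needs no fix-up. The function names are hypothetical.

```c
/* 8-bit loop counter: i must wrap at 256, so an ARM compiler usually
   masks the register after every increment (e.g. AND r1, r1, #0xff). */
int checksum_char_counter(const int *data)
{
    unsigned char i;
    int sum = 0;
    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}

/* 32-bit loop counter: matches the register width, no masking needed. */
int checksum_int_counter(const int *data)
{
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++)
        sum += data[i];
    return sum;
}

/* The exception: when wraparound is wanted, an 8-bit type gives
   modulo-256 arithmetic without any explicit % or &. */
unsigned char add_mod256(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}
```

For example, add_mod256(200, 100) yields 44, because 300 wraps modulo 256.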

LOOP STRUCTURES (incrementing for loop)

    int checksum_v5(int *data)
    {
        unsigned int i;
        int sum = 0;
        for (i = 0; i < 64; i++) {
            sum += *(data++);
        }
        return sum;
    }

    checksum_v5
        MOV   r2, r0             ; r2 = data
        MOV   r0, #0             ; sum = 0
        MOV   r1, #0             ; i = 0
    checksum_v5_loop
        LDR   r3, [r2], #4       ; r3 = *(data++)
        ADD   r1, r1, #1         ; i++
        CMP   r1, #0x40          ; compare i, 64
        ADD   r0, r3, r0         ; sum += r3
        BCC   checksum_v5_loop   ; if (i < 64) goto loop
        MOV   pc, r14            ; return sum

LOOP STRUCTURES (decrementing for loop)

    int checksum_v6(int *data)
    {
        unsigned int i;
        int sum = 0;
        for (i = 64; i != 0; i--) {
            sum += *(data++);
        }
        return sum;
    }

    checksum_v6
        MOV   r2, r0             ; r2 = data
        MOV   r0, #0             ; sum = 0
        MOV   r1, #0x40          ; i = 64
    checksum_v6_loop
        LDR   r3, [r2], #4       ; r3 = *(data++)
        SUBS  r1, r1, #1         ; i-- and set flags
        ADD   r0, r3, r0         ; sum += r3
        BNE   checksum_v6_loop   ; if (i != 0) goto loop
        MOV   pc, r14            ; return sum

Counting down to zero removes the CMP: SUBS decrements i and sets the flags that BNE tests, so each iteration is one instruction shorter than in checksum_v5.
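The two checksum slides can be checked against each other in C. The sketch below writes both loops plus a do-while variant, checksum_countdown (a hypothetical name), that counts a caller-supplied N down to zero. The do-while form assumes N is at least 1, because the body runs once before the test.

```c
/* Incrementing loop (checksum_v5): ADD, CMP and BCC per iteration. */
int checksum_v5(const int *data)
{
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 64; i++)
        sum += *(data++);
    return sum;
}

/* Decrementing loop (checksum_v6): SUBS decrements and sets the flags. */
int checksum_v6(const int *data)
{
    unsigned int i;
    int sum = 0;
    for (i = 64; i != 0; i--)
        sum += *(data++);
    return sum;
}

/* Hypothetical general form: do-while counting N down to zero.
   Precondition: N >= 1 (the body executes before the test). */
int checksum_countdown(const int *data, unsigned int N)
{
    int sum = 0;
    do {
        sum += *(data++);
    } while (--N != 0);
    return sum;
}
```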

LOOP UNROLLING

    int checksum_v7(int *data, unsigned int N)
    {
        int sum = 0;
        do {
            sum += *(data++);
            sum += *(data++);
            sum += *(data++);
            sum += *(data++);
            N -= 4;
        } while (N != 0);
        return sum;
    }

This version assumes N is a nonzero multiple of 4. Each iteration processes four elements, so the loop overhead (SUBS and BNE) is paid once per four additions.

    checksum_v7
        MOV   r2, #0             ; sum = 0
    checksum_v7_loop
        LDR   r3, [r0], #4       ; r3 = *(data++)
        SUBS  r1, r1, #4         ; N -= 4 and set flags
        ADD   r2, r3, r2         ; sum += r3
        LDR   r3, [r0], #4       ; r3 = *(data++)
        ADD   r2, r3, r2         ; sum += r3
        LDR   r3, [r0], #4       ; r3 = *(data++)
        ADD   r2, r3, r2         ; sum += r3
        LDR   r3, [r0], #4       ; r3 = *(data++)
        ADD   r2, r3, r2         ; sum += r3
        BNE   checksum_v7_loop   ; if (N != 0) goto loop
        MOV   r0, r2             ; r0 = sum
        MOV   pc, r14            ; return r0
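The unrolled checksum_v7 only works when N is a nonzero multiple of 4. A common way to lift that restriction, sketched here as an assumption rather than taken from the slides, is to run the four-way unrolled loop over the largest multiple of 4 and finish the remaining 0 to 3 elements in a short count-down loop. checksum_unrolled is a hypothetical name.

```c
/* Sum N ints, four per iteration, then handle the remainder.
   Works for any N, including 0. */
int checksum_unrolled(const int *data, unsigned int N)
{
    int sum = 0;
    unsigned int blocks = N >> 2;   /* number of 4-element groups */
    unsigned int rest = N & 3;      /* 0..3 leftover elements     */

    while (blocks != 0) {           /* body unrolled by four */
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        sum += *(data++);
        blocks--;
    }
    while (rest != 0) {             /* remainder loop */
        sum += *(data++);
        rest--;
    }
    return sum;
}
```

For an 11-element array, this runs two unrolled iterations and three remainder iterations.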

Loop Unrolling Example

Unroll the following loop by factors of 2, 4, and 8:

    for (i = 0; i < 64; i++) {
        a[i] = b[i] + c[i+1];
    }

Factor of 2

    for (i = 0; i < 32; i++) {
        a[2*i]   = b[2*i]   + c[2*i+1];
        a[2*i+1] = b[2*i+1] + c[2*i+1+1];
    }

Factor of 4

    for (i = 0; i < 16; i++) {
        a[4*i]   = b[4*i]   + c[4*i+1];
        a[4*i+1] = b[4*i+1] + c[4*i+1+1];
        a[4*i+2] = b[4*i+2] + c[4*i+2+1];
        a[4*i+3] = b[4*i+3] + c[4*i+3+1];
    }

Factor of 8

    for (i = 0; i < 8; i++) {
        a[8*i]   = b[8*i]   + c[8*i+1];
        a[8*i+1] = b[8*i+1] + c[8*i+1+1];
        a[8*i+2] = b[8*i+2] + c[8*i+2+1];
        a[8*i+3] = b[8*i+3] + c[8*i+3+1];
        a[8*i+4] = b[8*i+4] + c[8*i+4+1];
        a[8*i+5] = b[8*i+5] + c[8*i+5+1];
        a[8*i+6] = b[8*i+6] + c[8*i+6+1];
        a[8*i+7] = b[8*i+7] + c[8*i+7+1];
    }
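Each unrolled version can be checked against the original loop. The sketch below assumes int arrays with a holding 64 elements and c holding 65, since the loop reads c[i+1] up to c[64]; the array sizes, LEN, and the function names are assumptions, not part of the exercise. The factor-4 version is shown, and the factor-2 and factor-8 versions can be checked the same way.

```c
#include <string.h>

#define LEN 64

/* Original loop. c must hold LEN + 1 elements because it reads c[i+1]. */
void add_ref(int *a, const int *b, const int *c)
{
    for (int i = 0; i < LEN; i++)
        a[i] = b[i] + c[i + 1];
}

/* Unrolled by a factor of 4: a quarter of the iterations, four statements each. */
void add_x4(int *a, const int *b, const int *c)
{
    for (int i = 0; i < LEN / 4; i++) {
        a[4*i]     = b[4*i]     + c[4*i + 1];
        a[4*i + 1] = b[4*i + 1] + c[4*i + 2];
        a[4*i + 2] = b[4*i + 2] + c[4*i + 3];
        a[4*i + 3] = b[4*i + 3] + c[4*i + 4];
    }
}

/* Returns 1 if two LEN-element results are identical. */
int same(const int *x, const int *y)
{
    return memcmp(x, y, LEN * sizeof x[0]) == 0;
}
```

Running add_ref and add_x4 on the same b and c and then calling same() on the two outputs should report identical arrays.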

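REGISTER ALLOCATION

Try to limit the number of local variables used in the inner loop of a function to 12, so that the compiler can allocate all of them to ARM registers. Help the compiler identify the important variables by making sure they are the ones used in the innermost loop.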

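CALLING FUNCTIONS

Try to restrict functions to four arguments: the ARM calling convention passes the first four arguments in registers r0 to r3 and any further arguments on the stack. Use structures to group related arguments, and pass structure pointers instead. Define small functions in the same source file as, and before, the functions that call them, so the compiler can see their bodies and inline them.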
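A sketch of grouping arguments into a structure, assuming a hypothetical circular byte queue (the Queue type and both function names are illustrative). Under the ARM Procedure Call Standard (AAPCS), the first four integer or pointer arguments travel in r0 to r3, so queue_bytes_v1 passes its fifth argument, N, on the stack, while queue_bytes_v2 passes only three. Both functions assume N is at least 1.

```c
/* Hypothetical circular-buffer state grouped into one struct. */
typedef struct {
    char *start;   /* first byte of the buffer     */
    char *end;     /* one past the last byte       */
    char *ptr;     /* next position to be written  */
} Queue;

/* Five arguments: the fifth (N) is passed on the stack under the AAPCS. */
char *queue_bytes_v1(char *q_start, char *q_end, char *q_ptr,
                     const char *data, unsigned int N)
{
    do {
        *(q_ptr++) = *(data++);
        if (q_ptr == q_end)
            q_ptr = q_start;          /* wrap to the start of the buffer */
    } while (--N != 0);
    return q_ptr;
}

/* Three arguments: the queue state travels as a single pointer in r0. */
void queue_bytes_v2(Queue *q, const char *data, unsigned int N)
{
    char *p = q->ptr;
    char *end = q->end;
    do {
        *(p++) = *(data++);
        if (p == end)
            p = q->start;             /* wrap to the start of the buffer */
    } while (--N != 0);
    q->ptr = p;
}
```

Writing "abcdef" into a 4-byte queue with either version leaves the buffer holding "efcd", with the write position at offset 2.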

REGISTER ALLOCATION Limit the number of internal loop variables to 12 so they can be stored in registers

SUMMARY

Use signed int and unsigned int types for local variables, function arguments and return values.
The most efficient form of loop is a do-while loop that counts down to zero.
Unroll important loops.
Try to limit functions to four arguments.
Avoid divisions; use multiplication by the reciprocal instead.
Use the inline assembler for instructions the C compiler does not generate.
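To illustrate multiplication by the reciprocal: an unsigned division by the constant 10 can be replaced by a 32x32 to 64-bit multiply by a fixed-point approximation of 1/10, 0xCCCCCCCD = ceil(2^35 / 10), followed by a right shift of 35. On ARM the multiply maps to a single UMULL, which is far cheaper than a division routine. The constant and shift shown are specific to divisor 10 and 32-bit unsigned operands; other divisors need their own pair. udiv10 is a hypothetical name.

```c
#include <stdint.h>

/* Divide by 10 without a divide instruction: multiply by the fixed-point
   reciprocal 0xCCCCCCCD (= ceil(2^35 / 10)) and keep bits 35 and up.
   The rounding error stays below 1/10, so the result equals x / 10
   for every 32-bit unsigned x. */
uint32_t udiv10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

For example, udiv10(12345) gives 1234, the same as 12345 / 10.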

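ARM INLINE ASSEMBLY

This example uses the __asm { } inline assembler syntax of the ARM (armcc) compiler, which lets assembly instructions name C variables directly; GCC and Clang use a different inline assembly syntax.

    #include <stdio.h>

    int main(void)
    {
        int n1, n2, m;
        n1 = 5;
        n2 = 3;
        __asm   // inline assembly code
        {
            MUL m, n1, n2
        }
        printf("The result is %d\n", m);
        return 0;
    }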

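USING INLINE ASSEMBLY

Inline assembly is used for ARM instructions that the C compiler does not support, such as coprocessor instructions and instruction set extensions. It creates portability issues, because the code builds only for ARM targets and only with a compiler that accepts that assembly syntax.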
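In line with the deck's advice to keep inline assembly in small inlined functions, the sketch below wraps a single MUL in a static inline function with a plain C fallback, so only one function has to change when porting. It uses GCC/Clang extended asm syntax rather than the armcc __asm { } syntax, and assumes the assembly path is taken only when compiling for the 32-bit ARM instruction set (the __arm__ macro defined and __thumb__ not defined); every other target uses the C fallback. mul32 is a hypothetical name.

```c
#include <stdint.h>

/* Multiply two 32-bit values, keeping the low 32 bits of the product.
   On 32-bit ARM (ARM state, GCC or Clang) this issues one MUL through
   extended inline assembly; elsewhere it falls back to portable C. */
static inline int32_t mul32(int32_t a, int32_t b)
{
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
    int32_t result;
    /* "=&r": early-clobber output, so result never shares a register
       with an input (older cores forbid MUL with Rd equal to Rm). */
    __asm__("mul %0, %1, %2" : "=&r"(result) : "r"(a), "r"(b));
    return result;
#else
    /* Unsigned multiply wraps like MUL and avoids signed-overflow UB. */
    return (int32_t)((uint32_t)a * (uint32_t)b);
#endif
}
```

Calling mul32(5, 3) reproduces the slide's example and returns 15 on either path.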

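ALTERNATIVE: CALLING AN ASSEMBLY FUNCTION FROM C

Because the assembly routine reaches n1, n2 and m through IMPORT, they must be global variables in the C file rather than locals of main, and the result comes back through m rather than through an argument passed by value.

    #include <stdio.h>

    int n1, n2, m;             /* globals, imported by the assembly file */
    extern void multip(void);  /* implemented in assembly */

    int main(void)
    {
        n1 = 5;                /* assigning numbers */
        n2 = 3;
        multip();              /* calling function; result is left in m */
        printf("The result is %d\n", m);
        return 0;
    }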

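Assembly function

        AREA    example, CODE, READONLY
        EXPORT  multip          ; external function name
        IMPORT  n1              ; input
        IMPORT  n2
        IMPORT  m               ; return variable
    multip                      ; function begins (label must match EXPORT)
        LDR     r3, =n1         ; load data from memory to registers
        LDR     r1, [r3]
        LDR     r3, =n2
        LDR     r2, [r3]
        MUL     r0, r1, r2      ; r0 = n1 * n2
        LDR     r3, =m
        STR     r0, [r3]        ; store result to m memory location
        MOV     pc, lr          ; return from call
        END

A function may freely change r0 to r3 and r12 but must preserve r4 to r11 for its caller, so this routine uses only r0 to r3. Passing n1 and n2 in r0 and r1 and returning the product in r0, as the ARM calling convention allows, would avoid the global variables altogether.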

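PORTABILITY ISSUES

Char type: plain char is unsigned on ARM but signed on many other processors.
Alignment: ARM load and store instructions (LDR, STR, LDRH, STRH) expect the address to be a multiple of the size of the type being loaded or stored; on older ARM cores an unaligned access returns unexpected data or faults.
Endianness: ARM is little endian by default and can be configured as big endian.
Inline assembly: keep inline assembly in small inlined functions, so that only those functions need changing when porting.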
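One way to sidestep the char-signedness difference is to state the signedness explicitly whenever a char holds small numbers rather than text. The sketch below (function names hypothetical) counts negative bytes in a buffer: the version taking plain char gives a different answer on ARM, where char is unsigned, than on a typical x86 compiler, where it is signed, while the signed char version behaves the same everywhere.

```c
#include <limits.h>

/* Non-portable: whether p[i] < 0 can ever be true depends on whether
   plain char is signed (typical on x86) or unsigned (the ARM default). */
int count_negative_plain(const char *p, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (p[i] < 0)
            count++;
    return count;
}

/* Portable: signed char is signed on every target. */
int count_negative(const signed char *p, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (p[i] < 0)
            count++;
    return count;
}

/* Reports whether plain char is signed for the current compiler and target. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

For the bytes {-1, 2, -3, 4}, count_negative returns 2 everywhere, while count_negative_plain returns 2 only where plain char is signed.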

EXAMPLE

Write a program that reads 8-element row and column vectors from memory and:
- multiplies both by a scalar also found in memory;
- calculates the scalar product of the two vectors.
Assume no partial product exceeds 32 bits. Use v1 = [ ], v2 = [ ]T, s = 5 as test inputs. Unroll the loop by two and by four, then repeat using inline assembly for the multiplications.
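A plain C starting point for the exercise, assuming int vectors. It scales both vectors by s in place and accumulates the scalar product in a do-while loop that is unrolled by four and counts down to zero, combining several of the guidelines above. scale_and_dot and VLEN are hypothetical names, and the test values v1 = v2 = [1 2 3 4 5 6 7 8] are hypothetical as well. Unrolling by two follows the same pattern with two statement groups per iteration.

```c
#define VLEN 8   /* vector length from the exercise */

/* Scale v1 and v2 in place by s, then return their scalar product.
   Unrolled by four and counting down; assumes VLEN is a multiple of 4
   and that no partial product exceeds 32 bits, as the exercise states. */
int scale_and_dot(int *v1, int *v2, int s)
{
    int sum = 0;
    unsigned int i = VLEN / 4;   /* iterations of the unrolled loop */
    do {
        v1[0] *= s; v2[0] *= s; sum += v1[0] * v2[0];
        v1[1] *= s; v2[1] *= s; sum += v1[1] * v2[1];
        v1[2] *= s; v2[2] *= s; sum += v1[2] * v2[2];
        v1[3] *= s; v2[3] *= s; sum += v1[3] * v2[3];
        v1 += 4;
        v2 += 4;
    } while (--i != 0);
    return sum;
}
```

With both vectors set to 1..8 and s = 5, each term is 25·k², so the result is 25 × 204 = 5100.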