

Intel Xscale® Assembly Language and C

The Intel Xscale® Programmer’s Model (1)
(We will not be using the Thumb instruction set.)
Memory Formats
  – We will be using the Little Endian format: the lowest-numbered byte of a word is considered the word’s least significant byte, and the highest-numbered byte is considered the most significant byte.
Instruction Length
  – All instructions are 32 bits long (ARM instructions).
Data Types
  – 8-bit bytes and 32-bit words.
Processor Modes (of interest)
  – User: the “normal” program execution mode.
  – IRQ: used for general-purpose interrupt handling.
  – Supervisor: a protected mode for the operating system.
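To make the Little Endian rule concrete, here is a minimal C sketch that lays a 32-bit word out byte by byte. The function names store_word_le and load_word_le are made up for this illustration; the code spells out the byte order explicitly, so it behaves the same on any host.

```c
#include <stdint.h>

/* Store a 32-bit word into four consecutive bytes in little-endian order:
   out[0] (the lowest-numbered byte) receives the least significant byte. */
void store_word_le(uint32_t word, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(word >> (8 * i));
}

/* Rebuild the word from four little-endian bytes. */
uint32_t load_word_le(const uint8_t in[4])
{
    uint32_t word = 0;
    for (int i = 0; i < 4; i++)
        word |= (uint32_t)in[i] << (8 * i);
    return word;
}
```

Storing 0x11223344 this way puts 0x44 in byte 0 and 0x11 in byte 3.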

The Intel Xscale® Programmer’s Model (2)
The Intel Xscale® Register Set
  – Registers R0-R15 + CPSR (Current Program Status Register)
  – R13: Stack Pointer
  – R14: Link Register
  – R15: Program Counter, where bits 0:1 are ignored
Program Status Registers
  – CPSR (Current Program Status Register)
      holds info about the most recently performed ALU operation: contains N (negative), Z (zero), C (Carry) and V (oVerflow) bits
      controls the enabling and disabling of interrupts
      sets the processor operating mode
  – SPSR (Saved Program Status Registers)
      used by exception handlers
Exceptions
  – reset, undefined instruction, SWI, IRQ.
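As a rough illustration of what the N, Z, C and V bits record, the C sketch below computes them for a 32-bit addition. The bit positions (31 down to 28) are where the CPSR keeps these flags; the function name flags_after_add and the CPSR_* macros are invented for this example. It models only a flag-setting add, not the interrupt or mode fields.

```c
#include <stdint.h>

#define CPSR_N (1u << 31)  /* result is negative (bit 31 set) */
#define CPSR_Z (1u << 30)  /* result is zero */
#define CPSR_C (1u << 29)  /* unsigned carry out of bit 31 */
#define CPSR_V (1u << 28)  /* signed overflow */

/* Return the N, Z, C, V bits that a flag-setting 32-bit add of a and b
   would produce (a sketch, not a full CPSR model). */
uint32_t flags_after_add(uint32_t a, uint32_t b)
{
    uint32_t result = a + b;
    uint32_t flags = 0;

    if (result & 0x80000000u)
        flags |= CPSR_N;
    if (result == 0)
        flags |= CPSR_Z;
    if (result < a)                      /* wrapped around: carry out */
        flags |= CPSR_C;
    /* Overflow: both operands have the same sign, result has the other. */
    if (~(a ^ b) & (a ^ result) & 0x80000000u)
        flags |= CPSR_V;

    return flags;
}
```

For example, adding 1 to 0x7FFFFFFF sets N and V: the result looks negative although both operands were positive.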

Intro to Intel Xscale® Assembly Language
“Load/store” architecture
32-bit instructions
32-bit and 8-bit data types
32-bit addresses
37 registers (30 general-purpose registers, 6 status registers and a PC)
  – only a subset is accessible at any point in time
Load and store multiple instructions
No instruction to move a 32-bit constant to a register (why?)
Conditional execution
Barrel shifter
  – scaled addressing, multiplication by a small constant, and ‘constant’ generation
Co-processor instructions (we will not use these)
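One way to see why no instruction can carry an arbitrary 32-bit constant: the instruction itself is only 32 bits long, so an ARM data-processing immediate is encoded as an 8-bit value rotated right by an even amount. The C sketch below checks whether a constant fits that form. The helper names rotl32 and is_arm_immediate are made up for this example.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by r bits (0 <= r <= 31). */
static uint32_t rotl32(uint32_t v, unsigned r)
{
    return (v << r) | (v >> ((32u - r) & 31u));
}

/* Return 1 if value can be written as an 8-bit constant rotated right by
   an even number of bits, else 0. Rotating value left by the same amount
   must leave something that fits in 8 bits. */
int is_arm_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rotl32(value, rot) <= 0xFFu)
            return 1;
    }
    return 0;
}
```

Constants such as 0xFF000000 or 0x104 pass the check, while 0x101 or 0x12345678 do not. Constants that fail are typically loaded from memory instead, as the ldr r2, .L2 pattern in the examples later in these slides does for addresses.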

Intel Xscale® Assembly Language Basics
Conditional Execution
The Intel Xscale® Barrel Shifter
Loading Constants into Registers
Loading Addresses into Registers
Jump Tables
Using the Load and Store Multiple Instructions
Check out Chapters 1 through 5 of the ARM Architecture Reference Manual

Generating Assembly Language Code from C
Use the command-line option -S.
  – When you compile a .c file, you get a .s file
  – This .s file contains the assembly language code generated by the compiler
When assembled, this code can potentially be linked and loaded as an executable

Register Names and Use
Register #   APCS Name   APCS Role
R0           a1          argument 1
R1           a2          argument 2
R2           a3          argument 3
R3           a4          argument 4
R4..R8       v1..v5      register variables
R9           sb/v6       static base/register variable
R10          sl/v7       stack limit/register variable
R11          fp          frame pointer
R12          ip          scratch reg/new-sb in inter-link-unit calls
R13          sp          low end of current stack frame
R14          lr          link address/scratch register
R15          pc          program counter

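“Frame Pointer”
    foo: MOV   ip, sp
         STMDB sp!, {a1-a3, fp, ip, lr, pc}
         ...
         LDMDB fp, {fp, sp, pc}
[Figure: stack diagram covering addresses 0x90 down to 0x70, showing the saved pc, lr, ip, fp, a3, a2 and a1 from higher to lower addresses, with the ip, fp and SP positions marked.]
The frame pointer (fp) points to the top of the stack for the function.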

The Frame Pointer
fp points to the top of the stack area for the current function
  – or zero if not being used
Because the frame pointer is stored at the same offset in every function call, the saved values form a singly-linked list of activation records
Creating the stack “backtrace” structure:
    MOV   ip, sp
    STMFD sp!, {a1-a4, v1-v5, sb, sl, fp, ip, lr, pc}
    SUB   fp, ip, #4
[Figure: stack diagram covering addresses 0x90 down to 0x50. From higher to lower addresses the stored words are pc, lr, ip, fp, v7, v6, v5, v4, v3, v2, v1, a4, a3, a2, a1, with “SP before”, “SP after” and “FP after” marked.]
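The linked-list idea can be sketched in C. This is a simplified model, not the real APCS frame layout: the struct frame type, its two fields and the function backtrace_depth are assumptions made for illustration. Each record holds the caller's frame pointer, and the outermost record holds a null link (the "zero if not being used" case).

```c
#include <stddef.h>

/* Simplified activation record: only the link to the caller's record and
   a label are modeled (hypothetical layout). */
struct frame {
    const struct frame *caller_fp;  /* saved fp of the caller, NULL at the outermost frame */
    const char *function_name;      /* label used only for this illustration */
};

/* Walk the chain of saved frame pointers and count the active frames. */
int backtrace_depth(const struct frame *fp)
{
    int depth = 0;
    while (fp != NULL) {
        depth++;
        fp = fp->caller_fp;  /* follow the saved fp to the caller's record */
    }
    return depth;
}
```

A debugger's backtrace follows the same kind of chain, reading the saved fp of each frame to find the one below it.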

How Does STM Place Things into Memory?
    STMDB sp!, {r0-r15}
The XScale processor uses a bit-vector to represent each register to be saved
The architecture places the lowest-numbered register at the lowest address
The stack examples in these slides use STMDB (the same instruction as STMFD); a bare STM with no suffix assembles as STMIA
[Figure: stack diagram covering addresses 0x90 down to 0x50. From higher to lower addresses the stored words are pc, lr, sp, ip, fp, v7, v6, v5, v4, v3, v2, v1, a4, a3, a2, a1, with “SP before” and “SP after” marked.]
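The placement rule can be sketched as a small C simulation. The function name stmdb, the word-array memory model and the 16-bit register mask are assumptions for illustration. The sketch stores plain register values and ignores the special cases the hardware has for storing r13 and r15.

```c
#include <stdint.h>

/* Simulate "STMDB base!, {reglist}". Bit r of reglist selects register r.
   mem is a word array indexed by byte address / 4. Returns the updated
   base (the new sp when base is the stack pointer). */
uint32_t stmdb(uint32_t base, uint16_t reglist,
               const uint32_t regs[16], uint32_t *mem)
{
    unsigned count = 0;
    for (unsigned r = 0; r < 16; r++)
        if (reglist & (1u << r))
            count++;

    /* "Decrement before": the block starts 4 * count bytes below base. */
    uint32_t new_base = base - 4u * count;
    uint32_t addr = new_base;

    /* The lowest-numbered register goes to the lowest address. */
    for (unsigned r = 0; r < 16; r++) {
        if (reglist & (1u << r)) {
            mem[addr / 4u] = regs[r];
            addr += 4u;
        }
    }
    return new_base;
}
```

With base 0x90 and all sixteen registers selected, r0 lands at 0x50 and r15 at 0x8c, and the returned base is 0x50.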

Example 1: A Simple Program
C source:
    int a, b;
    int main()
    {
        a = 3;
        b = 4;
    } /* end main() */
Generated assembly:
            .text                       /* section declaration */
            .align  2
            .global main                /* export entry point */
            .type   main, %function
    main:
            mov     ip, sp
            stmfd   sp!, {fp, ip, lr, pc}
            sub     fp, ip, #4
            ldr     r2, .L2
            mov     r3, #3
            str     r3, [r2, #0]        /* a=3 */
            ldr     r2, .L2+4
            mov     r3, #4
            str     r3, [r2, #0]        /* b=4 */
            mov     r0, r3
            ldmfd   sp, {fp, sp, pc}    /* return */
    .L3:
            .align  2
    .L2:
            .word   a
            .word   b
STMFD - store multiple, full descending
    sp <-- sp - 4    mem[sp] = pc ; program counter
    sp <-- sp - 4    mem[sp] = lr ; link register
    sp <-- sp - 4    mem[sp] = ip ; new stack base
    sp <-- sp - 4    mem[sp] = fp ; frame pointer
LDMFD - load multiple, full descending
    fp = mem[sp] (fp) ; frame pointer
    sp <-- sp + 4
    sp = mem[sp] (ip) ; stack pointer
    sp <-- sp + 4
    pc = mem[sp] (lr) ; program counter

Example 2: Calling A Function
C source:
    int tmp, a, b;
    void swap(int a, int b);
    int main()
    {
        a = 3;
        b = 4;
        swap(a, b);
    } /* end main() */
    void swap(int a, int b)
    {
        tmp = a;
        a = b;
        b = tmp;
    } /* end swap() */
Generated assembly for main:
            .global main                /* export entry point */
            .type   main, %function
    main:
            mov     ip, sp
            stmfd   sp!, {fp, ip, lr, pc}
            sub     fp, ip, #4
            ldr     r2, .L2
            mov     r3, #3
            str     r3, [r2, #0]        /* a=3 */
            ldr     r2, .L2+4
            mov     r3, #4
            str     r3, [r2, #0]        /* b=4 */
            ldr     r3, .L2
            ldr     r2, .L2+4
            ldr     r0, [r3, #0]        /* a */
            ldr     r1, [r2, #0]        /* b */
            bl      swap                /* function call */
            mov     r0, r3
            ldmfd   sp, {fp, sp, pc}    /* return */
    .L3:
            .align  2
    .L2:
            .word   a
            .word   b

Example 2: Calling A Function (Cont’d)
C source:
    void swap(int a, int b)
    {
        tmp = a;
        a = b;
        b = tmp;
    } /* end swap() */
Generated assembly for swap:
            .global swap
            .type   swap, %function
    swap:
            mov     ip, sp
            stmfd   sp!, {fp, ip, lr, pc}
            sub     fp, ip, #4
            sub     sp, sp, #8
            str     r0, [fp, #-16]      /* a */
            str     r1, [fp, #-20]      /* b */
            ldr     r2, .L5             /* r2 = &tmp */
            ldr     r3, [fp, #-16]      /* r3 = a */
            str     r3, [r2, #0]        /* tmp = a */
            ldr     r3, [fp, #-20]      /* r3 = b */
            str     r3, [fp, #-16]      /* a = b */
            ldr     r3, .L5
            ldr     r3, [r3, #0]        /* r3 = tmp */
            str     r3, [fp, #-20]      /* b = tmp */
            sub     sp, fp, #12
            ldmfd   sp, {fp, sp, pc}    /* return */
    .L6:
            .align  2
    .L5:
            .word   tmp

Example 3: Manipulating Pointers
C source:
    int tmp;
    int a, b;
    void swap(int *a, int *b);
    int main()
    {
        a = 3;
        b = 4;
        swap(&a, &b);
    } /* end main() */
    void swap(int *a, int *b)
    {
        tmp = *a;
        *a = *b;
        *b = tmp;
    } /* end swap() */
Generated assembly for main:
            .global main                /* export entry point */
            .type   main, %function
    main:
            mov     ip, sp
            stmfd   sp!, {fp, ip, lr, pc}
            sub     fp, ip, #4
            ldr     r2, .L2
            mov     r3, #3
            str     r3, [r2, #0]        /* a=3 */
            ldr     r2, .L2+4
            mov     r3, #4
            str     r3, [r2, #0]        /* b=4 */
            ldr     r0, .L2             /* r0 = &a (first argument) */
            ldr     r1, .L2+4           /* r1 = &b (second argument) */
            bl      swap                /* function call */
            mov     r0, r3
            ldmfd   sp, {fp, sp, pc}    /* return */
    .L3:
            .align  2
    .L2:
            .word   a
            .word   b

Example 3 (cont’d)
C source:
    void swap(int *a, int *b)
    {
        tmp = *a;
        *a = *b;
        *b = tmp;
    } /* end swap() */
Generated assembly for swap:
            .global swap
            .type   swap, %function
    swap:
            mov     ip, sp
            stmfd   sp!, {fp, ip, lr, pc}
            sub     fp, ip, #4
            sub     sp, sp, #8
            str     r0, [fp, #-16]      /* &a */
            str     r1, [fp, #-20]      /* &b */
            ldr     r2, .L5             /* r2 = &tmp */
            ldr     r3, [fp, #-16]      /* r3 = &a */
            ldr     r3, [r3, #0]        /* r3 = a */
            str     r3, [r2, #0]        /* tmp = a */
            ldr     r2, [fp, #-16]      /* r2 = &a */
            ldr     r3, [fp, #-20]      /* r3 = &b */
            ldr     r3, [r3, #0]        /* r3 = b */
            str     r3, [r2, #0]        /* a = b */
            ldr     r2, [fp, #-20]      /* r2 = &b */
            ldr     r3, .L5
            ldr     r3, [r3, #0]        /* r3 = tmp */
            str     r3, [r2, #0]        /* b = tmp */
            sub     sp, fp, #12
            ldmfd   sp, {fp, sp, pc}    /* return */
    .L6:
            .align  2
    .L5:
            .word   tmp
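The difference between Examples 2 and 3 is easy to check in C. In the sketch below the two versions are renamed swap_by_value and swap_by_pointer so that they can coexist; these names are introduced here and are not in the slides. The by-value version only changes its own stack copies (the [fp, #-16] and [fp, #-20] slots in the assembly), while the pointer version writes through the addresses it was given.

```c
int tmp;

/* Example 2 style: a and b are local copies, so the caller's variables
   are left untouched. */
void swap_by_value(int a, int b)
{
    tmp = a;
    a = b;
    b = tmp;
    (void)a;  /* the swapped copies are discarded on return */
    (void)b;
}

/* Example 3 style: a and b are addresses, so the stores reach the
   caller's variables. */
void swap_by_pointer(int *a, int *b)
{
    tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Calling swap_by_value(x, y) with x = 3 and y = 4 leaves both unchanged, whereas swap_by_pointer(&x, &y) leaves x = 4 and y = 3.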

Example 4: Dealing with Lots of Arguments
C source:
    int tmp;
    void test(int a, int b, int c, int d, int *e);
    int main()
    {
        int a, b, c, d, e;
        a = 3;
        b = 4;
        c = 5;
        d = 6;
        e = 7;
        test(a, b, c, d, &e);
    } /* end main() */
    void test(int a, int b, int c, int d, int *e)
    {
        tmp = a;
        a = b;
        b = tmp;
        c = b;
        b = d;
        *e = d;
    } /* end test() */
Generated assembly for main:
    main:
            mov     ip, sp
            stmfd   sp!, {fp, ip, lr, pc}
            sub     fp, ip, #4
            sub     sp, sp, #24
            mov     r3, #3
            str     r3, [fp, #-16]
            mov     r3, #4
            str     r3, [fp, #-20]
            mov     r3, #5
            str     r3, [fp, #-24]
            mov     r3, #6
            str     r3, [fp, #-28]
            mov     r3, #7
            str     r3, [fp, #-32]
            sub     r3, fp, #32
            str     r3, [sp, #0]        /* &e */
            ldr     r0, [fp, #-16]      /* a */
            ldr     r1, [fp, #-20]      /* b */
            ldr     r2, [fp, #-24]      /* c */
            ldr     r3, [fp, #-28]      /* d */
            bl      test
            mov     r0, r3
            sub     sp, fp, #12
            ldmfd   sp, {fp, sp, pc}

Example 4 (cont’d)
Generated assembly for test:
    test:
            mov     ip, sp
            stmfd   sp!, {fp, ip, lr, pc}
            sub     fp, ip, #4
            sub     sp, sp, #16
            str     r0, [fp, #-16]
            str     r1, [fp, #-20]
            str     r2, [fp, #-24]
            str     r3, [fp, #-28]
            ldr     r2, .L3             /* tmp */
            ldr     r3, [fp, #-16]
            str     r3, [r2, #0]        /* tmp = a */
            ldr     r3, [fp, #-20]
            str     r3, [fp, #-16]      /* a = b */
            ldr     r3, .L3
            ldr     r3, [r3, #0]
            str     r3, [fp, #-20]      /* b = tmp */
            ldr     r3, [fp, #-20]
            str     r3, [fp, #-24]      /* c = b */
            ldr     r3, [fp, #-28]
            str     r3, [fp, #-20]      /* b = d */
            ldr     r2, [fp, #4]
            ldr     r3, [fp, #-28]
            str     r3, [r2, #0]        /* *e = d */
            sub     sp, fp, #12
            ldmfd   sp, {fp, sp, pc}
[Figure: stack during test(). The fifth argument e, stored by main, sits just above the saved pc, lr, ip and fp of test, which are followed by the local copies of a, b, c and d; the fp, ip and sp positions are marked.]
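The reason test finds its fifth argument at [fp, #4] can be traced with a small C simulation of the stack. Everything here (the stack_mem array, the helper names and the function fifth_arg_seen_by_callee) is made up for illustration. It replays the caller's store to [sp, #0] and the callee's prologue, then reads [fp, #4]. The sketch assumes sp_at_call is a multiple of 4 between 16 and 252.

```c
#include <stdint.h>

#define STACK_BYTES 256u
static uint32_t stack_mem[STACK_BYTES / 4u];  /* simulated stack, one entry per word */

static void store_word(uint32_t addr, uint32_t value) { stack_mem[addr / 4u] = value; }
static uint32_t load_word(uint32_t addr) { return stack_mem[addr / 4u]; }

/* Replay the call sequence of Example 4 and return what the callee reads
   from [fp, #4]. Register contents other than sp are placeholders. */
uint32_t fifth_arg_seen_by_callee(uint32_t sp_at_call, uint32_t fifth_arg)
{
    uint32_t sp = sp_at_call;
    uint32_t fp = 0, lr = 0, pc = 0;
    uint32_t ip;

    store_word(sp + 0u, fifth_arg);   /* caller: str r3, [sp, #0] */

    ip = sp;                          /* callee: mov ip, sp */
    sp -= 4u; store_word(sp, pc);     /* stmfd sp!, {fp, ip, lr, pc} */
    sp -= 4u; store_word(sp, lr);
    sp -= 4u; store_word(sp, ip);
    sp -= 4u; store_word(sp, fp);
    fp = ip - 4u;                     /* sub fp, ip, #4: fp points at the saved pc */

    return load_word(fp + 4u);        /* ldr r2, [fp, #4] */
}
```

Since fp ends up 4 bytes below the caller's sp, fp + 4 is exactly the slot the caller wrote, and fp - 16 is the first free word below the four saved registers, which is where the callee spills r0.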

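Mixing C and Assembly Language
[Figure: build flow. C source code goes through the compiler to produce XScale assembly code; the assembler and the linker, together with the C library, turn it into an XScale executable.]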

Interfacing C and Assembly Language
ARM (the company) has developed a standard called the “ARM Procedure Call Standard” (APCS), which defines:
  – constraints on the use of registers
  – stack conventions
  – format of a stack backtrace data structure
  – argument passing and result return
  – support for the ARM shared library mechanism
Compiler-generated code conforms to the APCS
  – It's just a standard, not an architectural requirement
  – Cannot avoid the standard when interfacing C and assembly code
  – Can avoid the standard when just writing assembly code, or when writing assembly code that isn't called by C code

Multiply
The multiply instruction can take multiple cycles
  – Can convert Y * Constant into a series of adds and shifts
  – Y * 9 = Y * 8 + Y * 1
  – Assume R1 holds Y and R2 will hold the result
    ADD R2, R1, R1, LSL #3 ; multiplication by 9: (Y * 8) + (Y * 1)
    RSB R2, R1, R1, LSL #3 ; multiplication by 7: (Y * 8) - (Y * 1)
    (RSB: reverse subtract - operands to subtraction are reversed)
Another example: Y * 105
  – 105 = 128 - 23 = 128 - (16 + 7) = 128 - (16 + (8 - 1))
    RSB r2, r1, r1, LSL #3 ; r2 <-- Y*7 = Y*8 - Y*1 (assume r1 holds Y)
    ADD r2, r2, r1, LSL #4 ; r2 <-- r2 + Y*16 (r2 held Y*7; now holds Y*23)
    RSB r2, r2, r1, LSL #7 ; r2 <-- (Y*128) - r2 (r2 now holds Y*105)
Or Y * 105 = Y * (15 * 7) = Y * (16 - 1) * (8 - 1)
    RSB r2, r1, r1, LSL #4 ; r2 <-- (r1 * 16) - r1
    RSB r3, r2, r2, LSL #3 ; r3 <-- (r2 * 8) - r2
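The shift-and-add decompositions can be checked in C. In this sketch each statement mirrors one ADD or RSB with a shifted operand. The function names are invented for the example, and unsigned 32-bit arithmetic is used so that wraparound matches a 32-bit register.

```c
#include <stdint.h>

/* ADD r2, r1, r1, LSL #3 : y + y*8 */
uint32_t times9(uint32_t y)
{
    return y + (y << 3);
}

/* RSB r2, r1, r1, LSL #3 : y*8 - y */
uint32_t times7(uint32_t y)
{
    return (y << 3) - y;
}

/* 105 = 128 - (16 + (8 - 1)), three instructions. */
uint32_t times105_three_steps(uint32_t y)
{
    uint32_t r2 = (y << 3) - y;   /* RSB: r2 = y*7 */
    r2 = r2 + (y << 4);           /* ADD: r2 = y*23 */
    r2 = (y << 7) - r2;           /* RSB: r2 = y*128 - y*23 = y*105 */
    return r2;
}

/* 105 = (16 - 1) * (8 - 1), two instructions. */
uint32_t times105_two_steps(uint32_t y)
{
    uint32_t r2 = (y << 4) - y;   /* RSB: r2 = y*15 */
    return (r2 << 3) - r2;        /* RSB: r3 = r2*7 = y*105 */
}
```

Both 105 variants agree with y * 105u for every 32-bit y, because all the arithmetic is modulo 2^32.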