TMS320C6000 DSP Optimization Workshop Chapter 10 Advanced Memory Management Copyright © 2005 Texas Instruments. All rights reserved. Technical Training.

Outline: Using Memory Efficiently
 Keep it on-chip
 Use multiple sections
 Use local variables (stack)
 Using dynamic memory (heap, BUF)
 Overlay memory (load vs. run)
 Use cache
 Summary

Keep it On-Chip (Using Memory Efficiently)
[Diagram: CPU with program and data caches; internal SRAM holding .text and .bss; EMIF]
1. If possible, put all code and data on-chip
   Best performance
   Easiest to implement
What if it doesn't all fit?
Technical Training Organization T TO

How to Use Internal Memory Efficiently
 Keep it on-chip
 Use multiple sections
 Use local variables (stack)
 Using dynamic memory (heap, BUF)
 Overlay memory (load vs. run)
 Use cache

Use Multiple Sections (Using Memory Efficiently)
[Diagram: CPU, caches, internal SRAM, EMIF, external memory; sections .text, .bss, .far, critical, and myVar placed across internal and external memory]
2. Use multiple sections
   Keep .bss (global vars) and critical code on-chip
   Put non-critical code and data off-chip

Making Custom Code Sections

#pragma CODE_SECTION(dotp, "critical")
int dotp(a, x)

 Create a custom code section using #pragma CODE_SECTION(dotp, ".text:_dotp")
 Use the compiler's -mo option
   -mo creates a subsection for each function
   Subsections are specified with ":"
To make a data section...

Making Custom Data Sections

 Make a custom named data section:

#pragma DATA_SECTION(x, "myVar")
#pragma DATA_SECTION(y, "myVar")
int x[32];
short y;

Special Data Section: ".far"
 .far is a pre-defined section name
 Three-cycle read (pointer must be set before read)
 Add a variable to .far using:
   the DATA_SECTION pragma:  #pragma DATA_SECTION(m, ".far")  short m;
   the far compiler option:  -ml
   the far keyword:          far short m;
How do we link our own sections?

Linking Custom Sections

[Diagram: app.cdb → "Build" → appcfg.cmd; appcfg.cmd and myLink.cmd → Linker → myApp.out]

myLink.cmd:
SECTIONS {
    myVar:       > SDRAM
    critical:    > IRAM
    .text:_dotp: > IRAM
}

How do I know which CMD file is executed first?

Specifying Link Order
What if I forget to specify a section in SECTIONS?

Check for Unspecified Sections
In summary...

Use Multiple Sections (Using Memory Efficiently)
[Diagram: sections .text, .bss, .far, critical, and myVar placed across internal SRAM and external memory]
2. Use multiple sections
   Keep .bss (global vars) and critical code on-chip
   Put non-critical code and data off-chip
   Create new sections with #pragma CODE_SECTION and #pragma DATA_SECTION
   You must make your own linker command file

Using Memory Efficiently
 Keep it on-chip
 Use multiple sections
 Use local variables (stack)
 Using dynamic memory (heap, BUF)
 Overlay memory (load vs. run)
 Use cache

Dynamic Memory (Using Memory Efficiently)
[Diagram: stack located in internal SRAM]
3. Local variables
   If the stack is located on-chip, all functions can "share" it
What is a stack?

C Register Usage
[Diagram: arg1/return value, arg3, arg5, arg7, arg9 passed in A-side registers; return address, arg2, arg4, arg6, arg8, arg10, DP, and SP on the B side; extra arguments placed on the stack above the prior stack contents]
Another use of dynamic memory...

What is the Stack?
[Diagram: memory map from 0 to 0xFFFFFFFF with the top of stack marked]
A block of memory where the compiler stores:
   Local variables
   Intermediate results
   Function arguments
   Return addresses
Details of the C6000 stack...

Stack and Stack Pointer
[Diagram: SP (B15) points to the top of stack; the stack grows from higher toward lower addresses]
Details:
1. SP points to the first empty location
2. SP is double-word aligned before each function
3. Created by the compiler's init routine (boot.c)
4. Length defined by the -stack linker option
5. Stack length is not validated at runtime

Using the Stack in Asm
How would you PUSH "A1" to the stack?
    STW A1, *SP--[1]
How about POPing A1?
    LDW *++SP[1], A1
Will these PUSH/POPs keep the stack double-aligned?

Using the Stack in Assembly
 Only move SP to 8-byte boundaries
 Move SP (to create a local frame), then use offset addressing to fill in PUSHed values
 May leave a small "hole", but alignment is critical

Example (x32, little endian): PUSH nine registers, A0 through A8

SP  .equ  B15
    STW A0, *SP--[10]
    STW A1, *+SP[9]
    STW A2, *+SP[8]
    STW A3, *+SP[7]
    STW A4, *+SP[6]
    STW A5, *+SP[5]
    STW A6, *+SP[4]
    STW A7, *+SP[3]
    STW A8, *+SP[2]

[Diagram: stack after the pushes — A0 at the original SP, A1 through A8 filling successive 8-byte slots, new SP at an 8-byte boundary below A8]

Dynamic Memory (Using Memory Efficiently)
[Diagram: stack and heap located in internal SRAM]
3. Local variables
   If the stack is located on-chip, all functions can "share" it
4. Use the heap
   Common memory reuse within the C language
   A heap (i.e., system memory) lets you allocate, then free, chunks of memory from a common system block
For example...

Dynamic Example (Heap)

"Normal" (static) C coding:
#define SIZE 32
int x[SIZE];  /* allocate   */
int a[SIZE];
x = {…};      /* initialize */
a = {…};
filter(…);    /* execute    */

"Dynamic" C coding (create, execute, delete):
#define SIZE 32
x = malloc(SIZE);
a = malloc(SIZE);
x = {…};
a = {…};
filter(…);
free(a);
free(x);

 High-performance DSP users have traditionally used static embedded systems
 As DSPs and compilers have improved, the benefits of dynamic systems often allow enhanced flexibility (more threads) at lower costs

Dynamic Memory (Using Memory Efficiently)
[Diagram: stack and heap located in internal SRAM]
3. Local variables
   If the stack is located on-chip, all functions can "share" it
4. Use the heap
   Common memory reuse within the C language
   A heap (i.e., system memory) can be allocated, then free'd
What if I need two heaps?
   Say, a big image array off-chip, and
   a fast scratch memory heap on-chip?

Multiple Heaps - Summary
[Diagram: stack, heap, and Heap2 across internal SRAM and external memory]
 DSP/BIOS enables multiple heaps to be created
 Create and name heaps in the configuration tool (GUI)
 Use the MEM_alloc() function to allocate memory and specify which heap

Multiple Heaps with DSP/BIOS
 DSP/BIOS enables multiple heaps to be created
 Check the box & set the size when creating a MEM object
 By default, the heap has the same name as the MEM object; you can change it here
How can you allocate from multiple heaps?

MEM_alloc()

Standard C syntax:
#define SIZE 32
x = malloc(SIZE);
a = malloc(SIZE);
x = {…};
a = {…};
filter(…);
free(a);
free(x);

Using MEM functions (you can pick a specific heap):
#define SIZE 32
x = MEM_alloc(IRAM, SIZE, ALIGN);
a = MEM_alloc(SDRAM, SIZE, ALIGN);
x = {…};
a = {…};
filter(…);
MEM_free(SDRAM, a, SIZE);
MEM_free(IRAM, x, SIZE);

BUF Concepts
 Buffer pools contain a specified number of equal-size buffers
 Any number of pools can be created
 Buffers are allocated from a pool and freed back when no longer needed
 Buffers can be shared between applications
 Buffer pool APIs are faster and smaller than malloc-type operations
 In addition, BUF_alloc and BUF_free are deterministic (unlike malloc)
 BUF APIs have no reentrancy or fragmentation issues
[Diagram: a POOL of BUFs created with BUF_create/BUF_delete; an SWI calls BUF_alloc and a TSK calls BUF_free]

GCONF Creation of Buffer Pool

Creating a BUF:
1. Right-click on the BUF manager
2. Select "insert BUF"
3. Right-click on the new BUF
4. Select "rename"
5. Type the BUF name
6. Right-click on the new BUF
7. Select "properties"
8. Indicate the desired:
   Memory segment
   Number of buffers
   Size of buffers
   Alignment of buffers

Gray boxes indicate effective pool and buffer sizes

Using Memory Efficiently
 Keep it on-chip
 Use multiple sections
 Use local variables (stack)
 Using dynamic memory (heap, BUF)
 Overlay memory (load vs. run)
 Use cache

Use Memory Overlays (Using Memory Efficiently)
[Diagram: algo1 and algo2 sharing the same internal SRAM locations; external memory]
5. Use memory overlays
   Reuse the same memory locations for multiple algorithms (and/or data)
   You must copy the sections yourself
First, we need to make custom sections

Create Sections to Overlay

myCode.c:
#pragma CODE_SECTION(fir, ".FIR")
int fir(short *a, …)

#pragma CODE_SECTION(iir, "myIIR")
int iir(short *a, …)

 How can we get them to run from the same location?
 Where will they be originally loaded into memory?
 The key is in the linker command file...

Load vs. Run Addresses

SECTIONS {
    .FIR:  > IRAM  /* load & run */
    myIIR: load = IRAM, run = IRAM
}

[Diagram: .FIR and myIIR both placed in internal SRAM]

 Simply directing a section into a MEM object indicates it is both loaded and run from the same location:
    .FIR: > IRAM
 Alternatively, you could use:
    .FIR: load = IRAM, run = IRAM
 In your own linker cmd file:
    load: where the fxn resides at reset
    run:  tells the linker its runtime location

What if we wanted them to be loaded off-chip but run from on-chip memory?

Load vs. Run Addresses

SECTIONS {
    .FIR:  load = SDRAM, run = IRAM
    myIIR: load = SDRAM, run = IRAM
}

[Diagram: load addresses in external memory; run addresses in internal SRAM]

 Simply specify different addresses for load and run
 You must make sure they get copied (using memcpy or the DMA)
 load: where the fxn resides at reset
 run:  tells the linker its runtime location

Back to our original problem: what if we want them to run from the same address?

Setting Load Addresses in GCONF

Combining Run Addresses with UNION

 Above, we only force different load/run addresses:
SECTIONS {
    .FIR:  load = SDRAM, run = IRAM
    myIIR: load = SDRAM, run = IRAM
}

 Below, we also force them to share (union) run locations:
SECTIONS {
    UNION run = IRAM {
        .FIR:  load = EPROM
        myIIR: load = EPROM
    }
}

How can we make the overlay procedure easier?

Overlay Memory

myCode.c:
#pragma CODE_SECTION(fir, ".FIR")
int fir(short *a, …)

#pragma CODE_SECTION(iir, "myIIR")
int iir(short *a, …)

myLnk.cmd:
SECTIONS {
    .bss: > IRAM  /* load & run */
    UNION run = IRAM {
        .FIR:  load = EPROM
        myIIR: load = EPROM
    }
}

 First, create a section for each function
 In your own linker cmd file:
    load: where the fxn resides at reset
    run:  tells the linker its runtime location
 UNION forces both functions to be runtime linked to the same memory addresses (i.e., overlayed)
 You must move it with the CPU or DMA

Using Copy Tables

SECTIONS {
    UNION run = IRAM {
        .FIR:  load = EPROM, table(_fir_copy_table)
        myIIR: load = EPROM, table(_iir_copy_table)
    }
}

typedef struct copy_record {
    unsigned int load_addr;
    unsigned int run_addr;
    unsigned int size;
} COPY_RECORD;

typedef struct copy_table {
    unsigned short rec_size;
    unsigned short num_recs;
    COPY_RECORD recs[2];
} COPY_TABLE;

[Diagram: _fir_copy_table and _iir_copy_table, each with rec_size, num_recs = 1, and one copy record holding the load address, run address, and size]

How do we use a copy table?

Using Copy Tables

SECTIONS {
    UNION run = IRAM {
        .FIR:  load = EPROM, table(_fir_copy_table)
        myIIR: load = EPROM, table(_iir_copy_table)
    }
}

#include
extern far COPY_TABLE fir_copy_table;
extern far COPY_TABLE iir_copy_table;
extern void fir(void);
extern void iir(void);

main()
{
    copy_in(&fir_copy_table);
    fir();
    ...
    copy_in(&iir_copy_table);
    iir();
    ...
}

 copy_in() provides a simple wrapper around mem_copy()
 Better yet, use the DMA hardware to copy the sections; specifically, the DAT_copy() function

What could be even easier than using copy tables?

Copy Table Header File

/**************************************************************************/
/* cpy_tbl.h                                                              */
/* Specification of copy table data structures which can be automatically */
/* generated by the linker (using the table() operator in the LCF).       */
/**************************************************************************/
/* Copy Record Data Structure                                             */
/**************************************************************************/
typedef struct copy_record {
    unsigned int load_addr;
    unsigned int run_addr;
    unsigned int size;
} COPY_RECORD;

/**************************************************************************/
/* Copy Table Data Structure                                              */
/**************************************************************************/
typedef struct copy_table {
    unsigned short rec_size;
    unsigned short num_recs;
    COPY_RECORD recs[1];
} COPY_TABLE;

/**************************************************************************/
/* Prototype for general purpose copy routine.                            */
/**************************************************************************/
extern void copy_in(COPY_TABLE *tp);

Copy Table (Group & Union)

SECTIONS {
    ...
    UNION {
        GROUP {
            .task1: { task1.obj(.text) }
            .task2: { task2.obj(.text) }
        } load = ROM, table(_task12_copy_table)
        GROUP {
            .task3: { task3.obj(.text) }
            .task4: { task4.obj(.text) }
        } load = ROM, table(_task34_copy_table)
    } run = RAM
    ...
}

[Diagram: tasks 1-4 at separate load addresses in ROM; the 1/2 and 3/4 groups share one run area in RAM]

Use Memory Overlays (Using Memory Efficiently)
[Diagram: algo1 and algo2 sharing the same internal SRAM locations]
5. Use memory overlays
   Reuse the same memory locations for multiple algorithms (and/or data)
   Either the CPU or DMA must copy the info from its original (load) location to the run location
Is there an easier way to overlay code & data memory?

Use Cache (Using Memory Efficiently)
[Diagram: internal memory configured as program and data cache; .text and .bss in external memory]
6. Use cache
   Works for code and data
   Keeps a local (temporary) scratch copy of info on-chip
   Commonly used, since once enabled it's automatic
   Discussed further in Chapter 14

Summary: Using Memory Efficiently
 You may want to work through your memory allocations in the following order:
   Keep it all on-chip
   Use cache (more in Ch 15)
   Use local variables (stack on-chip)
   Use dynamic memory (heap, BUF)
   Make your own sections (pragmas)
   Overlay memory (load vs. run)
 While this tradeoff is highly application dependent, this is a good place to start
