1 TMS320C6000 DSP Optimization Workshop Chapter 10 Advanced Memory Management Copyright © 2005 Texas Instruments. All rights reserved. Technical Training Organization

2 Outline
Using Memory Efficiently:
- Keep it on-chip
- Use multiple sections
- Use local variables (stack)
- Use dynamic memory (heap, BUF)
- Overlay memory (load vs. run)
- Use cache
- Summary

3 Keep It On-Chip
[Diagram: CPU with program cache and data cache; internal SRAM holding .text and .bss; EMIF to external memory]
Using Memory Efficiently
1. If possible, put all code and data on-chip:
- Best performance
- Easiest to implement
What if it doesn't all fit?

4 How to Use Internal Memory Efficiently
1. Keep it on-chip
2. Use multiple sections
3. Use local variables (stack)
4. Use dynamic memory (heap, BUF)
5. Overlay memory (load vs. run)
6. Use cache

5 Use Multiple Sections
[Diagram: internal SRAM holds .text, .bss, and the "critical" section; external memory holds .far and myVar]
Using Memory Efficiently
2. Use multiple sections:
- Keep .bss (global vars) and critical code on-chip
- Put non-critical code and data off-chip

6 Making Custom Code Sections
  #pragma CODE_SECTION(dotp, "critical");
  int dotp(a, x)
- Create a custom code section using #pragma CODE_SECTION(dotp, ".text:_dotp");
- Or use the compiler's -mo option
- -mo creates a subsection for each function
- Subsections are specified with ":"
To make a data section...

7 Making Custom Data Sections
  #pragma DATA_SECTION (x, "myVar");
  #pragma DATA_SECTION (y, "myVar");
  int x[32];
  short y;
- Make a custom named data section
A special data section...
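Putting slides 6 and 7 together, here is a compilable sketch. The section names and the dotp() body are illustrative; only the TI compiler honors these pragmas, while other compilers simply ignore them as unknown pragmas, so the file still builds for testing:

```c
#include <assert.h>

/* Custom code + data sections, as sketched on the slides. The
 * CODE_SECTION / DATA_SECTION pragmas are TI-codegen directives;
 * host compilers ignore them. The dotp() body is illustrative. */

#pragma DATA_SECTION(x, "myVar")        /* place x in section "myVar" */
int x[32];

#pragma CODE_SECTION(dotp, "critical")  /* place dotp in "critical" */
int dotp(const int *a, const int *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] * v[i];             /* multiply-accumulate */
    return sum;
}
```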

8 Special Data Section: ".far"
  #pragma DATA_SECTION(m, ".far")
  short m;
- .far is a predefined section name
- Three-cycle read (pointer must be set before the read)
- Add a variable to .far using:
  1. The DATA_SECTION pragma
  2. The far compiler option: -ml
  3. The far keyword: far short m;
How do we link our own sections?

9 Linking Custom Sections
  myLink.cmd:
  SECTIONS
  {
    myVar:       > SDRAM
    critical:    > IRAM
    .text:_dotp: > IRAM
  }
[Diagram: at "Build", app.cdb generates appcfg.cmd; the linker combines appcfg.cmd and myLink.cmd to produce myApp.out]
How do I know which CMD file is executed first?

10 Specifying Link Order
What if I forget to specify a section in SECTIONS?

11 Check for Unspecified Sections
In summary...

12 Use Multiple Sections
[Diagram: internal SRAM holds .text, .bss, and the "critical" section; external memory holds .far and myVar]
2. Use multiple sections:
- Keep .bss (global vars) and critical code on-chip
- Put non-critical code and data off-chip
- Create new sections with #pragma CODE_SECTION and #pragma DATA_SECTION
- You must make your own linker command file

13 Using Memory Efficiently
1. Keep it on-chip
2. Use multiple sections
3. Use local variables (stack)
4. Use dynamic memory (heap, BUF)
5. Overlay memory (load vs. run)
6. Use cache

14 Dynamic Memory
[Diagram: stack located in internal SRAM; external memory reached via the EMIF]
3. Local variables
- If the stack is located on-chip, all functions can "share" it
What is a stack?

15 C Register Usage
  Reg   A side           B side
   3                     ret addr
   4    arg1 / ret val   arg2
   6    arg3             arg4
   8    arg5             arg6
  10    arg7             arg8
  12    arg9             arg10
  14                     DP
  15                     SP
Extra arguments are passed on the stack, above the prior stack contents.
Another use of dynamic memory...

16 What Is the Stack?
[Diagram: memory from address 0 to 0xFFFFFFFF, with the top of stack marked]
A block of memory where the compiler stores:
- Local variables
- Intermediate results
- Function arguments
- Return addresses
Details of the C6000 stack...

17 Stack and Stack Pointer
[Diagram: the stack grows from higher toward lower addresses; SP (B15) points into the stack]
Details:
1. SP points to the first empty location
2. SP is double-word aligned before each function
3. Created by the compiler's init routine (boot.c)
4. Length is defined by the -stack linker option
5. Stack length is not validated at runtime

18-22 Using the Stack in Asm
[Diagram: the stack grows toward lower addresses; SP = B15]
How would you PUSH "A1" to the stack?
  STW A1, *SP--[1]
How about POPping A1?
  LDW *++SP[1], A1
Will these PUSH/POPs keep the stack double-word aligned?

23 Using the Stack in Assembly
- Only move SP to 8-byte boundaries
- Move SP (to create a local frame), then use offset addressing to fill in PUSHed values
- May leave a small "hole", but alignment is critical
Example (32-bit, little-endian): PUSH nine registers, A0 through A8
  SP  .equ  B15
        STW A0, *SP--[10]   ; store A0 at SP, then move SP down 10 words
        STW A1, *+SP[9]
        STW A2, *+SP[8]
        STW A3, *+SP[7]
        STW A4, *+SP[6]
        STW A5, *+SP[5]
        STW A6, *+SP[4]
        STW A7, *+SP[3]
        STW A8, *+SP[2]
[Diagram: stack layout after the pushes. A0 sits at the original SP; A1 through A8 fill descending word offsets, paired on 8-byte boundaries; the new SP lands on an 8-byte boundary, leaving one unused word as the "hole"]
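The post-decrement PUSH and pre-increment POP above can be modeled in plain C. This is only a host-side sketch of the addressing modes, not C6000 code; as on the slides, sp indexes the first empty word and the stack grows toward lower addresses:

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of the C6000 software stack. push() mimics
 * "STW reg, *SP--[1]" (store at SP, then post-decrement); pop()
 * mimics "LDW *++SP[1], reg" (pre-increment, then load). */

#define STACK_WORDS 16

uint32_t stack_mem[STACK_WORDS];
int sp = STACK_WORDS - 1;   /* first empty slot; grows downward */

void push(uint32_t reg)
{
    stack_mem[sp] = reg;    /* store at SP ...            */
    sp -= 1;                /* ... then post-decrement SP */
}

uint32_t pop(void)
{
    sp += 1;                /* pre-increment SP ...       */
    return stack_mem[sp];   /* ... then load              */
}
```

Note that single-word pushes like this leave SP word-aligned but not necessarily double-word aligned, which is why the slide moves SP once and fills the frame with offset addressing.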

24 Dynamic Memory
[Diagram: stack and heap in internal SRAM; external memory reached via the EMIF]
3. Local variables
- If the stack is located on-chip, all functions can use it
4. Use the heap
- Common memory reuse within the C language
- From a heap (i.e., system memory) you allocate, then free, chunks of memory drawn from a common system block
For example...

25 Dynamic Example (Heap)
"Normal" (static) C coding:
  #define SIZE 32
  int x[SIZE];   /* allocate */
  int a[SIZE];
  x = {…};       /* initialize */
  a = {…};
  filter(…);     /* execute */
"Dynamic" C coding:
  #define SIZE 32
  x = malloc(SIZE);   /* create */
  a = malloc(SIZE);
  x = {…};            /* initialize */
  a = {…};
  filter(…);          /* execute */
  free(a);            /* delete */
  free(x);
- High-performance DSP users have traditionally used static embedded systems
- As DSPs and compilers have improved, the benefits of dynamic systems often allow enhanced flexibility (more threads) at lower cost
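The "dynamic" column can be fleshed out into a runnable sketch. The filter() here is a stand-in dot-product routine, not a TI library function, and the array contents are illustrative:

```c
#include <assert.h>
#include <stdlib.h>

/* Runnable version of the slide's create / execute / delete pattern.
 * filter() is a placeholder (simple dot product), not TI code. */

#define SIZE 32

static int filter(const int *x, const int *a, int n)
{
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += x[i] * a[i];      /* accumulate x[i] * a[i] */
    return acc;
}

int run_filter(void)
{
    /* create: allocate both arrays from the heap */
    int *x = malloc(SIZE * sizeof *x);
    int *a = malloc(SIZE * sizeof *a);
    assert(x != NULL && a != NULL);

    /* initialize */
    for (int i = 0; i < SIZE; i++) { x[i] = 1; a[i] = 2; }

    /* execute */
    int result = filter(x, a, SIZE);

    /* delete: return the memory to the heap */
    free(a);
    free(x);
    return result;               /* 32 terms of 1*2 = 64 */
}
```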

26 Dynamic Memory
[Diagram: stack and heap in internal SRAM; external memory reached via the EMIF]
3. Local variables
- If the stack is located on-chip, all functions can use it
4. Use the heap
- Common memory reuse within the C language
- A heap (i.e., system memory) can be allocated from, then freed
What if I need two heaps?
- Say, a big image array off-chip, and
- A fast scratch-memory heap on-chip?

27 Multiple Heaps
[Diagram: stack and heap in internal SRAM; a second heap (Heap2) in external memory]
- DSP/BIOS enables multiple heaps to be created

28 Multiple Heaps - Summary
[Diagram: stack and heap in internal SRAM; a second heap (Heap2) in external memory]
- DSP/BIOS enables multiple heaps to be created
- Create and name heaps in the configuration tool (GUI)
- Use the MEM_alloc() function to allocate memory and specify which heap

29 Multiple Heaps with DSP/BIOS
- DSP/BIOS enables multiple heaps to be created
- Check the box and set the size when creating a MEM object

30 Multiple Heaps with DSP/BIOS
- DSP/BIOS enables multiple heaps to be created
- Check the box and set the size when creating a MEM object
- By default, the heap has the same name as the MEM object; you can change it here
How can you allocate from multiple heaps?

31 MEM_alloc()
Standard C syntax:
  #define SIZE 32
  x = malloc(SIZE);
  a = malloc(SIZE);
  x = {…};
  a = {…};
  filter(…);
  free(a);
  free(x);
Using MEM functions (you can pick a specific heap):
  #define SIZE 32
  x = MEM_alloc(IRAM, SIZE, ALIGN);
  a = MEM_alloc(SDRAM, SIZE, ALIGN);
  x = {…};
  a = {…};
  filter(…);
  MEM_free(SDRAM, a, SIZE);
  MEM_free(IRAM, x, SIZE);
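MEM_alloc() itself is part of DSP/BIOS. To show just the idea of an allocator keyed by a heap identifier, here is a hypothetical host-side stand-in with two bump-pointer arenas; the IRAM/SDRAM names, arena sizes, and the lack of a free operation are all illustrative simplifications:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for heap selection: two arenas standing in
 * for an on-chip (IRAM) and an off-chip (SDRAM) heap. This is NOT
 * the DSP/BIOS MEM module, just the heap-id idea. */

enum { IRAM, SDRAM };

static char   iram_arena[256];
static char   sdram_arena[1024];
static size_t used[2];               /* bytes consumed per arena */

void *mem_alloc_from(int heap, size_t size)
{
    char  *base = (heap == IRAM) ? iram_arena : sdram_arena;
    size_t cap  = (heap == IRAM) ? sizeof iram_arena : sizeof sdram_arena;

    if (used[heap] + size > cap)
        return NULL;                 /* this arena is exhausted */
    void *p = base + used[heap];
    used[heap] += size;              /* bump the arena pointer */
    return p;
}
```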

32 BUF Concepts
- Buffer pools contain a specified number of equal-size buffers
- Any number of pools can be created
- Buffers are allocated from a pool and freed back when no longer needed
- Buffers can be shared between applications
- The buffer-pool API is faster and smaller than malloc-type operations
- In addition, BUF_alloc and BUF_free are deterministic (unlike malloc)
- The BUF API has no reentrancy or fragmentation issues
[Diagram: a SWI calls BUF_alloc and a TSK calls BUF_free against buffers in a POOL; BUF_create / BUF_delete manage the pool itself]
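Why BUF_alloc and BUF_free can be deterministic is visible in a minimal fixed-size pool sketch: every buffer is the same size, so allocation is just popping the head of a free list. This is not the DSP/BIOS implementation; the names and sizes are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal fixed-size buffer pool: O(1) alloc/free, no fragmentation,
 * because all buffers are equal-size links on a free list. */

#define BUF_SIZE  64    /* bytes per buffer */
#define NUM_BUFS   4    /* buffers per pool */

typedef union buf {
    union buf *next;            /* free-list link while unallocated */
    char       data[BUF_SIZE];  /* payload while allocated          */
} Buf;

static Buf  pool[NUM_BUFS];
static Buf *free_list = NULL;

void pool_create(void)
{
    /* thread every buffer onto the free list */
    free_list = NULL;
    for (int i = NUM_BUFS - 1; i >= 0; i--) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

void *buf_alloc(void)           /* O(1): pop the head of the list */
{
    if (free_list == NULL) return NULL;   /* pool exhausted */
    Buf *b = free_list;
    free_list = b->next;
    return b;
}

void buf_free(void *p)          /* O(1): push back onto the list */
{
    Buf *b = p;
    b->next = free_list;
    free_list = b;
}
```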

33 GCONF Creation of a Buffer Pool
Creating a BUF:
1. Right-click on the BUF manager
2. Select "insert BUF"
3. Right-click on the new BUF
4. Select "rename"
5. Type the BUF name
6. Right-click on the new BUF
7. Select "properties"
8. Indicate the desired:
   - Memory segment
   - Number of buffers
   - Size of buffers
   - Alignment of buffers
Gray boxes indicate effective pool and buffer sizes.

34 Using Memory Efficiently
1. Keep it on-chip
2. Use multiple sections
3. Use local variables (stack)
4. Use dynamic memory (heap, BUF)
5. Overlay memory (load vs. run)
6. Use cache

35 Use Memory Overlays
[Diagram: algo1 and algo2 share the same internal SRAM region; external memory reached via the EMIF]
5. Use memory overlays
- Reuse the same memory locations for multiple algorithms (and/or data)
- You must copy the sections yourself
First, we need to make custom sections.

36 Create Sections to Overlay
myCode.c:
  #pragma CODE_SECTION(fir, ".FIR");
  int fir(short *a, …)

  #pragma CODE_SECTION(iir, "myIIR");
  int iir(short *a, …)
- How can we get them to run from the same location?
- Where will they originally be loaded into memory?
- The key is in the linker command file...

37 Load vs. Run Addresses
  SECTIONS
  {
    .FIR:  > IRAM               /* load & run */
    myIIR: load=IRAM, run=IRAM
  }
[Diagram: .FIR and myIIR both placed in internal SRAM]
- Simply directing a section into a MEM object indicates it is both loaded and run from the same location: .FIR: > IRAM
- Alternatively, you could use: .FIR: load=IRAM, run=IRAM
- In your own linker cmd file:
  load: where the fxn resides at reset
  run:  tells the linker its runtime location
What if we wanted them to be loaded off-chip but run from on-chip memory?

38 Load vs. Run Addresses
  SECTIONS
  {
    .FIR:  load=SDRAM, run=IRAM
    myIIR: load=SDRAM, run=IRAM
  }
[Diagram: load addresses in external memory; run addresses in internal SRAM]
- Simply specify different addresses for load and run
- You must make sure they get copied (using memcopy or the DMA)
- load: where the fxn resides at reset
- run:  tells the linker its runtime location
Back to our original problem: what if we want them to run from the same address?

39 Setting Load Addresses in GCONF

40 Combining Run Addresses with UNION
Above, we only force different load/run addresses:
  SECTIONS
  {
    .FIR:  load=SDRAM, run=IRAM
    myIIR: load=SDRAM, run=IRAM
  }
Below, we also force them to share (union) run locations:
  SECTIONS
  {
    UNION run = IRAM
    {
      .FIR:  load = EPROM
      myIIR: load = EPROM
    }
  }
[Diagram: separate load addresses in external memory; a single shared run-address range in internal SRAM]
How can we make the overlay procedure easier?

41 Overlay Memory
myCode.c:
  #pragma CODE_SECTION(fir, ".FIR");
  int fir(short *a, …)
  #pragma CODE_SECTION(iir, "myIIR");
  int iir(short *a, …)
myLnk.cmd:
  SECTIONS
  {
    .bss: > IRAM   /* load & run */
    UNION run = IRAM
    {
      .FIR:  load = EPROM
      myIIR: load = EPROM
    }
  }
- First, create a section for each function
- In your own linker cmd file:
  load: where the fxn resides at reset
  run:  tells the linker its runtime location
- UNION forces both functions to be runtime-linked to the same memory addresses (i.e., overlaid)
- You must move the code with the CPU or DMA

42 Using Copy Tables
  SECTIONS
  {
    UNION run = IRAM
    {
      .FIR:  load = EPROM, table(_fir_copy_table)
      myIIR: load = EPROM, table(_iir_copy_table)
    }
  }

43 Using Copy Tables
  SECTIONS
  {
    UNION run = IRAM
    {
      .FIR:  load = EPROM, table(_fir_copy_table)
      myIIR: load = EPROM, table(_iir_copy_table)
    }
  }
The linker generates these structures:
  typedef struct copy_record {
      unsigned int load_addr;
      unsigned int run_addr;
      unsigned int size;
  } COPY_RECORD;

  typedef struct copy_table {
      unsigned short rec_size;
      unsigned short num_recs;
      COPY_RECORD recs[2];
  } COPY_TABLE;
[Diagram: fir_copy_table = {rec_size, num_recs = 1, one copy record: fir load addr, fir run addr, fir size}; iir_copy_table likewise]
How do we use a copy table?

44 Using Copy Tables
  #include <cpy_tbl.h>
  extern far COPY_TABLE fir_copy_table;
  extern far COPY_TABLE iir_copy_table;
  extern void fir(void);
  extern void iir(void);
  main()
  {
      copy_in(&fir_copy_table);
      fir();
      ...
      copy_in(&iir_copy_table);
      iir();
      ...
  }
- copy_in() provides a simple wrapper around mem_copy()
- Better yet, use the DMA hardware to copy the sections; specifically, the DAT_copy() function
What could be even easier than using copy tables?

45 Copy Table Header File
  /* cpy_tbl.h                                                         */
  /* Specification of copy table data structures which can be          */
  /* automatically generated by the linker (using the table() operator */
  /* in the LCF).                                                      */

  /* Copy Record Data Structure */
  typedef struct copy_record {
      unsigned int load_addr;
      unsigned int run_addr;
      unsigned int size;
  } COPY_RECORD;

  /* Copy Table Data Structure */
  typedef struct copy_table {
      unsigned short rec_size;
      unsigned short num_recs;
      COPY_RECORD recs[1];
  } COPY_TABLE;

  /* Prototype for general-purpose copy routine */
  extern void copy_in(COPY_TABLE *tp);
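The linker normally fills these tables in. Assuming the structures above, copy_in() can be sketched on a host with memcpy(); note the address fields are widened to pointers here so the sketch runs on a host, whereas cpy_tbl.h uses unsigned int addresses on the target:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side sketch of copy_in(). On the C6000 the linker generates
 * the table; here we build one by hand, with ordinary arrays standing
 * in for the load (EPROM) and run (IRAM) regions. */

typedef struct copy_record {
    void  *load_addr;   /* unsigned int on the C6000 target */
    void  *run_addr;
    size_t size;
} COPY_RECORD;

typedef struct copy_table {
    unsigned short rec_size;
    unsigned short num_recs;
    COPY_RECORD    recs[1];
} COPY_TABLE;

void copy_in(COPY_TABLE *tp)
{
    /* copy each record's load image to its run address */
    for (unsigned short i = 0; i < tp->num_recs; i++) {
        COPY_RECORD *r = &tp->recs[i];
        memcpy(r->run_addr, r->load_addr, r->size);
    }
}
```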

46 Copy Table (GROUP & UNION)
  SECTIONS
  {
    ...
    UNION
    {
      GROUP
      {
        .task1: { task1.obj(.text) }
        .task2: { task2.obj(.text) }
      } load = ROM, table(_task12_copy_table)

      GROUP
      {
        .task3: { task3.obj(.text) }
        .task4: { task4.obj(.text) }
      } load = ROM, table(_task34_copy_table)
    } run = RAM
    ...
  }
[Diagram: tasks 1-4 loaded separately in ROM; at run time, either tasks 1 & 2 or tasks 3 & 4 occupy the shared RAM region]

47 Use Memory Overlays
[Diagram: algo1 and algo2 share the same internal SRAM region; external memory reached via the EMIF]
5. Use memory overlays
- Reuse the same memory locations for multiple algorithms (and/or data)
- Either the CPU or DMA must copy the info from its original (load) location to the run location
Is there an easier way to overlay code & data memory?

48 Use Cache
[Diagram: program and data caches hold scratch copies of .text and .bss from external memory]
6. Use cache
- Works for code and data
- Keeps a local (temporary) scratch copy of info on-chip
- Commonly used, since once enabled it's automatic
- Discussed further in Chapter 14

49 Summary: Using Memory Efficiently
You may want to work through your memory allocations in the following order:
1. Keep it all on-chip
2. Use cache (more in Ch 15)
3. Use local variables (stack on-chip)
4. Use dynamic memory (heap, BUF)
5. Make your own sections (pragmas)
6. Overlay memory (load vs. run)
While this tradeoff is highly application dependent, this is a good place to start.

50 Texas Instruments Technical Training Organization

