1. Project Goals
2. Development Tools
3. Learning Steps
4. What’s next
1. CCS Simulator and Profiler
2. Cache configuration
3. DMA data transfer
4. Interrupts
5. Fixed- and floating-point libraries (DSPLib, IMGLib, VLib, …)
6. SYS/BIOS
7. Multi-core programming
* CCS v5 can simulate the C6678 processor and some of its peripherals.
* The profiler reports execution time and statistics for functions and individual code lines.
* Graph viewer: displays data from memory in the time or frequency domain.
* Image Analyzer: displays an image stored in memory or in a file. Supports grayscale, RGB and YUV color formats.
* 32 KB L1P cache. L1P is read-allocate and direct mapped.
* 32 KB L1D cache. L1D is read-allocate, write-back and 2-way set associative.
* Each L1 cache can be configured as 0, 4, 8, 16 or 32 KB of cache.
* 512 KB L2 cache. L2 is read- and write-allocate and 4-way set associative.
* L2 can be configured as 0, 32, 64, 128, 256 or 512 KB of cache.
* All of these settings can be changed at run time.
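
The sizes and associativities above determine how many sets each cache has and which addresses compete for the same set. The following is a minimal sketch of that arithmetic in plain C. It is not the TI cache API: `cache_cfg`, `cache_num_sets` and `cache_set_index` are hypothetical names, and the line size is a parameter the caller must supply, because the slides do not state it.

```c
#include <assert.h>
#include <stddef.h>

/* One cache level. The line size is an assumption supplied by the caller;
 * the slides give only the total size and the associativity. */
typedef struct {
    size_t size_bytes;  /* configured cache size, e.g. 32 * 1024 */
    size_t ways;        /* 1 = direct mapped, 2 = 2-way, 4 = 4-way */
    size_t line_bytes;  /* assumed cache line size in bytes */
} cache_cfg;

/* Number of sets = size / (ways * line size). Returns 0 when the cache
 * is configured as 0 KB (all of the memory is used as SRAM). */
size_t cache_num_sets(cache_cfg c)
{
    assert(c.ways > 0 && c.line_bytes > 0);
    return c.size_bytes / (c.ways * c.line_bytes);
}

/* Set that a byte address maps to; 0 when the cache is disabled. */
size_t cache_set_index(cache_cfg c, size_t addr)
{
    size_t sets = cache_num_sets(c);
    return sets ? (addr / c.line_bytes) % sets : 0;
}
```

For example, with an assumed 64-byte line, a 32 KB 2-way L1D has 256 sets, so two addresses that are 16 KB apart fall into the same set. Shrinking the cache at run time reduces the number of sets and makes such conflicts more frequent.
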
Achievements:
* Configuring different L1 and L2 cache sizes during or before run time.
* Using L1 and L2 as SRAM (fully SRAM, or part SRAM and part cache).
* Controlling variable locations (L1, L2 or DDR3).
* The C66x processor has 3 EDMA3 controllers, each with up to 64 DMA channels and 8 QDMA channels.
* EDMA3 supports data transfers to and from cache memory, shared memory and external memory.
* EDMA3 supports the use of hardware interrupts.
* In addition, each core has a faster IDMA controller for internal transfers.
Achievements:
* Using IDMA to transfer data inside a core (L2 ↔ L1).
* Using EDMA3 to transfer data to and from L1, L2 and DDR3.
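
A DMA transfer of this kind is usually described as a number of equally sized arrays separated by fixed strides at the source and the destination. The sketch below models that idea in software with `memcpy`. It is only an illustration, not the EDMA3 or IDMA driver API: `block_xfer` and `block_xfer_run` are hypothetical names, the field names merely echo the count/index style of DMA parameter sets, and the strides are unsigned here for simplicity.

```c
#include <stddef.h>
#include <string.h>

/* Software model of a strided (two-dimensional) block transfer.
 * All names are hypothetical and chosen for illustration only. */
typedef struct {
    const unsigned char *src;  /* first byte of the first source array */
    unsigned char *dst;        /* first byte of the first destination array */
    size_t acnt;               /* bytes in each contiguous array */
    size_t bcnt;               /* number of arrays to copy */
    size_t src_bidx;           /* byte stride between arrays at the source */
    size_t dst_bidx;           /* byte stride between arrays at the destination */
} block_xfer;

/* Copies bcnt arrays of acnt bytes each. A real DMA engine would do this
 * in the background and signal completion with an event; this model
 * simply runs to completion on the CPU. */
void block_xfer_run(const block_xfer *t)
{
    for (size_t i = 0; i < t->bcnt; i++) {
        memcpy(t->dst + i * t->dst_bidx, t->src + i * t->src_bidx, t->acnt);
    }
}
```

With a 4-byte-wide image in a large external buffer, setting `acnt = 2`, `bcnt = 2`, `src_bidx = 4` and `dst_bidx = 2` gathers a 2x2 tile into a compact buffer, which is the typical pattern when moving a working block from DDR3 into L2 or L1.
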
The interrupt controller supports up to 128 system events. They consist of both internally generated events (within the C66x CorePac) and chip-level events.
The interrupt controller outputs 15 signals to the core from the event inputs:
* One maskable hardware exception
* 12 maskable hardware interrupts
* One non-maskable signal
* One reset signal
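
Because there are far more system events than maskable interrupts, software has to choose which event drives each interrupt. The sketch below models that selection as a small lookup table in plain C. It is not the CorePac register interface or the CSL API: all names are hypothetical, and the numbering of the 12 maskable interrupts as 4 to 15 is an assumption stated here rather than taken from the slides.

```c
#include <stdint.h>

/* Assumption: the 12 maskable interrupts are numbered 4..15. */
enum { NUM_EVENTS = 128, FIRST_INT = 4, LAST_INT = 15, UNMAPPED = 0xFF };

/* event_of_int[n] holds the system event routed to interrupt n. */
typedef struct {
    uint8_t event_of_int[LAST_INT + 1];
} int_selector;

/* Marks every interrupt as having no event routed to it. */
void intsel_init(int_selector *s)
{
    for (unsigned i = 0; i <= LAST_INT; i++) {
        s->event_of_int[i] = UNMAPPED;
    }
}

/* Routes one system event to one maskable interrupt.
 * Returns 0 on success, -1 if either number is out of range. */
int intsel_map(int_selector *s, unsigned event, unsigned cpu_int)
{
    if (event >= NUM_EVENTS || cpu_int < FIRST_INT || cpu_int > LAST_INT) {
        return -1;
    }
    s->event_of_int[cpu_int] = (uint8_t)event;
    return 0;
}

/* Returns the interrupt an event is routed to, or -1 if it is not routed. */
int intsel_lookup(const int_selector *s, unsigned event)
{
    if (event >= NUM_EVENTS) {
        return -1;
    }
    for (unsigned i = FIRST_INT; i <= LAST_INT; i++) {
        if (s->event_of_int[i] == event) {
            return (int)i;
        }
    }
    return -1;
}
```

A transfer-completion handler would follow the same pattern: route the completion event to one of the maskable interrupts, then attach the service routine to that interrupt.
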
Achievements:
* Configuring manually triggered events.
* Configuring an EDMA transfer-completion routine triggered by an EDMA system event.
* DSPLib: an optimized DSP function library that includes general-purpose signal-processing routines for real-time applications.
Example shown: low-pass filter (LPF).
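
To make the low-pass filter example concrete, here is a minimal reference FIR filter in portable C. It is a plain, unoptimized sketch and not a DSPLib routine: the function name `fir_filter` and its argument order are hypothetical, and an optimized library kernel may use a different coefficient ordering or impose alignment and length restrictions.

```c
#include <stddef.h>

/* Reference FIR filter: y[i] = sum over j of h[j] * x[i + nh - 1 - j].
 * x must hold ny + nh - 1 input samples, h holds nh coefficients,
 * y receives ny output samples. */
void fir_filter(const float *x, const float *h, float *y,
                size_t nh, size_t ny)
{
    for (size_t i = 0; i < ny; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < nh; j++) {
            acc += h[j] * x[i + nh - 1 - j];
        }
        y[i] = acc;
    }
}
```

With four coefficients of 0.25 the filter is a 4-tap moving average, a very simple low-pass filter: a step in the input from 1 to 5 appears at the output as a ramp spread over four samples.
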
* IMGLib: an optimized image/video processing function library that includes general-purpose image/video processing routines for real-time applications.
Examples shown: histogram, edge detection, derivative.
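
As an illustration of the histogram example, the following sketch computes the histogram of an 8-bit grayscale image in portable C. It is a reference version only, not the IMGLib kernel: `histogram_u8` is a hypothetical name, and an optimized library routine may need extra scratch buffers or specific image sizes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts how many pixels take each of the 256 possible gray levels.
 * img holds n pixels; hist is cleared first and then filled. */
void histogram_u8(const uint8_t *img, size_t n, uint32_t hist[256])
{
    memset(hist, 0, 256 * sizeof hist[0]);
    for (size_t i = 0; i < n; i++) {
        hist[img[i]]++;
    }
}
```

A reference version like this is also useful for checking the output of an optimized routine on the same input.
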
Some more libraries:
* VLib: a collection of computer vision algorithms optimized for TI DSPs.
* IQMath: a collection of highly optimized fixed-point arithmetic, trigonometric and mathematical functions, typically used in real-time applications.
* fastMath: optimized arithmetic and trigonometric functions for floating-point devices.
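
The difference between fixed-point and floating-point arithmetic can be sketched with the Q15 format, in which a 16-bit integer represents a fraction in the range [-1, 1). The code below is a generic illustration in portable C and does not use the IQMath API: the type `q15_t` and the three helper functions are hypothetical names.

```c
#include <stdint.h>

typedef int16_t q15_t;  /* value = raw / 32768, range [-1, 1) */

/* Converts a float to Q15, rounding to nearest and saturating. */
q15_t q15_from_float(float f)
{
    float scaled = f * 32768.0f;
    if (scaled >= 32767.0f) return INT16_MAX;
    if (scaled <= -32768.0f) return INT16_MIN;
    return (q15_t)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* Converts Q15 back to float. */
float q15_to_float(q15_t q)
{
    return (float)q / 32768.0f;
}

/* Multiplies two Q15 values with rounding. The only product that can
 * overflow is (-1) * (-1), which is saturated to the largest Q15 value. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* Q30 product */
    int32_t r = (p >= 0) ? (p + 16384) / 32768
                         : -((-p + 16384) / 32768);
    if (r > INT16_MAX) r = INT16_MAX;
    return (q15_t)r;
}
```

Multiplying 0.5 by 0.5 in Q15 gives a raw value of 8192, which converts back to exactly 0.25. Values that are not exactly representable pick up a small quantization error, which is the accuracy cost to weigh against floating point.
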
Achievements:
* Using DSPLib for a simple signal-processing application with floating-point arrays.
* Using IMGLib for a simple image-processing application.
Still left:
* Studying the VLib, IQMath and fastMath libraries.
* Comparing the measured running time with the running time specified in the User Guide.
* SYS/BIOS is a real-time operating system for applications that require real-time scheduling and synchronization.
* SYS/BIOS provides preemptive multi-threading, hardware abstraction, real-time analysis and configuration tools.
* SYS/BIOS is designed to minimize memory and CPU requirements on the target.
Achievements:
* Using SYS/BIOS modules to configure the DSP’s memory (cache sizes, memory sections, heap and stack sizes).
* Running a multi-threaded program with protected shared variables.
Still left:
* Using SYS/BIOS modules to configure DSP peripherals (LAN, SRIO, PCIe).
1. CCS Simulator and Profiler: done
2. Cache configuration: done
3. DMA data transfer: done
4. Interrupts: done
5. Fixed- and floating-point libraries (DSPLib, IMGLib, VLib, …): in progress
6. SYS/BIOS: in progress
7. Multi-core programming
1. Project Goals
2. Development Tools
3. Learning Steps
4. What’s next
1. Implementing a bidirectional data flow between DDR3 and L1, possibly through L2. (3 weeks)
2. Performance analysis (throughput, latency and accuracy) of floating-point versus fixed-point libraries. (2 weeks)
3. Using hardware semaphores for parallel data access and the Multicore Navigator for message passing between cores. (4 weeks)