Yaron Doweck Yael Einziger Supervisor: Mike Sumszyk Spring 2011 Semester Project.



1. Project Goals
2. Development Tools
3. Learning Steps
4. What’s next

1. Project Goals

* Learn to use the new TI C66x platform and exploit its capabilities and advantages.
* Implement a real-time computer vision algorithm using multi-core programming.

2. Development Tools

* Hardware: TMS320C6678 Multicore Fixed and Floating-Point Digital Signal Processor
* Software: Code Composer Studio v5 with BIOS MCSDK

* 8 C66x CorePac DSPs
* Based on TI’s KeyStone multicore architecture
* 320 GMAC/ GHz
* 32 KB L1P, 32 KB L1D and 512 KB L2 per core
* 4 MB shared L2
* 64-bit DDR3 interface (DDR3-1600)

3. Learning Steps

1. CCS Simulator and Profiler
2. Cache configuration
3. DMA data transfer
4. Interrupts
5. Fixed- and floating-point libraries (DSPLib, IMGLib, VLib,…)
6. SYS/BIOS
7. Multi-core programming

* CCS v5 can simulate the C6678 processor and some of its peripherals.
* The profiler reports execution time and statistics for functions and individual lines of code.

* Graph viewer – displays data from memory in the time or frequency domain.
* Image Analyzer – displays an image stored in memory or in a file. Supports grayscale, RGB and YUV color formats.

* 32 KB L1P cache. L1P is read-allocate and direct-mapped.
* 32 KB L1D cache. L1D is read-allocate, write-back and 2-way set-associative.
* Each L1 cache can be configured as 0, 4, 8, 16 or 32 KB of cache.
* 512 KB L2 cache. L2 is read- and write-allocate and 4-way set-associative.
* L2 can be configured as 0, 32, 64, 128, 256 or 512 KB of cache.
* All of these settings can be changed at run time.

Achievements:
* Configuring different L1 and L2 cache sizes before or during run time.
* Using L1 and L2 as SRAM (entirely SRAM, or part SRAM and part cache).
* Controlling where variables are placed (L1, L2 or DDR3 memory).
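The cache/SRAM split described above is simple arithmetic: whatever part of a memory is not configured as cache stays addressable as SRAM. A minimal host-side sketch in C, assuming only the sizes quoted on the slides; the function names are hypothetical and nothing here touches a TI register or library call:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define L1_TOTAL_KB 32u   /* L1P or L1D capacity per core (from the slides) */
#define L2_TOTAL_KB 512u  /* local L2 capacity per core (from the slides) */

/* Cache sizes listed on the slides, in KB. */
static const unsigned L1_CACHE_KB[] = {0, 4, 8, 16, 32};
static const unsigned L2_CACHE_KB[] = {0, 32, 64, 128, 256, 512};

/* Returns true if kb appears in a table of n legal sizes. */
static bool size_listed(const unsigned *table, size_t n, unsigned kb)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i] == kb) {
            return true;
        }
    }
    return false;
}

bool l1_cache_size_valid(unsigned kb)
{
    return size_listed(L1_CACHE_KB, sizeof L1_CACHE_KB / sizeof L1_CACHE_KB[0], kb);
}

bool l2_cache_size_valid(unsigned kb)
{
    return size_listed(L2_CACHE_KB, sizeof L2_CACHE_KB / sizeof L2_CACHE_KB[0], kb);
}

/* KB of an L1 memory left as SRAM once cache_kb is used as cache. */
unsigned l1_sram_kb(unsigned cache_kb)
{
    assert(l1_cache_size_valid(cache_kb));
    return L1_TOTAL_KB - cache_kb;
}

/* KB of L2 left as SRAM once cache_kb is used as cache. */
unsigned l2_sram_kb(unsigned cache_kb)
{
    assert(l2_cache_size_valid(cache_kb));
    return L2_TOTAL_KB - cache_kb;
}
```

For example, configuring 128 KB of L2 as cache leaves 384 KB of L2 usable as SRAM for explicitly placed variables.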

* C66xx processors have 3 EDMA3 controllers, each with 64 DMA channels + 8 QDMA channels.
* EDMA3 supports data transfers to/from cache, shared memory or external memory.
* EDMA3 supports hardware interrupts.
* In addition, each core has a faster IDMA controller for internal transfers.

Achievements:
* Using IDMA to transfer data inside a core (L2 ↔ L1).
* Using EDMA3 to transfer data to/from L1, L2 and DDR3.
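A DMA transfer of this kind is usually described by a small parameter set rather than by a copy loop. The sketch below is a host-side software model of a strided two-dimensional copy, with `memcpy` standing in for the DMA engine. The field names `acnt`, `bcnt` and `*_bidx` are loosely borrowed from EDMA3 terminology; the struct and function are hypothetical and are not the TI driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for a 2-D strided copy (software model only). */
typedef struct {
    const uint8_t *src;       /* first source byte */
    uint8_t       *dst;       /* first destination byte */
    size_t         acnt;      /* bytes in one contiguous array */
    size_t         bcnt;      /* number of arrays to copy */
    ptrdiff_t      src_bidx;  /* byte stride between source arrays */
    ptrdiff_t      dst_bidx;  /* byte stride between destination arrays */
} transfer2d_t;

/* Copies bcnt arrays of acnt bytes each, stepping by the given strides. */
void transfer2d_run(const transfer2d_t *t)
{
    assert(t != NULL && t->src != NULL && t->dst != NULL);
    for (size_t b = 0; b < t->bcnt; b++) {
        const uint8_t *s = t->src + (ptrdiff_t)b * t->src_bidx;
        uint8_t *d = t->dst + (ptrdiff_t)b * t->dst_bidx;
        memcpy(d, s, t->acnt);
    }
}
```

With a source stride equal to the image width and a destination stride equal to the tile width, the same descriptor extracts a rectangular tile from a larger image, which is the typical pattern when staging image blocks from DDR3 into L2 or L1.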

The interrupt controller supports up to 128 system events. These include both internally generated events (within the C66x CorePac) and chip-level events.

The interrupt controller outputs 15 signals to the core from the event inputs:
* One maskable hardware exception
* 12 maskable hardware interrupts
* One non-maskable signal
* One reset signal
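Because there are far more events (128) than maskable interrupts (12), software has to choose which event drives each interrupt line. The following is a minimal host-side model of such a selection table, assuming only the two counts given on the slides; the type and function names are hypothetical, and the real controller is programmed through registers or SYS/BIOS rather than through this code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_EVENTS     128   /* system events (from the slides) */
#define NUM_INTERRUPTS 12    /* maskable interrupts (from the slides) */
#define UNMAPPED       (-1)

/* For each interrupt line, the event currently routed to it. */
typedef struct {
    int event_of_line[NUM_INTERRUPTS];
} event_selector_t;

/* Starts with no event routed to any line. */
void selector_init(event_selector_t *s)
{
    assert(s != NULL);
    for (int line = 0; line < NUM_INTERRUPTS; line++) {
        s->event_of_line[line] = UNMAPPED;
    }
}

/* Routes an event to a line; returns false if either index is out of range. */
bool selector_map(event_selector_t *s, int event, int line)
{
    if (event < 0 || event >= NUM_EVENTS || line < 0 || line >= NUM_INTERRUPTS) {
        return false;
    }
    s->event_of_line[line] = event;
    return true;
}

/* Returns the line an event is routed to, or UNMAPPED if none. */
int selector_route(const event_selector_t *s, int event)
{
    for (int line = 0; line < NUM_INTERRUPTS; line++) {
        if (s->event_of_line[line] == event) {
            return line;
        }
    }
    return UNMAPPED;
}
```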

Achievements:
* Configuring manually triggered events.
* Configuring an EDMA transfer-completion routine using the EDMA system event.

* DSPLib – an optimized DSP function library that includes general-purpose signal-processing routines for real-time applications.

[Figure: LPF (low-pass filter) example]
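To make the low-pass filter example concrete, here is a plain C reference FIR filter of the kind an optimized library routine could be checked against. It is a sketch under stated assumptions: the function name and signature are hypothetical, it is not the DSPLib API, and samples before the start of the input are treated as zero.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Reference FIR filter: y[n] = sum over k of h[k] * x[n - k].
 * Input samples before x[0] are treated as zero.
 * x and y hold nx samples each; h holds nh coefficients.
 */
void fir_filter(const float *x, size_t nx,
                const float *h, size_t nh,
                float *y)
{
    assert(x != NULL && h != NULL && y != NULL);
    for (size_t n = 0; n < nx; n++) {
        float acc = 0.0f;
        for (size_t k = 0; k < nh && k <= n; k++) {
            acc += h[k] * x[n - k];
        }
        y[n] = acc;
    }
}
```

A four-tap moving average (all coefficients 0.25) is the simplest low-pass choice: fed a constant input of 1.0 it ramps up over the first four outputs and then settles at 1.0.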

* IMGLib – an optimized image/video processing function library that includes general-purpose image/video processing routines for real-time applications.

[Figure: histogram, edge detection and derivative examples]
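As an illustration of the simplest of these image operations, the sketch below computes a 256-bin histogram of an 8-bit grayscale image in plain C. The function name is hypothetical and this is a reference implementation, not the IMGLib routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HIST_BINS 256  /* one bin per possible 8-bit pixel value */

/* Counts how many pixels take each of the 256 grayscale values. */
void histogram_u8(const uint8_t *pixels, size_t count, uint32_t hist[HIST_BINS])
{
    assert(pixels != NULL && hist != NULL);
    memset(hist, 0, HIST_BINS * sizeof hist[0]);
    for (size_t i = 0; i < count; i++) {
        hist[pixels[i]]++;
    }
}
```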

Some more libraries:
* VLib – a collection of computer vision algorithms optimized for TI DSPs.
* IQMath – a collection of highly optimized fixed-point arithmetic, trigonometric and mathematical functions, typically used in real-time applications.
* fastMath – optimized arithmetic and trigonometric functions for floating-point devices.
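The difference between fixed-point and floating-point arithmetic is easiest to see in a small example. The sketch below implements Q15 numbers (16-bit integers representing values in [-1, 1) with 15 fractional bits) in plain C. The names are hypothetical and this is not the IQMath API; it only illustrates the scaling, rounding and saturation that fixed-point libraries handle internally.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t q15_t;  /* value = integer / 2^15 */

static_assert(sizeof(q15_t) == 2, "Q15 values must be 16 bits wide");

#define Q15_SCALE 32768.0f
#define Q15_MAX   32767
#define Q15_MIN   (-32768)

/* Converts a float to Q15, rounding to nearest and saturating. */
q15_t q15_from_float(float x)
{
    float scaled = x * Q15_SCALE;
    if (scaled >= (float)Q15_MAX) {
        return Q15_MAX;
    }
    if (scaled <= (float)Q15_MIN) {
        return Q15_MIN;
    }
    return (q15_t)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* Converts a Q15 value back to float. */
float q15_to_float(q15_t q)
{
    return (float)q / Q15_SCALE;
}

/*
 * Multiplies two Q15 values. The 32-bit product has 30 fractional bits,
 * so it is rounded and shifted right by 15. Assumes an arithmetic right
 * shift for negative values, as on mainstream compilers.
 */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    p = (p + (1 << 14)) >> 15;
    if (p > Q15_MAX) {
        p = Q15_MAX;  /* only (-1) * (-1) can overflow */
    }
    return (q15_t)p;
}
```

The saturation case shows the accuracy trade-off: (-1) × (-1) should be +1, which Q15 cannot represent, so the result is clamped to the largest representable value just below 1.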

Achievements:
* Using DSPLib for a simple signal-processing application with floating-point arrays.
* Using IMGLib for a simple image-processing application.

Still left:
* Studying the VLib, IQMath and fastMath libraries.
* Comparing actual running times to the running times specified in the User Guide.

* SYS/BIOS is a real-time operating system designed for applications that require real-time scheduling and synchronization.
* SYS/BIOS provides preemptive multi-threading, hardware abstraction, real-time analysis, and configuration tools.
* SYS/BIOS is designed to minimize memory and CPU requirements on the target.

Achievements:
* Using SYS/BIOS modules to configure the DSP’s memory (cache sizes, memory sections, heap and stack size).
* Running a multi-threaded program with protection of shared variables.

Still left:
* Using SYS/BIOS modules to configure DSP peripherals (LAN, SRIO, PCIe).

1. CCS Simulator and Profiler – done
2. Cache configuration – done
3. DMA data transfer – done
4. Interrupts – done
5. Fixed- and floating-point libraries (DSPLib, IMGLib, VLib,…) – in progress
6. SYS/BIOS – in progress
7. Multi-core programming

4. What’s next

1. Implementation of a bidirectional data flow between DDR3 and L1, possibly through L2. (3 weeks)
2. Performance analysis (throughput, latency and accuracy) of floating-point versus fixed-point libraries. (2 weeks)
3. Use of hardware semaphores for parallel data access, and of the Multicore Navigator for message passing between cores. (4 weeks)
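The idea behind the hardware semaphores in the last item is that only one core at a time is granted ownership of a shared resource, and a core that is refused must retry or wait. The sketch below models that try-acquire/release behaviour on a host machine with a C11 atomic flag. It is an assumption-laden illustration: the names are hypothetical, and the real device exposes its semaphores through hardware registers and TI's own drivers, not through this code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Software stand-in for one hardware semaphore. */
typedef struct {
    atomic_flag taken;
} sem_model_t;

/* Puts the semaphore in the free state. */
void sem_model_init(sem_model_t *s)
{
    assert(s != NULL);
    atomic_flag_clear(&s->taken);
}

/* Tries to take the semaphore; returns true only for the single winner. */
bool sem_model_try_acquire(sem_model_t *s)
{
    assert(s != NULL);
    return !atomic_flag_test_and_set(&s->taken);
}

/* Frees the semaphore so another requester can take it. */
void sem_model_release(sem_model_t *s)
{
    assert(s != NULL);
    atomic_flag_clear(&s->taken);
}
```

A core would wrap each access to a shared buffer in a try-acquire loop followed by a release, so that two cores never update the buffer at the same time.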