How to realize high-performance compute with Multicore DSP

Presentation transcript:


C667x Target Applications (Non-Telecom)
- Mission Critical
- Test and Automation
- HPC, Imaging and Medical
- Video Infrastructure
- Audio Infrastructure
- Emerging Broadband
- Emerging / Others

C6472 Target Application Areas
To meet the needs of today's leading-edge, high-performance applications, TI's highly power-efficient C6472 was designed to support applications that drive many channels, applications that demand maximum performance density, and breakthrough applications whose designers need access to sophisticated functions. These devices are ideal for high-performance applications such as high-end industrial, mission critical, test and measurement, communication, high-end image and video, blade server, cloud computing, and ATM/currency verification.

RF and Communication Applications

Segments: Military & Defense; Avionics; Govt & Public Safety

Applications:
- ISR (Intelligence/Surveillance/Reconnaissance)
- SIGINT/COMINT/Signal Generators
- Military Communications
- SDR (JTRS): Manpack/LMR/Fixed
- Comm. Infra: VoIP/Video Gateways
- Satellite/Avionics Communications
- Ground Receiver/Repeaters
- Weather Radar
- FAA: Civil Aviation/Govt Comm.
- Conventional PS: TETRA/APCO/E911
- Wireless Infrastructure
- Emerging Broadband (OFDM/LTE/WiMAX)
- Utilities/Transport/Smart Grid

Key Customer Careabouts:
- Long-term partnership
- Financial stability
- Strong roadmap and R&D
- Floating-point performance
- Size, Weight, and Power (SWaP)
- I/O bandwidth
- Longevity of supply (10+ yrs)

RF and Comm. Product Requirements

End Product Need:
- Support multiple waveforms
- Common platform for TDMA/CDMA/OFDMA
- Multi-channel VoIP/Video capability
- Support FEC and modulation
- TCP/IP networking support

DSP Requirement:
- Raw performance in terms of MIPS/GHz/MMACS
- Floating-point capable ISA to achieve "precision" and high GFLOPS
- Large on-chip RAM to reduce accesses to slow external memory
- High-speed external memory interface
- Large addressable memory
- Efficient DMA architecture
- Wireless-specific accelerators and TCP/IP offload

Common DSP requirements:
- Highest levels of raw performance (MIPS), e.g. image processing/analytics
- Integrated fixed- and floating-point capability, e.g. radar/sonar/precision guidance applications
- Large on-chip memory, large addressable external memory space, high-bandwidth EMIF, e.g. electro-optical imaging apps
- Memory ECC for system reliability
- Multiple high-bandwidth I/O for on-board and backplane connectivity, FPGA connectivity, and transporting raw I/O data, e.g. phased-array radar input
- Scalable H/W and S/W solution, e.g. COTS cards
- Efficiency: low mW per unit of performance, e.g. UAV electronics, avionics, handheld SDR
- Ease of use: S/W development tools, a rich selection of easily available S/W IP, and easily available experts

Imaging Product Requirements

End Product Need: High-bandwidth interfaces (RF front end and telecom ports; connecting multiple DSPs on a board, e.g. in an ATCA card; high-bandwidth backplane and network connectivity)
DSP Requirement: Multiple high-speed interfaces: PCIe, Serial RapidIO, OBSAI/CPRI interface, Gigabit Ethernet, etc.

End Product Need: Temperature and reliability in mission-critical designs
DSP Requirement: Memory Error Correction & Checking (ECC); support for extended temperature ranges from -40°C to 105°C and others

End Product Need: Low-power design
DSP Requirement: Efficient low-power DSPs

End Product Need: Ease of use
DSP Requirement: Development and debug tools; multicore S/W frameworks; signal/image processing functions; VoIP library; audio/video codecs

KEYSTONE Architecture

Introducing the "KeyStone Architecture" (C66x): the best combination of performance (GHz) and power consumption in the industry.

DSP core lineage:
- Fixed point: C64x+ core. Lowest-power, highest-performance DSP core.
- Floating point: C67x core (C67xx). Industry's lowest-power floating-point DSP core; high precision and wide dynamic range.
- New: C66x multicore DSP. Fixed- and floating-point core @ 1.25 GHz.

Next-generation C66x DSP core:
- 16 GFLOPS and 32 GMACS per core @ 1 GHz
- 4x C64x+ MAC (32); 4x C67x floating-point MAC (8)
- 16 FLOP/cycle compared to 6 FLOP/cycle
- The 8-core C6678, based on the C66x core, delivers 320 GMACS / 160 GFLOPS @ 1.25 GHz per core (effectively a 10 GHz DSP)
- 100% code compatible with all C64x (fixed) and C67x (floating) devices
- Power profile similar to the C64x core
- Supported by the Code Composer Studio IDE
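The device-level numbers on this slide follow from the per-core figures by simple scaling. The sketch below reproduces that arithmetic, assuming peak throughput scales linearly with clock and core count; the function and variable names are illustrative and not part of any TI tool.

```python
# Per-core peak figures at 1 GHz, as quoted on the slide.
GMACS_PER_CORE_AT_1GHZ = 32.0
GFLOPS_PER_CORE_AT_1GHZ = 16.0


def device_peak(cores: int, clock_ghz: float) -> dict:
    """Scale the per-core 1 GHz peaks linearly with clock and core count.

    This is a theoretical peak only: it ignores memory stalls and any
    cycles that do not issue a full set of MAC/FLOP operations.
    """
    return {
        "gmacs": GMACS_PER_CORE_AT_1GHZ * clock_ghz * cores,
        "gflops": GFLOPS_PER_CORE_AT_1GHZ * clock_ghz * cores,
        "aggregate_ghz": clock_ghz * cores,
    }


# 8-core C6678 at 1.25 GHz per core.
c6678 = device_peak(cores=8, clock_ghz=1.25)
print(c6678)
```

The "effectively a 10 GHz DSP" claim is the aggregate clock (8 cores x 1.25 GHz), which only translates into real throughput when the workload parallelizes across all cores.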

Unmatched Performance

BDTImark2000™ Score: BDTI scores for floating-point and fixed-point processors. BDTI numbers are based on the 1.5 GHz 6672 platform (dual-core C66x DSP running at 1.5 GHz). Data available on BDTI's website.

Core-to-core performance comparison (the 6678 has 8 C66x cores). The slide's baseline columns are C67x @ 300MHz and C64x+ @ 1.2GHz; the transcript does not preserve which baseline column each timing came from.

| Algorithm | Baseline | C66x @ 1.25GHz | Gain |
|---|---|---|---|
| Single Precision Floating Point FFT, 2048 pt, Radix 4 | 86.84 us | 14.00 us* | ~600% |
| Fixed Point FFT, 2048 pt, Radix 4 | 8.23 us | 4.46 us* | ~200% |
| FIR Filter, 40 samples, 40 taps | 0.69 us | 0.34 us* | (not given) |
| Matrix Multiply 32 x 32 | 17.92 us | 6.16 us* | ~300% |
| Matrix Inverse 4 x 4 | 0.53 us | 0.13 us* | ~400% |

TI internal benchmarks at full utilization; memory impacts not accounted for.
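As a sanity check on the benchmark figures, the sketch below computes the speedup for each row as baseline time divided by C66x time. It assumes the slide's "Gain" column is that ratio expressed as a rounded percentage; the dictionary keys are shortened labels chosen here for illustration.

```python
# (baseline_us, c66x_us) pairs copied from the benchmark slide.
timings_us = {
    "sp_float_fft_2048": (86.84, 14.00),
    "fixed_fft_2048": (8.23, 4.46),
    "fir_40x40": (0.69, 0.34),
    "matmul_32x32": (17.92, 6.16),
    "matinv_4x4": (0.53, 0.13),
}


def speedup(baseline_us: float, c66x_us: float) -> float:
    """Return how many times faster the C66x timing is than the baseline."""
    return baseline_us / c66x_us


speedups = {name: speedup(*pair) for name, pair in timings_us.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x ({factor * 100:.0f}%)")
```

Under that reading, the FIR row, which has no gain listed, works out to roughly 2x, in line with the fixed-point FFT row.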

TI Multicore KeyStone Architecture

The first network-on-chip infrastructure to unleash full multicore entitlement.

Building blocks:
- C66x and ARM processing cores
- Shared memory with Multicore Shared Memory Controller
- TeraNet 2 (network on chip)
- Multicore Navigator
- Application accelerators
- High-speed I/O
- HyperLink 50
- System management (debug, clocking, power)

Themes called out on the slide: highest integration; cost and power; common architecture; portable software; scalable, tailored solutions; innovative (Navigator, multicore floating point); development time; tools and debugging; R&D efficiency; quality; software solutions and libraries.

Product Highlights: C6670 and C6678

C6670: Performance Optimized
- Next-generation C66x core: 4 C66x cores @ 1 GHz to 1.2 GHz
- Memory architecture: 4 MB local L2 (1 MB per core); 2 MB multicore shared memory
- Communication accelerators: TCP3e (turbo encode) up to 550 Mbps; TCP3d (turbo decode) up to 600 Mbps; FFTC, one 2048-point FFT every 4.6 µs; VCP2 for voice channel decoding

C6678: Power Optimized
- Next-generation C66x core: up to 8 C66x cores @ 1 GHz to 1.25 GHz; available as 1-, 2-, 4- and 8-core devices
- Memory architecture: 4 MB local L2 (512 KB per core); 4 MB multicore shared memory
- Power optimized: <10 W at 1 GHz, nominal temperature

[Block diagram, C6670: C66x DSP cores with L1/L2; Multicore Navigator; TeraNet; memory subsystem (MSMC, 2 MB shared memory, 64-bit DDR3); communications coprocessors (4x VCP2, 3x TCP3d, 3x FFTC, 2x RAC, 1x TAC, BCP); network coprocessor (crypto, packet accelerator); peripherals and I/O (SRIO x4, PCIe x2, AIF2 x6, SGMII, I2C, SPI, UART); system elements (EDMA, SysMon, power management, debug); HyperLink.]

[Block diagram, C6678: 8 x C66x CorePac with L1/L2; Multicore Navigator; TeraNet; memory subsystem (MSMC, 4 MB shared memory, 64-bit DDR3); network coprocessor (crypto, packet accelerator); peripherals and I/O (SRIO x4, PCIe x2, EMIF 16, TSIP, GbE switch, SGMII, I2C, SPI, UART); system elements (EDMA, SysMon, power management, debug); HyperLink.]

Notes: The four-core C6670, the performance-optimized device (due to its accelerators), runs at up to 1.2 GHz and delivers 150 GMACS of theoretical fixed-point performance. Compared with the 1.2 GHz C6474 at 28.8 GMACS, that is roughly a 5x theoretical performance improvement. TI has combined the C66x cores with a variety of peripherals, accelerators and on-chip infrastructure to enable a high-performance SoC.

Memory architecture: Compared to the C6474, the C6670 now also supports 2 MB of shared memory, in addition to the 1 MB of dedicated L2 memory per core. Significant enhancements to the memory architecture enable very high-speed access to both internal memory and external memory through a DDR3-1333 interface. The C6670 addressable memory space is 8 GB.

Connectivity: Whether it's on-chip connectivity, managing traffic and flows through the device, or high-speed communications, the C66x devices provide improvements on all fronts. The 2-terabit TeraNet switch fabric provides high-bandwidth on-chip communication. The Multicore Navigator helps streamline and manage efficient data transfer between the various on-chip components. For data going off-chip, multiple lanes of Serial RapidIO and PCI Express provide a very fat pipe for chip-to-chip or chip-to-backplane communication. Two Gigabit Ethernet ports provide a mechanism to transfer data as well as an additional debug and boot mechanism. A six-lane interface at 6 Gbps per lane provides another high-bandwidth interface to the outside world. HyperLink 50 is TI's new approach to providing a very high-speed, very low-latency interface directly to the switch fabric of the C66x family of devices. This SerDes-based interface has ~50 Gbps of data bandwidth at full line rate and is the ideal connection to FPGAs and other HyperLink 50 enabled devices.

Acceleration: The C66x SoCs enable TCP/IP packet (L1/L2/L3) processing and offload it from the DSP cores. The cryptographic engine (available in the C6670 only, and located in the network coprocessor block) supports AES/DES/3DES/Snow/Kasumi. To ease software development on such a multicore device, the C66x devices include hardware IP such as the Multicore Navigator, hardware semaphores, and embedded debug capability such as trace.

TI Confidential – NDA Restrictions
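The "roughly 5x" comparison and the memory interface speed in these notes can be checked with a few lines of arithmetic. This is a sketch under two assumptions that are not stated in the notes themselves: the 32 MACs per cycle per core comes from the earlier C66x core slide, and "DDR3-1333" is read in the usual way as 1333 mega-transfers per second on a 64-bit bus.

```python
# Assumption: 32 MACs per cycle per C66x core (32 GMACS per core at 1 GHz).
MACS_PER_CYCLE_PER_CORE = 32

# C6670: 4 cores at up to 1.2 GHz.
c6670_gmacs = 4 * 1.2 * MACS_PER_CYCLE_PER_CORE

# C6474 figure quoted in the notes.
c6474_gmacs = 28.8

improvement = c6670_gmacs / c6474_gmacs

# Assumption: DDR3-1333 means 1333 MT/s; the bus is 64 bits (8 bytes) wide.
ddr3_mega_transfers_per_s = 1333
ddr3_bus_bytes = 64 // 8
ddr3_peak_gbytes_per_s = ddr3_mega_transfers_per_s * ddr3_bus_bytes / 1000

print(f"C6670 peak: {c6670_gmacs:.1f} GMACS")
print(f"Improvement over C6474: {improvement:.2f}x")
print(f"DDR3-1333 x64 peak: {ddr3_peak_gbytes_per_s:.3f} GB/s")
```

The computed peak of about 153.6 GMACS matches the "150 GMACS" in the notes after rounding, and the ratio of about 5.3 matches "roughly 5x". The DDR3 number is a theoretical peak, not a sustained bandwidth.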

Innovation & Integration via C6678 DSP Highlights

- C66x Core: Next-generation fixed/floating-point DSP core with clock speeds from 1 GHz to 1.25 GHz and up to 8-core options.
- Multicore Navigator: Data transfer engine architected to move data between system elements without any CPU overhead, so that maximum system efficiency is achieved.
- Memory Architecture: 0.5 MB of local memory per core; 4 MB of shared memory. Enhanced memory architecture through an enhanced Multicore Shared Memory Controller. Bottleneck-free, fast on- and off-chip memory access, including a DDR3-1333MHz (64-bit) interface. L1/L2/L3 ECC.
- Network Co-Processor and Accelerators: A cost-effective implementation that offloads TCP/IP and secure networking functions from the DSP.
- TeraNet: Switch fabric with 2 terabits of bandwidth, allowing maximum data transfer between system components to realize full system entitlement.
- Improved Debug: S/W development and debug support, leveraged by CCS.
- Peripherals and I/O Interfaces: High-bandwidth peripherals that operate independently (not shared), allowing simultaneous data transfer to prevent bottlenecks. These include RapidIO v2.1 (4 lanes @ 5 Gbps with 1x, 2x and 4x support) and PCIe x2 (2 lanes, running independently of RapidIO).
- HyperLink: Ultra-high-speed (up to 50 Gbaud), low-latency serial interface that connects to other DSPs and FPGAs in the system.

[Block diagram, C6678: 8 x C66x CorePac with L1/L2; Multicore Navigator; TeraNet; memory subsystem (MSMC, 4 MB shared memory, 64-bit DDR3); network coprocessor (crypto, packet accelerator); peripherals and I/O (SRIO x4, PCIe x2, EMIF 16, TSIP, GbE switch, SGMII, I2C, SPI, UART); system elements (EDMA, SysMon, power management, debug); HyperLink.]

Notes: Significant enhancements to the memory architecture enable very high-speed access to both internal memory and external memory through a DDR3-1600 interface, with addressable memory space up to 8 GB.
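To put the serial I/O figures in perspective, the sketch below estimates raw and payload bandwidth for the RapidIO and PCIe links listed on the slide. The lane counts and the 5 Gbps RapidIO lane rate come from the slide; the 5 GT/s PCIe lane rate (Gen 2, as named on a later slide) and the 8b/10b line coding for both links are assumptions brought in from the respective interface standards. The `SerialLink` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerialLink:
    """A multi-lane serial link with a simple block line code."""

    name: str
    lanes: int
    line_rate_gbps: float  # per-lane signalling rate
    payload_bits: int      # data bits per code group
    coded_bits: int        # transmitted bits per code group

    def raw_gbps(self) -> float:
        """Aggregate signalling rate across all lanes."""
        return self.lanes * self.line_rate_gbps

    def payload_gbps(self) -> float:
        """Aggregate rate left after line-coding overhead (protocol overhead ignored)."""
        return self.raw_gbps() * self.payload_bits / self.coded_bits


# Assumption: both links use 8b/10b coding at 5 Gbaud per lane.
srio = SerialLink("Serial RapidIO v2.1 x4", lanes=4, line_rate_gbps=5.0,
                  payload_bits=8, coded_bits=10)
pcie = SerialLink("PCIe Gen 2 x2", lanes=2, line_rate_gbps=5.0,
                  payload_bits=8, coded_bits=10)

for link in (srio, pcie):
    print(f"{link.name}: raw {link.raw_gbps():.0f} Gbps, "
          f"payload {link.payload_gbps():.0f} Gbps")
```

Because the slide states that these peripherals operate independently, their payload rates can be summed for a rough upper bound on simultaneous off-chip traffic; packet headers and flow control will reduce the achievable figure further.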

Competitive Analysis

Value prop against FPGA:
- C66x performance: 320 GMACS / 160 GFLOPS
- Baseband on a chip: handles multiple waveforms supporting OFDM, CDMA, TDM
- L1/L2/L3 processing capability
- Wireless accelerators (VCP/TCP/FFT)
- Software programmability
- Time to market
- Smaller package (more DSPs per board)
- Lower power: smaller battery, simpler cooling
- Low cost: MIPS/$

Value prop against other DSPs:
- C66x fixed- and floating-point capability @ 1.25 GHz
- Industry's fastest DSP at 10 GHz
- On-chip RAM up to 8 MB
- DDR3 1600 MHz, 64-bit, 8 GB address space
- Multiple independent high-speed I/O: 4x sRIO v2.1, 2x PCIe Gen II, 2x SGMII, 2x TSIP
- High-bandwidth FPGA connectivity: HyperLink @ 50 Gbps
- 1/2/4/8-core options (pin compatible)
- L1/L2/L3 memory ECC for system reliability
- Low power per GFLOPS and GMACS
- Extended temperature support, -40°C to 105°C
- CCS tools + S/W collateral
- 3rd-party network

TMDXEVM6678L EVM

H/W development tools:
- Single-wide AMC form factor (MicroTCA chassis); can also be used standalone
- The board has a Xilinx FPGA
- All interfaces are brought out either through individual connectors or through the backplane connector
- Optional PCIe adapter card to connect the C6678 EVM to a standard PCI header of a desktop PC
- Low-cost EVM starting at $399 (variants differ in the emulation technology used)

Code Composer Studio™ IDE (CCSv5): Design, Code and Build, Debug, Analyze, Tune
- Allows designers of all experience levels to move quickly through application development (www.ti.com/ccstudio)
- Time-limited free evaluation versions available for download
- Includes the C667x simulator

EVM kit contents:
- BIOS 6.x, BIOS-MCSDK / LINUX-MCSDK 2.0 (NDK, PDK, LIB etc.); the Multicore Software Development Kit is included for quick startup
- Sample program and out-of-box (OOB) demos, e.g. I/O Benchmark, Image Processing Pipeline and High Performance DSP Utility Application (HUA)
- User guide, starter guide, technical reference guide, app notes etc.

Ordering options:
- TMDXEVM6678L: EVM with XDS100 emulation, $399
- TMDXEVM6678LE: EVM with XDS560V2 emulation, $599
- TMDXEVM6678LXE: EVM with XDS560V2 emulation, encryption enabled, $599
- TMDSEMU560v2STM-UE: XDS560v2 System Trace Emulator with 128Mb system trace buffer and Ethernet/USB support

TI’s Multicore Hardware Ecosystem Others Standardized Boards Chassis / System PCIExpress (with Gen 2) Advanced Mezzanine (AMC) Custom ATCA Other

TI’s Multicore Software Ecosystem Customer Application Layer 2+ Multicore Entitlement IP Network Stack Layer 1 UMTS Layer 1 LTE TI Runtime TI’s Device Entitlement Libraries TI Layer 1 Libraries TI BIOS, Linux, OSE(ck)

Multicore Tools and Software (MC-SDK)

Tools (Code Composer Studio™, Eclipse-based):
- Editor/IDE
- Compiler/linker (codegen) with OpenMP support
- Emulator/debugger, remote debug
- Simulator
- Profiler / DVT, SoC Analyzer
- Third-party tools and plug-ins (names shown on the slide: Polycore, ENEA Optima, 3L)

Software:
- BIOS/Linux SDK: Multicore Software Development Kit with demo applications for multicore BIOS and multicore Linux
- Operating system with boot loader: DSP BIOS 6.x, Linux
- Platform Development Kit: platform abstraction, basic networking (NDK), inter-core communication
- Application-specific libraries: DSPLIB, IMGLIB, audio/video/speech codecs, VoIP components, WiMAX toolkit, LTE toolkit, others

Debug setup: host computer connected to the target board through an XDS 560 V2 or XDS 560 Trace emulator.

Stack shown on the slide, top to bottom: customer application; multicore entitlement; full silicon entitlement.

KeyStone Multicore Software: Libraries & Codecs

Libraries:
- Digital Signal Processing: FFT, adaptive filtering, filtering and convolution, others. Available free from TI.
- Image Processing: edge detection, boundary, morphology, others. Available free from TI.
- Voice and Fax: line echo cancellation, voice activity detection, others. Available free from TI.
- Vision Lib (object only): 50+ royalty-free kernels, including background modeling and subtraction; object feature extraction; tracking and recognition; low-level pixel processing.
- MATLAB: image processing, math operations, vision analytics.
- Security/Cryptography: AES, SHA1, 3DES.

Codecs:
- Voice: G.711, G.722, G.723, G.729, CDMA, AMR (NB/WB), EVRC-B, others
- Video: H.263, H.264, MPEG2, MPEG4, VC1/WMV9 decode, others
- Audio: MPEG1 Layer2, AAC LC/HE, AC3 2.0/5.1, sample rate conversion
- Fax: T.38, fax modem

High-Performance and Multicore Processor

Delivering an affordable, out-of-the-box experience with SW enablers for fast product development.

- High value: KeyStone architecture; high performance at the right power and price
- Open and affordable tools: low-cost EVM
- Easy to use: training; product collateral; drivers and example code; user community
- Quick to market

Layers shown on the slide: quick-start hardware; enabler software; benchmarks and functional understanding; frameworks and abstraction; generic libraries; application libraries.

Getting Started: More Information/Links

Product folders:
- C66X informational wiki page
- All C6000 multicore DSPs
- TMS320C6670
- TMS320C6678

EVMs and software tools:
- TMS320C6678 EVM
- TMS320C6670 EVM
- AMC to PCIe adapter card
- Multicore Software Development Kit for BIOS & Linux
- MCSDK wiki
- CCS v5 wiki
- C66x Linux wiki
- DSP Signal Processing Library (DSPLIB)
- Image and Video Processing Library (IMGLIB)
- LTE/WiMAX toolkit (discuss with BDM)

Technical support:
- TI E2E Community (online support)
- Product training

Notes: This slide gives the links and a brief description of all the EVMs TI provides through TI.com for the different high-performance DSPs. The most notable point is how aggressively TI is trying to make really low-cost EVMs (see the $350 6472 EVM) available to customers. Many TI third parties are developing different types of hardware platforms based on TI DSPs as well. You can find the names of some of those third parties on TI's website at http://focus.ti.com/dsp/docs/thirdparty/catalog/searchcatalog.tsp

Online Video Training http://focus. ti

Mission Critical DSP Market: "What Customers Like about TI"

[Chart: revenue, 2002 to 2009]

- Undisputed #1 DSP and SoC supplier: strong growth for 8 years in a row, even in 2009; higher R&D spending than the DSP revenue of most competitors; KeyStone SoC architecture secures future success.
- Rich product portfolio and strong roadmap: 2 families with multiple devices and growing, Nyquist (6670) and Shannon (6678/4/2); 40nm -> 28nm; tools/software and compilers; 3rd-party ecosystem; multiple design wins pre-announcement.
- Secure supply: no DSP product discontinuation (end of life); history of delivering on promises (power, GHz, ...).
- Field experience: completeness of system analysis, architecture, internal switch, ...
- Customer support business model: long-term relationships with key customers; actively seeks and incorporates customer feedback in roadmap devices.

[Diagram: TI SoC architecture across Layer 1 (PHY), Layer 2 (MAC) and Layer 3+ (Layer 3, 4); radio and IP network; macro, pico and femto; software.]

Backup Slides Product Details

C6678 (Shannon) "Lightning" Half-Length PCIe Card

Feature set:
- TI TMS320C6678 (8-core) x 4
- C66x core frequency: 1.25 GHz
- DDR3 memory: data frequency 1600 MHz; data bus width 64-bit
- Serial RapidIO Gen-2 interface
- PCIe Gen-2 interface
- 10/100/1000 Mbps Ethernet with SGMII
- Hyperlink50 interface
- 1024 MB DDR3-1333 on board
- PLX PEX8624 PCIe Gen-2 switch
- Serial RapidIO daisy-chain
- Ethernet daisy-chain
- Each DSP device is linked to the PCIe switch by x2 lanes
- Dual DSPs linked by Hyperlink50
- Power: max 54 W

Notes: Now we are going to talk about the Mirage family. The first member of this family is "Mirage I". It has 2 Shannon devices and 1 P2010 PowerPC, as well as the MMC controller.
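A rough board-level figure of merit can be derived from the feature list. The sketch below multiplies the per-device C6678 peaks quoted on the KeyStone architecture slide (320 GMACS and 160 GFLOPS at 1.25 GHz) by the four devices on the card and divides by the 54 W maximum board power. Treating the board maximum as the power budget is an assumption made here for illustration; it includes memory, the PCIe switch and other non-DSP components.

```python
# Per-device theoretical peaks for an 8-core C6678 at 1.25 GHz (from the KeyStone slide).
C6678_PEAK_GMACS = 320.0
C6678_PEAK_GFLOPS = 160.0

# Card parameters from the feature list.
DSPS_PER_CARD = 4
MAX_BOARD_POWER_W = 54.0

card_gmacs = DSPS_PER_CARD * C6678_PEAK_GMACS
card_gflops = DSPS_PER_CARD * C6678_PEAK_GFLOPS

# Theoretical peak efficiency at maximum board power.
gflops_per_watt = card_gflops / MAX_BOARD_POWER_W
gmacs_per_watt = card_gmacs / MAX_BOARD_POWER_W

print(f"Card peak: {card_gmacs:.0f} GMACS, {card_gflops:.0f} GFLOPS")
print(f"Efficiency: {gflops_per_watt:.2f} GFLOPS/W, {gmacs_per_watt:.2f} GMACS/W")
```

These are upper bounds rather than measured results: sustained throughput depends on how well the workload keeps 32 cores fed over the PCIe, RapidIO and Hyperlink50 connections.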

What is HyperLink?

"High-speed, low-latency, and low-pin-count communication interface"

- Low pin count (24 pins)
- Point-to-point connection: interconnects DSP-to-DSP and DSP-to-FPGA
- SerDes for data transfer: x1 and x4 modes for Tx and Rx; 12.5 GBaud/lane; effectively 8b9b encoding
- LVCMOS sideband signals for flow control and power management (errors/events/timeouts)
* Simple packet-based transfer protocol for memory-mapped access
* Read/write to DSP/FPGA local memory
  - discrete memory access of any byte-aligned width up to 64 bits
  - burst transfer modes
  - Write (maximum burst size 256 bytes): Write Request ---> Data Packet --->
  - Read (maximum burst size 256 bytes): Read Request ---> Read Response
  - Interrupt Request <-->
- Up to 64 memory-mapped regions, each region up to 256 MB
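The figures on this slide can be combined into an estimate of payload bandwidth and mappable address space. The sketch below assumes the "effectively 8b9b" statement means 8 payload bits per 9 transmitted bits, and it ignores packet headers and sideband flow control, so the result is an upper bound; the constant names are illustrative.

```python
# Figures taken from the slide.
LANES = 4                 # x4 mode
GBAUD_PER_LANE = 12.5
PAYLOAD_BITS = 8          # "effectively 8b9b encoding"
CODED_BITS = 9
MAX_REGIONS = 64
MAX_REGION_MBYTES = 256

# Aggregate line rate and the payload rate left after line coding.
raw_gbaud = LANES * GBAUD_PER_LANE
payload_gbps = raw_gbaud * PAYLOAD_BITS / CODED_BITS
payload_gbytes_per_s = payload_gbps / 8

# Largest remote address window that the memory-mapped regions can cover.
max_window_gbytes = MAX_REGIONS * MAX_REGION_MBYTES / 1024

print(f"Raw line rate: {raw_gbaud:.1f} GBaud")
print(f"Payload bandwidth: {payload_gbps:.2f} Gbps ({payload_gbytes_per_s:.2f} GB/s)")
print(f"Maximum mapped window: {max_window_gbytes:.0f} GB")
```

The 50 GBaud aggregate line rate is where the "HyperLink 50" name and the "~50 Gbps" figure elsewhere in the deck come from; after line coding the usable rate is closer to 44 Gbps per direction.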

Universal Parallel Port (uPP)

What is it?
- Parallel bus with two independent channels (separate data buses)
- I/O speeds up to 75 MHz with 8-16 bit data width per channel
- 1- or 2-channel parallel interface operating in RX, TX or FD mode
- Supports double data rate mode of operation (bandwidth does not change/increase)

Application:
- Each channel can interface cleanly with high-speed ADCs and/or DACs with up to 16-bit data width (per channel)
- Useful as a low-cost interface to FPGAs
- Can run up to 120 MByte/s per channel in single-channel or bi-directional mode (240 MByte/s for both channels in unidirectional mode)
- Can also be used to interface two C6655/57 devices, or to connect a C6655/57 to the C674x or OMAP-L13x family of devices

Other benefits:
- Internal DMA leaves the CPU EDMA free
- Simple protocol with few control pins (configurable: 2-4 per channel)
- Multiple data packing formats for 9-15 bit data widths
- Interleave mode (single channel only)
- Simple interface: I/O queued by software

Throughput estimates: Note: max. clock of 50 MHz in (*) configuration

Thank You