1 How to realize high-performance compute with Multicore DSP.

1 How to realize high-performance compute with Multicore DSP

2 TI Confidential – NDA Restrictions
C667x Target Applications (Non-Telecom)
[Diagram labels: Emerging; Others; Test and Automation; Mission Critical; Infrastructure; Audio; HPC, Imaging and Medical; Video Infrastructure; Emerging Broadband; Innovations]

3 RF and Communication Applications
Key Customer Careabouts:
- Long Term Partnership
- Financial Stability
- Strong Roadmap and R&D
- Floating Point Performance
- Size, Weight, and Power (SWaP)
- I/O Bandwidth
- Longevity of supply (10+ yrs)
Applications:
- ISR (Intelligence/Surveillance/Reconnaissance)
  o SIGINT/COMINT/Signal Generators
- Military Communications
  o SDR (JTRS): Manpack/LMR/Fixed
  o Comm. Infra: VoIP/Video Gateways
- Satellite/Avionics Communications
  o Ground Receiver/Repeaters
  o Weather Radar
- FAA: Civil Aviation/Govt Comm.
- Conventional PS: TETRA/APCO/E911
  o Wireless Infrastructure
  o Comm. Infra: VoIP/Video Gateways
- Emerging Broadband (OFDM/LTE/WiMAX)
  o Utilities/Transport/Smart Grid
Segments: Govt & Public Safety; Avionics; Military & Defense

4 RF and Comm. Product Requirements
End Product Need:
- Support multiple waveforms
- Common platform for TDMA/CDMA/OFDMA
- Multi-channel VoIP/Video capability
- Support FEC and modulation
- TCP/IP networking support
DSP Requirement:
- Needs raw performance in terms of MIPS/GHz/MMACS
- Floating-point capable ISA to achieve precision and high GFLOPS
- Large on-chip RAM, to reduce accesses to slow external memory
- High-speed external memory interface
- Large addressable memory
- Efficient DMA architecture
- Wireless-specific accelerators and TCP/IP offload

5 Imaging Product Requirements
End Product Need:
- Reliability in mission critical designs
- Low power design
- High BW interface: RF front end and telecom ports; connect multiple DSPs on a board, e.g. in an ATCA card; high BW backplane and network connectivity
- Ease of use
DSP Requirement:
- Needs multiple high-speed interfaces: PCIe, Serial RapidIO; OBSAI/CPRI interface; Gigabit Ethernet etc.
- Memory Error Correction & Checking (ECC)
- Efficient low power DSPs
- Support extended temp ranges from -40 °C to 105 °C and others
- Dev and debug tools
- Multicore S/W frameworks
- Signal/image processing functions
- VoIP library
- Audio/video codecs

6 Introducing KeyStone Architecture (C66x)
The Best Combination of Performance (GHz) and Power Consumption in the Industry
- 16 GFLOPs & 32 GMACS per 1 GHz
- Fixed and floating-point, 1.25 GHz
- 4x C64x+ MAC (32); 4x C67x floating-point MAC (8)
- 16 FLOP/cy compared to 6 FLOP/cy
- 8-core C6678 based on the C66x core delivers 1.25 GHz/core (effectively a 10 GHz DSP)
- 100% code compatible with all C64x (fixed) & C67x (floating) devices
- Similar power profiles as C64x core
- Supported by Code Composer Studio IDE
[Diagram: C64x+ core (fixed point; lowest power, highest performance DSP core) and C67x core (floating point; industry's lowest power FP DSP core; high precision and wide dynamic range) combine into the NEW multicore DSP: next-generation C66x DSP core (fixed point and floating point), C66x KeyStone Architecture]
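
The headline figures on this slide combine by simple multiplication. As a rough sketch, assuming the quoted per-cycle rates (16 FLOP and 32 MAC per cycle per core) hold on every core at the full clock, the device-level peak can be worked out as below. The function and parameter names are illustrative only, not TI APIs, and the results are theoretical peaks rather than measured throughput.

```python
def peak_rates(cores, clock_ghz, flop_per_cycle=16, mac_per_cycle=32):
    """Return (aggregate GHz, peak GFLOPS, peak GMACS) for a multicore device.

    Assumption: every core sustains the quoted per-cycle rate, which is a
    theoretical upper bound, not a benchmark result.
    """
    aggregate_ghz = cores * clock_ghz
    return (aggregate_ghz,
            aggregate_ghz * flop_per_cycle,
            aggregate_ghz * mac_per_cycle)


if __name__ == "__main__":
    ghz, gflops, gmacs = peak_rates(cores=8, clock_ghz=1.25)
    print(f"{ghz:g} GHz aggregate, {gflops:g} GFLOPS, {gmacs:g} GMACS")
```

For an 8-core device at 1.25 GHz this gives 10 GHz aggregate, 160 GFLOPS and 320 GMACS, which lines up with the "effectively a 10GHz DSP" wording and with the 320GMACS/160GFLOP figure on the competitive-analysis slide.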

7 Unmatched Performance
[Charts: BDTI Score for Floating Point Processors; BDTI Score for Fixed Point Processors (BDTImark2000™ score)]
Algorithm | 300 MHz | … | Gain
Single Precision Floating Point FFT, 2048 pt, Radix … | … us | 14.00 us* | ~600%
Fixed Point FFT, 2048 pt, Radix … | … us | 4.46 us* | ~200%
FIR Filter, 40 samples, 40 taps | 0.69 us | 0.34 us* | ~200%
Matrix Multiply 32 x … | … us | 6.16 us* | ~300%
Matrix Inverse 4 x … | … us | 0.13 us* | ~400%
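
To make the FIR row concrete, here is a plain-Python reference for what such a kernel computes. It assumes that "40 samples, 40 taps" means 40 output samples from a 40-tap filter, which the slide does not state, and the helper names are hypothetical. This is a readability sketch, not the optimized library routine behind the benchmark.

```python
def fir_block(x, h):
    """Direct-form FIR over a block: y[n] = sum_k h[k] * x[n + taps - 1 - k].

    x must hold len(h) - 1 history samples followed by the new samples,
    so the number of outputs is len(x) - len(h) + 1.
    """
    taps = len(h)
    n_out = len(x) - taps + 1
    return [sum(h[k] * x[n + taps - 1 - k] for k in range(taps))
            for n in range(n_out)]


def mac_count(n_out, taps):
    """Multiply-accumulates needed for n_out outputs of a taps-long filter."""
    return n_out * taps


def effective_gmacs(n_out, taps, time_us):
    """MAC rate in GMAC/s implied by running the kernel in time_us microseconds."""
    return mac_count(n_out, taps) / (time_us * 1e3)


if __name__ == "__main__":
    print(fir_block([1, 2, 3, 4], [1, 1]))   # two-tap moving sum
    print(mac_count(40, 40), effective_gmacs(40, 40, 0.34))
```

Under the stated assumption the kernel needs 1600 multiply-accumulates, so the 0.34 us entry would correspond to roughly 4.7 GMAC/s.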

8 TI Multicore KeyStone Architecture
The first network-on-chip infrastructure to unleash full multicore entitlement.
- Highest Integration: Cost & Power
- Common Architecture: Portable Software
- Scalable: Tailored Solutions
- Navigator: Innovative Multi-core
- Floating Point: Development Time
- Tools & Debugging: R&D Efficiency
- Quality Software: Solutions & Libraries
[Block diagram: C66x and ARM processing cores; Multicore Shared Memory Controller and shared memory; Multicore Navigator; TeraNet 2 (network on chip); HyperLink 50; application accelerators; high-speed I/O; system management (debug, clocking, power)]

9 Product Highlights: C6670 and C6678
C6678 (Power Optimized):
- Next Generation C66x Core: up to 8 C66x cores at 1 GHz - 1.25 GHz; available options: 1, 2, 4, and 8 core devices
- Memory Architecture: 4MB local L2 (512KB per core); 4MB multicore shared memory
- Power Optimized Core: <10W at 1 GHz, nominal temp
C6670 (Performance Optimized):
- Next Generation C66x Core: 4 C66x cores at 1 GHz - 1.2 GHz
- Memory Architecture: 4MB local L2 (1MB per core); 2MB multicore shared memory
- Communication Accelerators: TCP3e (Turbo Encode), up to 550 Mbps; TCP3d (Turbo Decode), up to 600 Mbps; FFTC, 2048 FFT every 4.6µs; VCP2 for voice channel decoding
[Block diagram, C6678: 8 x CorePac (C66x DSP, L1, L2); Memory Subsystem: Multicore Shared Memory Controller (MSMC), 4MB shared memory, 64b DDR3; Multicore Navigator; TeraNet; HyperLink; Network CoProcessors: Crypto, Packet Accelerator; IP Interfaces: GbE Switch, SGMII; Peripherals & IO: SRIO x4, PCIe x2, EMIF 16, TSIP x2, I2C, SPI, UART; System Elements: Power Management, Debug, EDMA, SysMon]
[Block diagram, C6670: 4 x CorePac (C66x DSP, L1, L2); Memory Subsystem: MSMC, 2MB shared memory, 64b DDR3; Multicore Navigator; TeraNet; HyperLink; Communications CoProcessors: 4x VCP2, 3x TCP3d, 2x RAC, 1x TAC, 3x FFTC, BCP; Network CoProcessors: Crypto, Packet Accelerator; Peripherals & IO: SRIO x4, PCIe x2, AIF2 x6, SGMII x2, I2C, SPI, UART; System Elements: Power Management, Debug, EDMA, SysMon]

10 Innovation & Integration via C6678 DSP Highlights
Memory Architecture:
- 0.5 MB of local memory per core; 4 MB of shared memory
- Enhanced memory architecture through an enhanced Multicore Shared Memory Controller
- Bottleneck-free fast on- and off-chip memory access, including a DDR3-1333MHz (64-bit) interface
- L1/L2/L3 ECC
Peripherals and I/O Interfaces: high-bandwidth peripherals that operate independently (NOT shared), allowing simultaneous data transfer to prevent bottlenecks, featuring:
- RapidIO v2.1: 5 Gbps with 1x, 2x and 4x support
- PCIe x2: 2 lanes, running independently of RapidIO
Improved Debug: S/W development and debug support leveraged by CCS
C66x Core: next-generation fixed/floating-point DSP core with clock speeds ranging from 1 GHz to 1.25 GHz and up to 8-core options
Network Co-Processor and Accelerators: a cost-effective implementation to off-load the TCP/IP and secure networking functions from the DSP
Multicore Navigator: data transfer engine architected to move data between system elements without any CPU overhead, so maximum system efficiency is achieved
TeraNet: switch fabric with 2 terabits of bandwidth, which allows maximum data transfer between system components to realize full system entitlement
HyperLink: ultra-high-speed (up to 50 Gbaud), low-latency serial interface that connects to other DSPs and FPGAs in the system
[Block diagram, C6678: 8 x CorePac (C66x DSP, L1, L2); MSMC with 4MB shared memory and 64b DDR3; Multicore Navigator; TeraNet; HyperLink; Network CoProcessors; Peripherals & IO; System Elements]
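
The external-memory figure on this slide can be turned into a peak bandwidth with one multiplication. The sketch below assumes that "DDR3-1333MHz" denotes 1333 mega-transfers per second, as in the usual DDR3-1333 naming, and that every transfer carries the full 64-bit bus width. The function name is illustrative, and the result is a theoretical ceiling that ignores refresh, bank conflicts and controller overhead.

```python
def ddr_peak_gbytes_per_s(mega_transfers_per_s, bus_width_bits):
    """Theoretical peak DDR bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    print(f"DDR3-1333, 64-bit: {ddr_peak_gbytes_per_s(1333, 64):.2f} GB/s")
    print(f"DDR3-1600, 64-bit: {ddr_peak_gbytes_per_s(1600, 64):.2f} GB/s")
```

Under these assumptions the 64-bit DDR3-1333 interface peaks at about 10.66 GB/s, and the 1600 MHz setting quoted elsewhere in the deck at 12.8 GB/s.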

11 Competitive Analysis
Value Prop against FPGA:
- C66x Performance: 320 GMACS/160 GFLOP; baseband on a chip, handling multiple waveforms supporting OFDM, CDMA, TDM; L1/L2/L3 processing capability; wireless accelerators (VCP/TCP/FFT)
- Software Programmability: time to market
- Smaller Package (more DSP/board)
- Lower Power: smaller battery, simpler cooling
- Low Cost: MIPS/$
Value Prop against other DSPs:
- C66x Fixed & Floating Point: industry's fastest DSP at 10 GHz
- On-chip RAM up to 8MB
- DDR3: 1600 MHz, 64-bit, 8GB address space
- Multiple independent high-speed IO: 4x sRIO v2.1, 2x PCIe Gen II, 2x SGMII, 2x TSIP
- High BW FPGA connectivity: 50 Gbps
- 1/2/4/8 core option (pin compatible)
- L1/L2/L3 memory ECC: system reliability
- Low power per GFLOPs and GMACS
- Extended temp support: -40 °C to 105 °C
- CCS tools + S/W collateral
- 3rd party network

12 TMDXEVM6678L EVM
- Single-wide AMC form factor
Code Composer Studio IDE (CCSv5): Design, Code and Build, Debug, Analyze, Tune
- Allows designers of all experience levels to move quickly through application development
- Time-limited FREE evaluation versions available for download; includes C667x simulator
EVM kit includes:
- BIOS 6.x, BIOS-MCSDK / LINUX-MCSDK 2.0 (NDK, PDK, LIB etc.)
- Sample program and out-of-box demo (OOB), e.g. I/O Benchmark, Imaging Processing Pipeline and High Performance DSP Utility Application (HUA)
- User Guide, Starter Guide, Tech Ref Guide, App Notes etc.
H/W Development Tools:
- TMDXEVM6678L: EVM with XDS100 emulation, $399
- TMDXEVM6678LE: EVM with XDS560V2 emulation, $599
- TMDXEVM6678LXE: EVM with XDS560V2 emulation, encryption enabled, $599
- TMDSEMU560v2STM-UE: XDS560v2 System Trace Emulator with 128Mb system trace buffer and Ethernet/USB support
- Optional PCIe adapter card to connect the C6678 EVM to a standard PCI header of a desktop

13 TI's Multicore Hardware Ecosystem
[Diagram labels: Custom Chassis / System; Others; Standardized Boards: PCI Express (with Gen 2), Advanced Mezzanine (AMC), ATCA, Other]

14 TI's Multicore Software Ecosystem
[Diagram labels: Layer 1 UMTS; Layer 1 LTE; Layer 2+; Customer Application; TI Layer 1 Libraries; TI BIOS, Linux, OSE(ck); Multicore Entitlement; TI's Device Entitlement Libraries; IP Network Stack; TI Runtime]

15 DSP Multicore Tools and Software (MC-SDK)
Tools:
- Codegen with OpenMP support
- Emulator/Debugger
- Simulator
- Profiler / DVT
- 3rd party tools
Software:
- BIOS/Linux SDK Multicore Demonstration 6.x DSP BIOS
- Platform Abstraction
- Basic Networking
- Inter-core communication
Application Specific Libraries:
- Audio/Video codecs
- VoIP components
- WiMAX Toolkit, LTE Toolkit
- DSPLib, others
[Diagram: host computer running Eclipse-based Code Composer Studio (editor/IDE, compiler, linker (codegen), profiler, debugger, remote debug, SoC analyzer) with third-party plug-ins (Polycore, ENEA Optima, 3L), connected to the target board through XDS 560 V2 / XDS 560 Trace. Multicore Software Development Kit stack: customer application and demo apps (multicore BIOS, multicore Linux, multicore BIOS and Linux); speech, audio and video codecs, NDK, DSPLIB, IMGLIB; inter-core communication; Platform Development Kit; operating system with boot loader (BIOS, Linux); full silicon entitlement and multicore entitlement]

16 KeyStone Multicore Software: Libraries & Codecs
Libraries:
- Digital Signal Processing: FFT; adaptive filtering; filtering and convolution; others. Available free from TI.
- Image Processing: edge detection; boundary; morphology; others. Available free from TI.
- Voice and Fax: line echo cancellation; voice activity detection; others. Available free from TI.
- MATLAB: image processing; math operations
- Vision Analytics
- Vision Lib (object only): 50+ royalty-free kernels: background modeling & subtraction; object feature extraction; tracking, recognition; low-level pixel processing
Codecs:
- Security/Cryptography: AES, SHA1, 3DES
- Voice: G.711, G.722, G.723, G.729, CDMA, AMR (NB/WB), EVRC-B, others
- Audio: MPEG1 Layer2, AAC LC/HE, AC3 2.0/5.1, sample rate conversion
- Video: H.263, H.264, MPEG2, MPEG4, VC1/WMV9 decode, others
- Fax: T.38, fax modem

17 High-Performance and Multicore Processor High Value Easy to Use Quick to Market Low-Cost EVM High-Performance at the Right Power & Price Open & Affordable Tools User Community Drivers & Example Code Product Collateral Training Enabler Software Frameworks & Abstraction Generic Libraries Application Libraries Benchmarks & Functional Understanding Quick-Start Hardware Keystone Architecture

18 Getting Started: More Information/Links
Product Folders:
- C66X Informational Wiki Page
- All C6000 Multicore DSPs: TMS320C6670, TMS320C6678
EVMs and Software Tools:
- TMS320C6678 EVM
- TMS320C6670 EVM
- AMC to PCIe Adapter Card
- Multicore Software Development Kit for BIOS & Linux: MCSDK Wiki, CCS v5 Wiki, C66x Linux Wiki
- DSP Signal Processing Library (DSPLIB)
- Image and Video Processing Library (IMGLIB)
- LTE/WiMAX Toolkit: discuss with BDM
Technical Support:
- TI E2E Community (Online Support)
- Product Training

19 Online Video Training

20 Mission Critical DSP Market: What Customers Like about TI
Undisputed #1 DSP and SoC supplier
- Strong growth for 8 years in a row, even in 2009
- Higher R&D spending than DSP revenue of most competitors
KeyStone SoC Architecture secures future success
- Rich product portfolio & strong roadmap
- 2 families with multiple devices and growing: Nyquist (6670), Shannon (6678/4/2); 40nm -> 28nm
Tools/Software & Compilers; 3rd Party Eco-System
- Multiple design wins pre-announcement
Secure Supply
- No DSP product discontinuation (end of life)
History of delivery upon promises (power, GHz, ...)
Field Experience: completeness of system analysis, architecture, internal switch, ...
Customer Support
Business Model: long-term relationships with key customers
- Actively seek and incorporate customer feedback in roadmap devices
[Diagram labels: TI SoC Architecture; Layer 1, Layer 2, Layer 3+; PHY, MAC, Layer 3, 4; Radio, IP Network; Macro, Pico, Femto; Software; Revenue]

21 Backup Slides: Product Details

22 C6678 (Shannon) Lightning Half-Length PCIe Card: Feature Set
- TI TMS320C6678 (8-core) x 4
- C66x core frequency: 1.25 GHz
- DDR3 memory: data frequency 1600 MHz; data bus width 64-bit
- Serial RapidIO Gen-2 interface
- PCIe Gen-2 interface
- 10/100/1000 Mbps Ethernet w/ SGMII
- HyperLink50 interface
- 1024 MB DDR on board
- PLX PEX8624 PCIe Gen-2 switch
- Serial RapidIO daisy-chain
- Ethernet daisy-chain
- Each DSP device is linked to the PCIe switch by x2 lanes
- Dual DSPs linked by HyperLink50
- Power: max 54 W

23 What is HyperLink?
A high-speed, low-latency, and low-pin-count communication interface.
- Low pin count (24 pins)
- Point-to-point connection; interconnects DSP-to-DSP and DSP-to-FPGA
- SerDes for data transfer: x1 and x4 modes for Tx and Rx; 12.5 GBaud/lane; effectively 8b9b encoding
- LVCMOS sideband signals for flow control & power management
- Errors/events/timeouts
- Simple packet-based transfer protocol for memory-mapped access
- Read/write to DSP/FPGA local memory: discrete memory access of any byte-aligned width up to 64 bits; burst transfer modes
- Write (maximum burst size 256 bytes): Write Request --->, Data Packet --->
- Read (maximum burst size 256 bytes): Read Request --->, Read Response
- Interrupt request
- Up to 64 memory-mapped regions, each region up to 256MB
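
The lane, encoding, burst and region figures on this slide can be combined into a few back-of-the-envelope numbers. The sketch below assumes that the 8b9b encoding is the only overhead, so it ignores packet headers and sideband flow control, and that a transfer is split into bursts of at most 256 bytes. All function names are hypothetical helpers, not part of any TI driver.

```python
def hyperlink_raw_gbaud(lanes=4, gbaud_per_lane=12.5):
    """Raw line rate across all lanes, in GBaud."""
    return lanes * gbaud_per_lane


def hyperlink_payload_gbps(lanes=4, gbaud_per_lane=12.5,
                           data_bits=8, line_bits=9):
    """Payload rate in Gbit/s after 8b9b encoding (no packet overhead)."""
    return hyperlink_raw_gbaud(lanes, gbaud_per_lane) * data_bits / line_bits


def mapped_window_mbytes(regions=64, region_mbytes=256):
    """Largest remote address window reachable through the mapped regions."""
    return regions * region_mbytes


def bursts_needed(nbytes, max_burst_bytes=256):
    """Number of bursts for nbytes, rounding up (ceiling division)."""
    return -(-nbytes // max_burst_bytes)


if __name__ == "__main__":
    print(hyperlink_raw_gbaud(), round(hyperlink_payload_gbps(), 1))
    print(mapped_window_mbytes(), bursts_needed(1000))
```

With four lanes the raw rate is 50 GBaud, which matches the "HyperLink 50" name used elsewhere in the deck. The payload ceiling under 8b9b is about 44.4 Gbit/s, and 64 regions of 256MB span 16384 MB (16 GB).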

24 Universal Parallel Port (uPP)
What is it?
- Parallel bus, two independent channels (separate data buses)
- I/O speeds up to 75 MHz with 8-16 bit data width per channel
- 1 or 2 channel parallel interface operating in RX, TX or FD mode
- Supports double data rate mode of operation (bandwidth does not change/increase)
Application:
- Each channel can interface cleanly with high-speed ADCs and/or DACs with up to 16-bit data width (per channel)
- Useful as a low-cost interface with FPGAs. Can run up to 120 MByte/s per channel in single-channel or bi-directional mode (240 MByte/s for both channels in unidirectional mode)
- Can also be used to interface two C6655/57 devices or to connect C6655/57 with C674x or OMAP-L13x family of devices
Other benefits:
- Internal DMA: leaves CPU EDMA free
- Simple protocol with few control pins (configurable: 2-4 per channel)
- Multiple data packing formats for 9-15 bit data widths
- Interleave mode (single channel only)
- Simple interface: I/O queued by software
Throughput Estimates: Note: max. clock of 50 MHz in (*) configuration
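
A raw throughput estimate for a parallel port like this follows from clock rate and bus width. The sketch below assumes single-data-rate operation with one word transferred per clock per channel and no protocol overhead; the function name is illustrative. Under this simple model the quoted 120 MByte/s per channel corresponds to a 16-bit bus at 60 MHz, while 75 MHz would give 150 MByte/s, so treat the result as an upper bound and the quoted figures as the usable rate. That reading is an inference, not something the slide states.

```python
def upp_raw_mbytes_per_s(clock_mhz, width_bits, channels=1):
    """Raw parallel-bus throughput in MByte/s: one word per clock per channel."""
    if not 8 <= width_bits <= 16:
        raise ValueError("data width must be 8-16 bits per channel")
    return clock_mhz * width_bits / 8 * channels


if __name__ == "__main__":
    for clock in (50, 60, 75):
        print(clock, "MHz, 16-bit:", upp_raw_mbytes_per_s(clock, 16), "MByte/s")
    print("two channels at 60 MHz:", upp_raw_mbytes_per_s(60, 16, channels=2))
```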

25 Thank You
