KeyStone Training Multicore Navigator: Packet DMA (PKTDMA)


Agenda
How Does the Packet DMA Work?
– Triggering
– Tx Processing
– Rx Processing
– Infrastructure Processing
How to Program the Packet DMA
– Registers
– Low Level Driver
Final Advice / Tips

How Does the Packet DMA Work?

Packet DMA Control
Understanding how the PKTDMAs are triggered and controlled is critical.

Tx DMA Processing
Once triggered, the Tx channel is included in a 4-level priority round robin by the Tx Scheduling Control.
Once selected for processing, the Tx Core starts, pops the first descriptor off the associated Tx queue, and begins processing.
Data are buffered temporarily in a per-channel Tx FIFO.
Data are sent out the Streaming I/F at 128 bits/clock.
When done, the descriptor is automatically recycled to a Tx FDQ.

Rx DMA Processing
For receive, the Rx DMA is triggered by receiving data from the Rx Streaming I/F (also at 128 bits/clock).
Based on the channel (or packet info), the Rx DMA opens an Rx Flow, which contains processing instructions such as the destination Rx queue and the Rx FDQs that may be used.
Data are buffered temporarily in a per-channel Rx FIFO.
Data are written to destination memory and the descriptor is pushed to the destination Rx queue, from which it should be recycled by the Host.
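The host-side part of this sequence (pop the completed descriptor from the destination Rx queue, consume the data, recycle the descriptor to an Rx FDQ) might look like the minimal sketch below. It assumes the QMSS LLD covered later in this module; the queue handles rxQ and rxFdq, and the header path, are assumptions, and the low four bits of a popped address carry a descriptor-size hint that is masked off.

/* Hedged sketch: drain a destination Rx queue and recycle descriptors.
 * rxQ/rxFdq are queue handles the application opened elsewhere. */
#include <stdint.h>
#include <ti/drv/qmss/qmss_drv.h>   /* assumption: PDK header path */

void drain_rx_queue(Qmss_QueueHnd rxQ, Qmss_QueueHnd rxFdq)
{
    uint32_t popped;

    /* Pop every descriptor the Rx DMA has pushed to the destination queue. */
    while ((popped = (uint32_t) Qmss_queuePop(rxQ)) != 0)
    {
        void *desc = (void *) (popped & ~0xFu);   /* low 4 bits are a size hint */

        /* ... consume the received buffer referenced by the descriptor ... */

        /* Recycle the descriptor so the Rx DMA can use it again. */
        Qmss_queuePushDesc(rxFdq, desc);
    }
}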

Infrastructure Packet DMA
The QMSS PKTDMA's Rx and Tx Streaming I/Fs are wired together to create a loopback: data sent out the Tx side are immediately received by the Rx side.
This PKTDMA is used for core-to-core transfers and peripheral-to-DSP transfers.
Because the DSP is often the recipient, a descriptor accumulator can be used to gather (pop) descriptors and interrupt the host with a list of descriptor addresses. The host must recycle them.

How to Program the Packet DMA

Step 1: Logical QM Mapping (Optional)
The physical 8192-queue Queue Manager is viewed by each PKTDMA as four logical queue managers, each with up to 4095 usable queues.
– Two fields are used: a 2-bit “qmgr” (queue manager 0..3) and a 12-bit “qnum” (queue number).
– A logical QM is created by programming its base address register to the queue region base address + 16 * Q, where Q is the first physical queue. Example: programming QM2 to the queue region base address + 0xC000 (3072 * 16) maps physical queue 3072 to qmgr=2, qnum=0.
– A logical 8K QM can be created: QM0 maps to queue 0 and QM1 maps to queue 4096. In this way a queue number can be used as a single 14-bit value.
– If overlapping QMs are programmed, a physical queue may be referenced by more than one qmgr:qnum pair.
– Software must map logical to physical queue numbers.
– Queue Manager N Base Address: the Queue Manager 0 reset value points to physical queue zero. All others must point to an offset in units of 16 bytes from queue zero. Unused QMs can be left uninitialized.
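The base-address arithmetic above reduces to a one-line calculation; in the sketch below the queue region base value is a device-specific assumption (check the device data manual), and the only point is the base + 16 * Q formula.

#include <stdint.h>

/* Assumption: device-specific VBUSM queue region base; check the data manual. */
#define QUEUE_REGION_BASE  0x34020000u

/* Value to program into Queue Manager N Base Address so that qnum 0 of that
 * logical QM refers to first_physical_queue. */
static inline uint32_t logical_qm_base(uint32_t first_physical_queue)
{
    return QUEUE_REGION_BASE + 16u * first_physical_queue;
}

/* Example from the text: logical_qm_base(3072) = base + 0xC000,
 * so qmgr=2, qnum=0 then refers to physical queue 3072. */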

Step 2: Rx Flow Setup
Rx Flow
– Flows are not tied to a channel; they can change at packet rate, since the flow ID is sent from the Streaming I/F with each SOP.
– The flow instructs the Rx DMA how to create an output descriptor based on the following:
  Descriptor type
  Data offset
  Sideband info to include or ignore (PS, EPIB, etc.)
  Rx qmgr:qnum (destination queue)
  Rx FDQ selection:
  – Single
  – Based on packet size
  – Based on buffer number (Host packet type only)
– Rx Flow N Config Registers A-H: see the register definitions in the Multicore Navigator User's Guide.
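As a hedged sketch only: the field names below follow the CPPI LLD's Cppi_RxFlowCfg structure as shipped in TI's PDK (verify them against cppi_drv.h on your installation), and the queue numbers are arbitrary examples. The call that applies this configuration, Cppi_configureRxFlow(), appears in the LLD section later in this module.

#include <string.h>
#include <ti/drv/cppi/cppi_drv.h>   /* assumption: PDK header path */

/* Build a flow that delivers Host descriptors to destination queue 900 and
 * draws free descriptors from FDQ 736 (example numbers only). */
void build_rx_flow_cfg(Cppi_RxFlowCfg *cfg, int flowId)
{
    memset(cfg, 0, sizeof(*cfg));
    cfg->flowIdNum        = flowId;
    cfg->rx_desc_type     = Cppi_DescType_HOST;  /* descriptor type */
    cfg->rx_sop_offset    = 0;                   /* data offset into the buffer */
    cfg->rx_dest_qmgr     = 0;                   /* destination Rx queue (qmgr:qnum) */
    cfg->rx_dest_qnum     = 900;
    cfg->rx_fdq0_sz0_qmgr = 0;                   /* Rx FDQ to draw descriptors from */
    cfg->rx_fdq0_sz0_qnum = 736;
}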

Step 3: Rx DMA Setup
For each Rx channel of each Navigator peripheral to be used, initialize the following:
– Rx Channel Control (Rx Channel N Global Config Register A):
  Channel enable, pause, teardown enable.
  The channel should be disabled while configuring; this includes while the Rx flows are being configured.
  This is typically the last initialization step for each Rx DMA (Rx flows should be programmed before this register is written).

Step 4: Tx DMA Setup
For each Tx channel of each Navigator peripheral to be used, initialize the following:
– Tx Channel Priority (Tx Channel N Scheduler Configuration):
  Configures the channel priority for the 4-level round robin.
  Defaults to zero (high).
– Tx Channel Control (Tx Channel N Global Config Register A):
  Channel enable, pause, teardown enable.
  The channel should be disabled while configuring.
  This is typically the last initialization step for each Tx DMA.

Step 5: Other Initialization
Initialize the Rx and Tx FDQs:
– Initialize the descriptor fields.
  Rx Host descriptors must have buffers attached.
  Rx Host descriptors must not be chained together.
  Tx Host descriptors for FDQs can be chained or not, with buffers attached or not (software decides how to handle this).
– Push the initialized descriptors into an FDQ with Queue N Register D (a QM register), as sketched below.
– Assuming the Queue Manager has been initialized prior to the PKTDMA instances, Navigator is now ready to be used.
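A minimal FDQ-seeding sketch under stated assumptions: the descriptors live in a memory region already registered with the Queue Manager, and attach_rx_buffer() is a hypothetical helper standing in for the CPPI LLD descriptor accessors (e.g., those in cppi_desc.h) that fill in the buffer pointer and length fields.

#include <stdint.h>
#include <ti/drv/qmss/qmss_drv.h>   /* assumption: PDK header path */

/* Hypothetical helper: writes the buffer pointer/length fields of a Host
 * descriptor (the CPPI LLD descriptor accessors are the usual way to do this). */
extern void attach_rx_buffer(uint8_t *desc, uint8_t *buf, uint32_t bufSize);

void seed_rx_fdq(Qmss_QueueHnd rxFdq,
                 uint8_t *descRegion, uint32_t descSize, uint32_t numDesc,
                 uint8_t *bufPool,    uint32_t bufSize)
{
    uint32_t i;

    for (i = 0; i < numDesc; i++)
    {
        uint8_t *desc = descRegion + i * descSize;

        /* Rx Host descriptors must have a buffer attached and must not be chained. */
        attach_rx_buffer(desc, bufPool + i * bufSize, bufSize);

        /* Pushing to the FDQ makes the descriptor available to the Rx DMA. */
        Qmss_queuePushDesc(rxFdq, desc);
    }
}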

Packet DMA Low Level Driver (LLD)
The Packet DMA is commonly represented in the LLD as “CPPI”.
Provides an abstraction of register-level details.
Provides two usage modes:
– The user manages/selects the resources to be used (generally faster).
– The LLD manages/selects the resources (generally easier).
Allocates a minimal amount of memory for bookkeeping purposes.
Built as two drivers:
– The QMSS LLD is a standalone driver for the QM and accumulators.
– The CPPI LLD is a driver for the PKTDMA that requires the QMSS LLD.
The following slides do not present the full API.

CPPI LLD Initialization
The following are one-time initialization routines that configure the LLD globally:
– Cppi_init(pktdma_global_parms);
  Configures the LLD for one PKTDMA instance.
  May be called on one or all cores.
  Must be called once for each PKTDMA to be used.
– Cppi_exit();
  Deinitializes the CPPI LLD.
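A one-time initialization sketch, assuming the PDK's CPPI LLD: the global configuration symbol is the one a device-specific cppi_device.c normally provides, but treat the symbol name, header path, and error-code check as assumptions to verify against your PDK.

#include <ti/drv/cppi/cppi_drv.h>   /* assumption: PDK header path */

/* Assumption: device-specific global parameters supplied by cppi_device.c. */
extern Cppi_GlobalConfigParams cppiGblCfgParams;

int init_cppi_lld(void)
{
    /* Configure the CPPI LLD before any PKTDMA instance is opened. */
    if (Cppi_init(&cppiGblCfgParams) != CPPI_SOK)
        return -1;
    return 0;
}

/* At shutdown, release the LLD's bookkeeping with Cppi_exit(). */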

CPPI LLD: PKTDMA Channel Setup
There are more handles to manage when using the PKTDMA LLD.
To allocate a handle for a PKTDMA:
– pktdma_handle = Cppi_open(pktdma_parms);
  Returns a handle for one PKTDMA instance.
  Should be called once for each PKTDMA required.
To allocate and release Rx channels:
– rx_handle = Cppi_rxChannelOpen(pktdma_handle, cfg, *flag);
  Once “open,” the DSP may use the Rx channel.
  cfg refers to the Rx channel’s setup parameters.
  flag is returned true if the channel is already allocated.
– Cppi_channelClose(rx_handle);
  Releases the handle, preventing further use of the channel.
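Putting the calls above together, a hedged sketch of opening the QMSS (infrastructure) PKTDMA and one Rx channel might look like this; the structure and field names are assumptions based on the PDK's cppi_drv.h, and the channel is left disabled so that the Rx flows can be configured first (per Step 3 earlier).

#include <string.h>
#include <ti/drv/cppi/cppi_drv.h>   /* assumption: PDK header path */

Cppi_Handle pktdmaHnd;   /* one handle per PKTDMA instance */
Cppi_ChHnd  rxChHnd;

void open_infra_rx_channel(int channelNum)
{
    Cppi_CpDmaInitCfg dmaCfg;
    Cppi_RxChInitCfg  rxCfg;
    uint8_t           isAllocated;

    /* Open the infrastructure (QMSS) PKTDMA instance. */
    memset(&dmaCfg, 0, sizeof(dmaCfg));
    dmaCfg.dmaNum = Cppi_CpDma_QMSS_CPDMA;
    pktdmaHnd = Cppi_open(&dmaCfg);

    /* Open one Rx channel, leaving it disabled until flows are configured. */
    memset(&rxCfg, 0, sizeof(rxCfg));
    rxCfg.channelNum = channelNum;
    rxCfg.rxEnable   = Cppi_ChState_CHANNEL_DISABLE;
    rxChHnd = Cppi_rxChannelOpen(pktdmaHnd, &rxCfg, &isAllocated);
}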

More Packet DMA Channel Setup
To allocate and release Tx channels:
– tx_handle = Cppi_txChannelOpen(pktdma_handle, cfg, *flag);
  Same as the Rx counterpart.
– Cppi_channelClose(tx_handle);
  Same as the Rx counterpart.
To configure/open an Rx flow:
– flow_handle = Cppi_configureRxFlow(pktdma_handle, cfg, *flag);
  Similar to the Rx channel counterpart.
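A matching sketch for the Tx side and the Rx flow, reusing the PKTDMA handle from the previous example; again the cfg field names are assumptions to check against cppi_drv.h, and the flow configuration could be the one built in Step 2 above.

#include <string.h>
#include <ti/drv/cppi/cppi_drv.h>   /* assumption: PDK header path */

Cppi_ChHnd   txChHnd;
Cppi_FlowHnd flowHnd;

void open_tx_and_flow(Cppi_Handle pktdmaHnd, int txChannelNum, Cppi_RxFlowCfg *flowCfg)
{
    Cppi_TxChInitCfg txCfg;
    uint8_t          isAllocated;

    /* Open one Tx channel at the highest of the four round-robin priorities. */
    memset(&txCfg, 0, sizeof(txCfg));
    txCfg.channelNum = txChannelNum;
    txCfg.priority   = 0;
    txChHnd = Cppi_txChannelOpen(pktdmaHnd, &txCfg, &isAllocated);

    /* Apply the Rx flow configuration (e.g., the one built in Step 2). */
    flowHnd = Cppi_configureRxFlow(pktdmaHnd, flowCfg, &isAllocated);
}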

PKTDMA Channel Control
APIs to control Rx and Tx channel use:
– Cppi_channelEnable(tx/rx_handle);
  Allows the channel to begin operation.
– Cppi_channelDisable(tx/rx_handle);
  Causes an immediate, hard stop. Usually not recommended unless it follows a pause.
– Cppi_channelPause(tx/rx_handle);
  Allows for a graceful stop at the next end-of-packet.
– Cppi_channelTeardown(tx/rx_handle);
  Allows for a coordinated stop.
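The recommendation above (disable only after a pause) as a short sketch; the wait strategy between the two calls is application-specific and only illustrative.

#include <ti/drv/cppi/cppi_drv.h>   /* assumption: PDK header path */

void stop_channel(Cppi_ChHnd chHnd)
{
    /* Graceful stop at the next end-of-packet. */
    Cppi_channelPause(chHnd);

    /* ... application-specific wait for in-flight packets to drain ... */

    /* Hard stop, safe now that the channel has been paused. */
    Cppi_channelDisable(chHnd);
}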

QMSS/CPPI LLD – Runtime Use
Once initialization is complete, runtime control is very simple:
– desc_ptr = Qmss_queuePop(queue_handle);
  Pops a descriptor address from a queue.
– Cppi_setData(type, *inbuf, *desc_ptr, len);
  Converts an “LLD format” descriptor to hardware format.
– Qmss_queuePushDesc(queue_handle, desc_ptr);
  Pushes the filled descriptor to the queue corresponding to a Tx DMA channel for processing.
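The pop/fill/push sequence above as a transmit-side sketch. The queue handles are assumed to have been opened with the QMSS LLD, the descriptor accessor argument order follows the CPPI LLD's cppi_desc.h (verify against your PDK version), and the low four bits of a popped address are masked off as a size hint.

#include <stdint.h>
#include <ti/drv/qmss/qmss_drv.h>   /* assumption: PDK header paths */
#include <ti/drv/cppi/cppi_drv.h>

int send_buffer(Qmss_QueueHnd txFdq, Qmss_QueueHnd txQ, uint8_t *buf, uint32_t len)
{
    uint32_t   popped;
    Cppi_Desc *desc;

    /* Pop a free Host descriptor from the Tx FDQ. */
    popped = (uint32_t) Qmss_queuePop(txFdq);
    if (popped == 0)
        return -1;                                   /* FDQ empty */
    desc = (Cppi_Desc *) (popped & ~0xFu);           /* low 4 bits are a size hint */

    /* Point the descriptor at the payload and set the packet length. */
    Cppi_setData(Cppi_DescType_HOST, desc, buf, len);
    Cppi_setPacketLen(Cppi_DescType_HOST, desc, len);

    /* Pushing to the Tx queue is what triggers the Tx DMA channel. */
    Qmss_queuePushDesc(txQ, desc);
    return 0;
}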

Programming Tips

Final Advice
Multicore Navigator is very flexible and can be used in many ways to solve the same problem. Your application must create and follow its own rules. For example:
– FDQ usage: are Tx Host-type descriptors pre-linked or not? Are buffers attached?
– FDQ mapping: by core, size, memory, function, or flow?
– Keep it simple: use one resource for one purpose!
Multicore Navigator hardware assumes you have programmed it correctly, which can make errors difficult to find.
– Follow good software development practices to encourage proper use of the Navigator hardware blocks.
– The Simulator catches several common configuration problems and writes warnings to the debug log files.

For More Information
For more information, refer to the Multicore Navigator User Guide.
For questions regarding topics covered in this training, visit the support forums at the TI E2E Community website.