Network Coprocessor (NETCP) Packet Accelerator (PA)


1 Network Coprocessor (NETCP) Packet Accelerator (PA)
Hello, and welcome to the KeyStone training for the Network Coprocessor (NETCP) Packet Accelerator (PA).

2 Agenda Applications Hardware Modules Firmware
PA Low Level Driver (LLD) Programming Example Let's look at the agenda for today. We will begin by looking at some applications for the packet accelerator, followed by a look at the hardware modules that make up the packet accelerator. Then we will move on to the firmware, followed by an examination of the PA Low Level Driver. Lastly, we will finish with a programming example using the PA and the PA LLD.

3 Packet Accelerator: Applications
Hardware Modules Firmware PA Low Level Driver (LLD) Programming Example Let's get started with the applications.

4 Packet Accelerator Applications
The packet accelerator saves cycles on the host DSP cores.
Option of a single IP address for the multicore device; the device's internal architecture is abstracted
8192 multiple-in, multiple-out HW queues
UDP (and TCP) checksums and selected CRCs, also for proprietary header formats; verification on ingress and generation on egress
L2 support: Ethernet (Ethertype and VLAN), MPLS
L3/L4 support: IPv4/6 and UDP port-based routing; raw Ethernet or IPv4/6 and proprietary UDP-like protocol support; GRE
QoS: per channel/flow to an individual queue towards the host DSPs; TX traffic shaping
Access to the Security Accelerator: IPSec ESP and AH tunnels, SRTP
Multicast to multiple queues; for example, an Ethernet broadcast copied and pushed to 1-8 queues
Timestamps: IEEE 1588 timestamps and configurable generic timestamps
The purpose of the PA is to save the host DSP cycles. The packet accelerator does this mainly by classifying Layer 2, Layer 3, Layer 4, and other custom received packets before they reach the host DSP. The packet accelerator also saves the DSP time by performing checksums on packets before they are transmitted. The packet accelerator also supports multicasting packets to multiple queues so that, for example, the same packet can be sent to multiple queues where each core has been assigned a different queue to watch. The packet accelerator has access to the 8192 queues in the Multicore Navigator and, through these queues, to the Security Accelerator for IPsec, SRTP, and other supported encryption schemes.

5 Packet Accelerator: Applications
Hardware Modules Firmware PA Low Level Driver (LLD) Programming Example Now let's look at the hardware modules.

6 Network Coprocessor (NETCP)
Packet Accelerator (PA)
Support for single or multiple IP addresses
1 Gbps wire-speed throughput at 1.5 Mpps
UDP checksum processing
IPSec ESP and AH tunnels with fast path fully offloaded
L2 support: Ethernet, Ethertype, and VLAN
L3/L4 support: IPv4/IPv6 and UDP port-based routing; raw Ethernet or IPv4/6 and SCTP port-based routing
Multicast to multiple queues
QoS capability: per channel/flow to an individual queue towards the DSP cores, and support for TX traffic shaping per device

7 SGMII and Ethernet Switch

8 PA: High-Level Overview
L2 Classify Engine: used for matching L2 headers. Example headers: MAC, VLAN, LLC/SNAP
L3 Classify Engine 0: used for matching L3 headers. Example headers: IPv4, IPv6, custom L3. Also uses the Multicore Navigator to match ESP headers and direct packets to the SA
L3 Classify Engine 1: typically used for matching L3 headers in IPsec tunnels
L4 Classify Engine: used for matching L4 headers. Example headers: UDP, TCP, custom L4
Modify/Multi-Route Engines: used for modification, multi-route, and statistics requests. Modification example: generate IP or UDP header checksums. Multi-route example: route a packet to multiple queues
PA Statistics Block: stores statistics for packets processed by the classify engines. Statistics requests are typically handled by the Modify/Multi-route engines
Packet ID Manager: assigns packet IDs to packets

9 Packet Accelerator: Firmware
Applications Hardware Modules Firmware PA Low Level Driver (LLD) Programming Example Now let's look at the firmware.

10 PA: Hardware and Firmware
One L2 Classify Engine and two L3 Classify Engines: each contains a PDSP, a Pass 1 Lookup Table (LUT1), and a timer, and runs the Classify 1 (c1) firmware image
One L4 Classify Engine: contains a PDSP and a Pass 2 Lookup Table (LUT2), and runs the Classify 2 (c2) firmware image
Two Modify/Multi-Route Engines: each runs the Modify (m) firmware image
One Packet ID Manager
One PA Statistics Block
NOTE: Firmware images must be loaded onto the PDSPs before using the PA engines.

11 PA Low Level Driver Overview
Applications Hardware Modules Firmware PA Low Level Driver (LLD) Programming Example Now let's look at the PA Low Level Driver.

12 PA LLD Overview
PA LLD provides an abstraction layer between the application and the PA. It translates packet headers and routing requirements into configuration information that is used by the PA firmware.
PA LLD provides the command/response interface for PA configuration: LUT1, LUT2, CRC generation, multi-route.
NOTE: The most general configuration must be entered into the PDSPs before any overlapping, more specific configuration.
The PA LLD also handles linking together entries in separate lookup tables; for example, linking an entry in an L2 classify lookup table to an entry in an L3 classify lookup table.
PA LLD does not provide the transport layer; this is handled by the Multicore Navigator.
API calls are non-blocking.
PA LLD references: pa/docs/paDocs.chm, pa/docs/pa_sds.pdf
PA LLD provides an abstraction layer between the application and the PA. The PA LLD translates packet headers and routing requirements that are provided by the user into configuration packets and descriptor commands that can be understood by the PA firmware. The PA LLD also handles linking together entries in separate lookup tables. For example, an entry in a layer 2 classify lookup table can be linked to an entry in a layer 3 classify lookup table, which can in turn be linked to an entry in a layer 4 classify lookup table. This allows a packet to be matched against a specific layer 2, layer 3, and layer 4 address combination. For this example, if the layer 2 address matches but the layer 3 address does not, then the packet will be routed according to the specified failure route. In addition, the PA LLD allows an entry in a lookup table to be linked to several entries in another lookup table. For example, a single entry in a layer 3 classify lookup table can be linked to multiple entries in a layer 4 lookup table.

13 PA LLD Functional Diagram
Benefits: abstracts the operation of the PA from the application; OS-independent; multi-instance for multicore.
NOTE: PA LLD runs on the host DSP and is external to the PA.
Now let's take a look at how the PA LLD communicates with the host, the Multicore Navigator, and the NETCP. The host application makes calls to the PA LLD to do things such as adding entries to lookup tables and generating commands for CRC checksums. All of this is done by the host application and operates on host memory. Once the configuration packet or command data has been generated, it is sent to the NETCP through the Multicore Navigator. Once in the PA, the command is executed by the appropriate module. When the command finishes processing, the PA may send a response packet back to the host application through the Multicore Navigator to notify the application, and the PA LLD, whether or not the configuration was successful.

14 PA Low Level Drivers (PA LLD)
Applications Hardware Modules Firmware PA Low Level Driver (LLD) Programming Example Now let's look at the PA LLD APIs.

15 PA LLD API: System
paReturn_t Pa_getBufferReq (paSizeInfo_t *sizeCfg, int sizes[], int aligns[])
Pa_getBufferReq returns the memory requirements for the PA driver.
Pa_create (paConfig_t *cfg, void *bases[], Pa_Handle *pHandle)
Pa_create creates the PA driver instance.
Pa_close (Pa_Handle handle, void *bases[])
Pa_close deactivates the PA driver instance.
Pa_requestStats (Pa_Handle iHandle, uint16_t doClear, paCmd_t cmd, uint16_t *cmdSize, paCmdReply_t *reply, int *cmdDest)
Pa_requestStats requests sub-system statistics.
paSysStats_t * Pa_formatStatsReply (Pa_Handle handle, paCmd_t cmd)
Pa_formatStatsReply formats a statistics reply returned by the PA.
paSSstate_t Pa_resetControl (paSSstate_t newState)
Pa_resetControl controls the reset state of the sub-system.
Pa_downloadImage (int modId, void *image, int sizeBytes)
Pa_downloadImage downloads a PDSP image to the sub-system while the packet processing modules are in reset.
Now let's move on to the APIs provided by the PA LLD. The PA LLD APIs can be broken down into four parts: system APIs, configuration APIs, custom configuration APIs, and utility functions. Let's begin with the system APIs. The system APIs provide basic system functions, such as opening and closing a PA LLD instance using the create and close functions, resetting and initializing the PA hardware using the resetControl function, and downloading the PDSP firmware using the downloadImage function.
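To make the system APIs concrete, here is a minimal sketch of sizing, allocating, and creating a PA LLD instance. It is a sketch only: the include path, the pa_N_BUFS and pa_OK constants, the zeroed paSizeInfo_t/paConfig_t fields, and the myAlignedAlloc helper are assumptions to be checked against pa.h and the LLD documentation.

#include <string.h>
#include <ti/drv/pa/pa.h>      /* assumed include path for the PA LLD header */

extern void *myAlignedAlloc (int size, int align);   /* hypothetical aligned allocator */

Pa_Handle paInst;              /* PA LLD instance handle, used by the later examples */

int paInitInstance (void)
{
    paSizeInfo_t sizeCfg;
    paConfig_t   cfg;
    int          sizes[pa_N_BUFS];     /* pa_N_BUFS: number of buffers the LLD requires (assumed name) */
    int          aligns[pa_N_BUFS];
    void        *bases[pa_N_BUFS];
    int          i;

    memset (&sizeCfg, 0, sizeof (sizeCfg));   /* fill in table sizes (e.g. max L2/L3 entries) here */
    memset (&cfg, 0, sizeof (cfg));           /* fill in instance configuration here               */

    /* Step 1: ask the LLD how much memory it needs and how it must be aligned */
    if (Pa_getBufferReq (&sizeCfg, sizes, aligns) != pa_OK)
        return -1;

    /* Step 2: allocate each buffer with the requested size and alignment (allocator not shown) */
    for (i = 0; i < pa_N_BUFS; i++)
        bases[i] = myAlignedAlloc (sizes[i], aligns[i]);

    /* Step 3: create the driver instance */
    if (Pa_create (&cfg, bases, &paInst) != pa_OK)
        return -1;

    return 0;
}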

16 PA LLD API: Configuration
paReturn_t Pa_addMac (Pa_Handle iHandle, paEthInfo_t *ethInfo, paRouteInfo_t *routeInfo, paRouteInfo_t *nextRtFail, paHandleL2L3_t *handle, paCmd_t cmd, uint16_t *cmdSize, paCmdReply_t *reply, int *cmdDest)
Pa_addMac adds a MAC address to the L2 table.
Pa_delHandle (Pa_Handle iHandle, paHandleL2L3_t handle, paCmd_t cmd, uint16_t *cmdSize, paCmdReply_t *reply, int *cmdDest)
Pa_delHandle deletes a MAC or IP handle.
Pa_delL4Handle (Pa_Handle iHandle, paHandleL4_t handle, paCmd_t cmd, uint16_t *cmdSize, paCmdReply_t *reply, int *cmdDest)
Pa_delL4Handle deletes a TCP or UDP handle.
Pa_addIp (Pa_Handle iHandle, paIpInfo_t *ipInfo, paHandleL2L3_t prevLink, paRouteInfo_t *routeInfo, paRouteInfo_t *nextRtFail, paHandleL2L3_t *retHandle, paCmd_t cmd, uint16_t *cmdSize, paCmdReply_t *reply, int *cmdDest)
Pa_addIp adds an IP address to the L3 table.
Pa_addPort (Pa_Handle iHandle, uint16_t destPort, paHandleL2L3_t linkHandle, paRouteInfo_t *routeInfo, paHandleL4_t retHandle, paCmd_t cmd, uint16_t *cmdSize, paCmdReply_t *reply, int *cmdDest)
Pa_addPort adds a destination TCP/UDP port to the L4 table.
Pa_forwardResult (Pa_Handle iHandle, void *vresult, paHandle_t *retHandle, int *handleType, int *cmdDest)
Pa_forwardResult examines the reply of the sub-system to a command.
Pa_configRouteErrPacket (Pa_Handle iHandle, int nRoute, int *errorTypes, paRouteInfo_t *eRoutes, paCmd_t cmd, uint16_t *cmdSize, paCmdReply_t *reply, int *cmdDest)
Pa_configRouteErrPacket configures the routing of packets that match error conditions.
Now let's take a look at the configuration APIs. The configuration APIs provide functions for generating configuration packets that instruct the PA firmware how to add and remove entries from the lookup tables in the PA, as well as functions for configuring how to route error packets. An example of a function that generates a configuration packet telling the PA firmware to add an entry to a lookup table is the addIp function, which adds an entry to a layer 3 classify engine lookup table. If the application then wanted to generate a configuration packet telling the PA firmware to remove that entry from the layer 3 classify engine lookup table, it could call the delHandle function. The delHandle function can also generate configuration packets to remove entries from layer 2 classify engine lookup tables.
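As a rough sketch of how these calls fit together with the routing structure shown later on the PA Routing slide, the fragment below adds a MAC entry and then an IP entry linked to it. It reuses paInst from the earlier sketch; the paEthInfo_t, paIpInfo_t, and paCmdReply_t contents are left zeroed with comments rather than guessed, and the command buffer size is an arbitrary assumption.

paHandleL2L3_t  macHandle, ipHandle;
uint8_t         cmdBuf[128];               /* command buffer; size is an assumption */
uint16_t        cmdSize;
int             cmdDest;
paEthInfo_t     ethInfo;
paIpInfo_t      ipInfo;
paRouteInfo_t   contRoute, failRoute;
paCmdReply_t    reply;

memset (&ethInfo, 0, sizeof (ethInfo));    /* fill in destination MAC, Ethertype, VLAN as needed */
memset (&ipInfo, 0, sizeof (ipInfo));      /* fill in destination IP address, protocol, etc.     */
memset (&reply, 0, sizeof (reply));        /* set the reply destination queue and flow here      */

contRoute.dest         = pa_DEST_CONTINUE_PARSE;   /* keep parsing inside the PA after a match */
contRoute.flowId       = 0;
contRoute.queue        = 0;
contRoute.mRouteHandle = -1;
contRoute.swInfo0      = 0;
contRoute.swInfo1      = 0;

failRoute              = contRoute;
failRoute.dest         = pa_DEST_DISCARD;          /* drop packets that do not match */

/* Add the L2 (MAC) entry */
cmdSize = sizeof (cmdBuf);
Pa_addMac (paInst, &ethInfo, &contRoute, &failRoute, &macHandle,
           (paCmd_t) cmdBuf, &cmdSize, &reply, &cmdDest);
/* ...send cmdBuf to the queue indicated by cmdDest and wait for the reply (see the Rx Configuration slides)... */

/* Add the L3 (IP) entry, linked to the MAC entry through its handle */
cmdSize = sizeof (cmdBuf);
Pa_addIp (paInst, &ipInfo, macHandle, &contRoute, &failRoute, &ipHandle,
          (paCmd_t) cmdBuf, &cmdSize, &reply, &cmdDest);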

17 PA LLD API: Custom Configuration
paReturn_t Pa_setCustomL3 (Pa_Handle iHandle, uint16_t customEthertype, uint16_t parseByteOffset, uint8_t byteMasks[pa_NUM_BYTES_CUSTOM_L3], paCmd_t cmd, uint16_t *cmdSize, paCmdReply_t *reply, int *cmdDest)
Pa_setCustomL3 performs the global configuration for level 3 custom lookups.
Pa_addCustomL3 (Pa_Handle iHandle, uint8_t match[pa_NUM_BYTES_CUSTOM_L3], paRouteInfo_t *routeInfo, paRouteInfo_t *nextRtFail, paHandleL2L3_t prevLink, paHandleL2L3_t *retHandle, int nextHdrType, uint16_t nextOffset, paCmd_t cmd, uint16_t *cmdSize, paCmdReply_t *reply, int *cmdDest)
Pa_addCustomL3 adds a custom lookup entry to the lookup tables.
Pa_setCustomL4 (Pa_Handle iHandle, uint16_t handleLink, uint16_t udpCustomPort, uint16_t byteOffsets[pa_NUM_BYTES_CUSTOM_L4], uint8_t byteMasks[pa_NUM_BYTES_CUSTOM_L4], paCmd_t cmd, uint16_t *cmdSize, paCmdReply_t *reply, int *cmdDest)
Pa_setCustomL4 performs the global configuration for level 4 custom lookups.
Pa_addCustomL4 (Pa_Handle iHandle, paHandleL2L3_t prevLink, uint8_t match[pa_NUM_BYTES_CUSTOM_L4], paRouteInfo_t *routeInfo, paHandleL4_t retHandle, paCmd_t cmd, uint16_t *cmdSize, paCmdReply_t *reply, int *cmdDest)
Pa_addCustomL4 adds a custom lookup to the lookup tables.
Now let's take a look at the custom configuration APIs. The custom configuration APIs are similar to the normal configuration APIs. They are still used to generate configuration packets for adding entries to the lookup tables, except that the configuration packets generated in this case are for custom entries. The normal configuration APIs can still be used to generate configuration packets that allow the firmware to remove these entries from the lookup tables.
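A minimal sketch of the two-step custom L3 setup, reusing cmdBuf, cmdSize, cmdDest, reply, contRoute, failRoute, and macHandle from the previous sketch. The Ethertype, parse offset, mask, match value, and next-header type below are made-up illustration values, not values required by the PA.

uint8_t        byteMasks[pa_NUM_BYTES_CUSTOM_L3];   /* which header bytes take part in the match */
uint8_t        matchVals[pa_NUM_BYTES_CUSTOM_L3];   /* the values those bytes must equal          */
paHandleL2L3_t customHandle;
int            nextHdr = 0;                         /* next-header type constant from pa.h; value is an assumption */

memset (byteMasks, 0, sizeof (byteMasks));
memset (matchVals, 0, sizeof (matchVals));
byteMasks[0] = 0xFF;        /* example: match only the first byte of the custom header */
matchVals[0] = 0x42;        /* example value for that byte */

/* Global setup: which Ethertype identifies the custom protocol and where parsing starts */
cmdSize = sizeof (cmdBuf);
Pa_setCustomL3 (paInst, 0x88B5 /* example Ethertype */, 14 /* example parse byte offset */,
                byteMasks, (paCmd_t) cmdBuf, &cmdSize, &reply, &cmdDest);

/* Per-entry setup: add one custom match, linked to the MAC entry added earlier */
cmdSize = sizeof (cmdBuf);
Pa_addCustomL3 (paInst, matchVals, &contRoute, &failRoute, macHandle, &customHandle,
                nextHdr, 0 /* next header offset */, (paCmd_t) cmdBuf, &cmdSize, &reply, &cmdDest);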

18 PA LLD API: Utility Functions
paReturn_t Pa_formatTxRoute (paTxChksum_t *chk0, paTxChksum_t *chk1, paRouteInfo_t *route, void *cmdBuffer, int *cmdSize)
Pa_formatTxRoute formats the commands to add checksums and route a Tx packet.
Pa_formatRoutePatch (paRouteInfo_t *route, paPatchInfo_t *patch, void *cmdBuffer, int *cmdSize)
Pa_formatRoutePatch formats the commands to route a packet and blind patch.
The utility functions provided by the PA LLD create instructions that can be given to the Modify/Multi-route engines. For example, the format transmit route API creates a command that can be placed in the protocol specific data section of a descriptor. Then, when the Packet DMA transfers that descriptor and its linked data packet to one of the Modify/Multi-route engines, a UDP checksum, or other checksum, can be generated for the linked packet. The Modify/Multi-route engine can then route the packet to another destination, such as the transmit queue for the gigabit Ethernet switch, which allows the packet to be transmitted over the network.
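A minimal sketch of building a transmit-route command with a single UDP checksum, again reusing declarations from the earlier sketches. The paTxChksum_t fields (start offset, length, initial value, as described on the Tx path back-up slide) are left zeroed with a comment rather than guessed, and the PS data buffer size and the GbE switch queue number (648 in this training's queue diagrams) are assumptions.

paTxChksum_t  udpChk;
paRouteInfo_t txRoute;
uint8_t       txCmdBuf[64];              /* becomes the PS data of the Tx descriptor; size is an assumption */
int           txCmdSize = sizeof (txCmdBuf);

memset (&udpChk, 0, sizeof (udpChk));    /* set the checksum start offset, length, and initial value here */

txRoute.dest         = pa_DEST_EMAC;     /* after the checksum, send the packet out through the GbE switch */
txRoute.flowId       = 0;
txRoute.queue        = 648;              /* GbE switch Tx queue in this training's diagrams */
txRoute.mRouteHandle = -1;
txRoute.swInfo0      = 0;
txRoute.swInfo1      = 0;

/* Build the command that the Modify/Multi-route engine will execute */
Pa_formatTxRoute (&udpChk, NULL /* no second checksum */, &txRoute, (void *) txCmdBuf, &txCmdSize);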

19 LLD HTML Documentation
Show example from the HTML file: Pa_addMac Pa_configExceptionRoute

20 Download the Firmware
Int paDownloadFirmware (void)
{
  Int i;

  Pa_resetControl (paInst, pa_STATE_RESET);

  /* PDSPs 0-2 use image c1 */
  for (i = 0; i < 3; i++)
    Pa_downloadImage (paInst, i, (Ptr)c1, c1Size);

  /* PDSP 3 uses image c2 */
  Pa_downloadImage (paInst, 3, (Ptr)c2, c2Size);

  /* PDSPs 4-5 use image m */
  for (i = 4; i < 6; i++)
    Pa_downloadImage (paInst, i, (Ptr)m, mSize);

  Pa_resetControl (paInst, pa_STATE_ENABLE);

  return (0);
}

21 PA LLD: Programming Example
Applications Hardware Modules Firmware PA Low Level Driver (LLD) Programming Example Now let's walk through a programming example.

22 PA LLD: Basic Configuration
This process must be done before configuring PA Rx or Tx.
Step 1: Set up memory: Pa_getBufferReq(), Pa_create()
Step 2: Initialize the PA and load the firmware: Pa_resetControl(DISABLE), Pa_downloadImage(), Pa_resetControl(ENABLE)
Let's start with the basic configuration for the PA. In step 1, we set up the memory for the PA. We do this by first calling the get buffer requirements API to determine the memory requirements for the PA LLD. We then call the create function with the allocated memory to create the PA LLD instance. In step 2, we initialize the PA and load the firmware. We do this by first calling the resetControl function with the disable input. Then we call downloadImage to load the firmware onto the PDSPs. Lastly, we call the resetControl function with the enable input to complete the PA initialization. At this point, the basic configuration for the PA and the PA LLD has been completed, and we can begin doing things like adding entries to the lookup tables and generating commands for checksums.

23 PA LLD: PA Routing
PA LLD provides a routing structure which allows the following parameters to be configured: Destination, Flow ID, Queue, Multi-Route Handle (Index), Software Info 0, Software Info 1.
Several possible destinations: pa_DEST_HOST, pa_DEST_EMAC, pa_DEST_SASS0, pa_DEST_SASS1, pa_DEST_DISCARD, pa_DEST_CONTINUE_PARSE
MAC Routing Example:
paRouteInfo_t macRoute;
/* Continue parsing -- try to match IP handle */
macRoute.dest = pa_DEST_CONTINUE_PARSE;
macRoute.flowId = 0;
macRoute.queue = 0;
macRoute.mRouteHandle = -1;
macRoute.swInfo0 = 0; /* Don't Care */
macRoute.swInfo1 = 0; /* Don't Care */
Port Routing Example:
paRouteInfo_t portRoute;
/* Send all matches to the queue specified */
portRoute.dest = pa_DEST_HOST;
portRoute.flowId = 5;
portRoute.queue = 900;
portRoute.mRouteHandle = -1;
portRoute.swInfo0 = 0; /* Don't Care */
portRoute.swInfo1 = 0; /* Don't Care */
Before we look at another example, let's take a look at the routing structure used by the PA LLD. The routing structure is used by the PA LLD configuration functions to generate commands that tell the PA how a packet should be routed. This information is placed in a configuration packet that adds an entry to a lookup table, telling the PA how to route a packet after matching a header. The routing structure is also used by the PA LLD utility functions to tell the PA how to route a packet after computing a checksum. The PA LLD provides the following fields for configuration: Destination, Flow ID, Queue, Multi-Route Handle, Software Info 0, and Software Info 1. Let's start with the destinations. The destination can be one of six possible destinations. The HOST destination tells the PA to route the packet to a receive queue. This can be any queue, and the queue number is specified in the queue field of the routing structure. The EMAC destination tells the PA to route the packet to the gigabit Ethernet switch subsystem; this routes the packet out of the PA to the transmit queue of the gigabit Ethernet switch. The SASS0 destination tells the PA to route the packet to the first transmit queue for the SA, and the SASS1 destination to the second transmit queue for the SA. The DISCARD destination tells the PA to discard the packet. The last destination is CONTINUE_PARSE, which tells the packet accelerator to route the packet inside the PA from one module to another. For example, this might be used when adding a MAC entry to a lookup table: when the L2 classify engine matches the packet, the packet will stay inside the PA and be routed to L3 classify engine 0 to try to match an IP or custom L3 address. The flow ID field in the data structure specifies which Packet DMA receive flow should be used; this field is ignored for the DISCARD and CONTINUE_PARSE destinations. The queue field in the data structure specifies which queue should be used when sending the packet to a queue. If sending to the EMAC destination, the queue number should be the transmit queue for the gigabit Ethernet switch. If sending the packet to the SASS0 destination, the queue number should be the first SA transmit queue; for the SASS1 destination, the second SA transmit queue. If only the PA is used, the software info words can be given any desired value. If routing a packet from the PA to the SA through the SA transmit queues (as in the case of IPsec decryption), the software info words must contain the information required by the SA.

24 PA LLD: Rx Configuration
Repeat steps 1-7 to add more MAC addresses, IP addresses, or TCP/UDP ports to the PA LUTs before receiving data packets.
Step 1: Call a PA LLD configuration or custom configuration API with routing information: Pa_addMac() or Pa_addIp() or Pa_addPort()
Step 2: Receive the formatted command to be sent to the PA.
Step 3: Send the command: /* Mark as command packet */ Cppi_setPSData() /* Send command to PA */ QMSS_queuePush()
Step 4: The PKTDMA automatically pops the descriptor from the Tx queue and sends the packet to NETCP. After the PKTDMA finishes the data transfer, the Tx descriptor is returned to the specified packet completion queue.
Now let's take a look at how to set up the receive configuration using the PA LLD. Following this process, we will add entries to the lookup tables in the classify engines. Step 1 is to call one of the PA LLD configuration or custom configuration APIs with the address that needs to be added to the lookup table. The information about where to send a packet after a match, and where to send the configuration response packet, should also be provided so that it can be included in the configuration packet. In step 2, the application receives a formatted configuration packet back from the PA LLD. The PA LLD also returns the index of the transmit queue where the packet should be sent. Step 3 is to set the protocol specific data section of the descriptor by calling the setPSData function. Once this is done, the packet can be pushed onto the transmit queue whose index was returned by the PA LLD. At this point, there is no more interaction required from the host until the configuration response packet is received from the PA. In step 4, the Packet DMA automatically pops the descriptor pointer from the transmit queue and sends the packet to the network coprocessor. After the transmit has completed, the transmit descriptor is pushed onto the transmit completion queue by the Packet DMA.
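A rough sketch of steps 1 through 4 in code, reusing paInst, cmdBuf, cmdSize, and cmdDest from the Pa_addMac sketch. The descriptor handling is deliberately simplified: the free-descriptor pop, the mapping from cmdDest to one of the NETCP transmit queues (640-645), and the descriptor size constant are hypothetical helpers, and the Qmss_/Cppi_ call spellings and argument order are assumptions to verify against the QMSS and CPPI LLD headers in your PDK. The slide marks the packet as a command via PS data; depending on the PDK example you follow, the command bytes may instead be linked as the packet payload.

Cppi_HostDesc *txCfgDesc;
Qmss_QueueHnd  paCmdQ;

/* Step 1: Pa_addMac (shown earlier) filled cmdBuf, cmdSize, and cmdDest */

/* Steps 2-3: attach the command to a descriptor and push it to the PA command queue */
txCfgDesc = myPopFreeHostDescriptor ();                     /* hypothetical helper */
Cppi_setPSData (Cppi_DescType_HOST, (Cppi_Desc *) txCfgDesc,
                (uint8_t *) cmdBuf, cmdSize);               /* mark the packet as a command packet */

paCmdQ = myPaCmdDestToQueue (cmdDest);                      /* hypothetical helper: maps cmdDest to queues 640-645 */
Qmss_queuePush (paCmdQ, (void *) txCfgDesc, cmdSize,
                MY_HOST_DESC_SIZE, Qmss_Location_TAIL);     /* MY_HOST_DESC_SIZE: application-defined descriptor size */

/* Step 4: the PKTDMA moves the packet into NETCP and returns the descriptor
   to the completion queue configured in the descriptor's return-queue field */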

25 PA LLD: Rx Configuration
Step 5: The PA LLD will indicate the appropriate transmit queue index. This example uses the queue returned by Pa_addMac(). Q640: PDSP0 Q641: PDSP1 Q642: PDSP2 Q643: PDSP3 Q644: PDSP4 Q645: PDSP5 Q646: SA0 Q647: SA1 Q648: GbE SW Q900: RXQUEUE In step 5, the configuration packet is transferred by the Packet DMA from the transmit queue to the network coprocessor. Once in the network coprocessor, the configuration packet is directed from the Packet DMA controller, through the packet streaming switch, to the desired classify engine. The classify engine will then attempt to add the entry to its lookup table. In this example, we show the packet being routed from transmit queue 640, which is the queue for PDSP0 in the L2 classify engine; however, the process is almost identical for all of the classify engines. The only difference is which transmit queue the packet is pushed onto, which depends on which configuration API was called.

26 PA LLD: Rx Configuration
Step 6: The PDSP will return the status of adding the entry to the lookup table. Q640: PDSP0 Q641: PDSP1 Q642: PDSP2 Q643: PDSP3 Q644: PDSP4 Q645: PDSP5 Q646: SA0 Q647: SA1 Q648: GbE SW Q900: RXQUEUE In step 6, a configuration response packet is sent from the classify engine to indicate whether the entry was successfully added to the lookup table. The command response packet is routed from the classify engine, through the packet streaming switch, to the Packet DMA controller. The Packet DMA then transfers the packet out of the network coprocessor, using the receive flow and receive queue specified in the PA command reply data structure when the PA LLD configuration API was called. In this example, we show the packet being routed to receive queue 900.

27 PA LLD: Rx Configuration
Repeat steps 1-9 to add more MAC addresses, IP addresses, or TCP/UDP ports to the PA LUTs before receiving data packets.
Step 7: Once the data has been successfully transferred to host memory by the PKTDMA, the receive flow pushes the descriptor pointer onto the queue specified in the paCmdReply_t structure.
Step 8: Pop the descriptor: QMSS_queuePop()
Step 9: Forward the result to the PA LLD: Pa_forwardResult()
In step 7, once the packet has been transferred to host memory by the Packet DMA, the receive flow writes the packet descriptor pointer onto the receive queue. In step 8, the host must once again take action. Here the host pops the descriptor off of the receive queue using the QMSS_queuePop API. In the last step, step 9, the result from adding the entry to the lookup table is forwarded to the PA LLD to be analyzed. If an error occurred while adding the entry to the lookup table, the forwardResult API will return the appropriate error status. Steps 1-9 can be repeated to add more entries to the classify engine lookup tables.
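A sketch of steps 8 and 9, assuming rxReplyQ is an already-opened handle for the reply queue (queue 900 here). Masking the low-order bits of the popped pointer, the Cppi_getData call, and the pa_OK constant are assumptions to verify against the PDK for your device.

Cppi_HostDesc *rxDesc;
paHandle_t     retHandle;
int            handleType, fwdDest;
uint8_t       *resultBuf;
uint32_t       resultLen;
paReturn_t     paRet;

/* Step 8: pop the command response descriptor from the reply queue */
rxDesc = (Cppi_HostDesc *) (((uint32_t) Qmss_queuePop (rxReplyQ)) & ~0xFU);   /* strip the descriptor-size hint bits */
Cppi_getData (Cppi_DescType_HOST, (Cppi_Desc *) rxDesc, &resultBuf, &resultLen);

/* Step 9: hand the result to the LLD; anything other than pa_OK means the table update failed */
paRet = Pa_forwardResult (paInst, (void *) resultBuf, &retHandle, &handleType, &fwdDest);
if (paRet != pa_OK) {
    /* handle the failure: log it, retry, or free the handle as appropriate */
}
/* return rxDesc to its free descriptor queue when done (not shown) */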

28 PA Rx Hardware Processing
Step 1: A packet formatted with MAC, IPv4, and UDP headers arrives from the gigabit Ethernet switch subsystem and is routed over the packet streaming switch to the L2 Classify Engine. Q640: PDSP0 Q641: PDSP1 Q642: PDSP2 Q643: PDSP3 Q644: PDSP4 Q645: PDSP5 Q646: SA0 Q647: SA1 Q648: GbE SW Q900: RXQUEUE Now that we have shown how to configure the PA, let's take a look at an example of how receive packets are processed by the PA. Assume for the sake of this example that the L2 classify engine lookup table, the L3 classify engine 0 lookup table, and the L4 classify engine lookup table have all been configured with entries. If we then receive a packet formatted with MAC, IPv4, and UDP headers from the gigabit Ethernet switch subsystem, the receive processing goes as follows. In the first step, the packet is routed from the gigabit Ethernet switch over the packet streaming switch to the L2 classify engine.

29 PA Rx Hardware Processing
Step 2: PDSP0 in the L2 Classify Engine submits the MAC header for lookup. Assume that the lookup is successful. The packet will then be routed to its next destination. Assume that the destination is L3 Classify Engine 0. PDSP0 LUT1 matches the MAC entry. Q640: PDSP0 Q641: PDSP1 Q642: PDSP2 Q643: PDSP3 Q644: PDSP4 Q645: PDSP5 Q646: SA0 Q647: SA1 Q648: GbE SW Q900: RXQUEUE In step 2, now that the packet has arrived at the L2 classify engine, the engine examines the MAC header and compares it against the entries in its lookup table. If the MAC header is successfully matched, the packet is routed according to the match route that was specified when the entry was added to the lookup table. If the lookup fails, the packet is routed according to the failure route that was specified when the entry was added. For this example, let's assume that the addresses match and that the entry was added with the pa_DEST_CONTINUE_PARSE destination, which means that the packet will be routed through the packet streaming switch to L3 classify engine 0 to match the IPv4 header.

30 PA Rx Hardware Processing
Step 3: The packet is routed to L3 Classify Engine 0. PDSP1 submits the IPv4 header for lookup. Assume that the lookup is successful. The packet will then be routed to its next destination. Assume that it is the L4 Classify Engine. PDSP1 LUT1 matches the IPv4 entry. Q640: PDSP0 Q641: PDSP1 Q642: PDSP2 Q643: PDSP3 Q644: PDSP4 Q645: PDSP5 Q646: SA0 Q647: SA1 Q648: GbE SW Q900: RXQUEUE In step 3, once the packet reaches L3 classify engine 0, PDSP1 examines the IPv4 header and submits it to its lookup table. The lookup table then compares the packet header against the entries stored in the table. If the headers match, the packet is routed according to the match route that was specified when the entry was added to the lookup table. Again, if the lookup fails, the packet is routed according to the failure route that was specified when the entry was added. For this example, let's assume that the addresses match and that this entry was also added with the pa_DEST_CONTINUE_PARSE destination. This means that the packet will be routed through the packet streaming switch to the L4 classify engine to match the UDP header.

31 PA Rx Hardware Processing
Step 4: The packet is routed to the L4 Classify Engine. PDSP3 submits the UDP header for lookup. Assume that the lookup is successful. The packet will then be routed to its next destination. Assume that the destination is the host on queue 900. PDSP3 LUT2 matches the UDP entry. Q640: PDSP0 Q641: PDSP1 Q642: PDSP2 Q643: PDSP3 Q644: PDSP4 Q645: PDSP5 Q646: SA0 Q647: SA1 Q648: GbE SW Q900: RXQUEUE In step 4, once the packet reaches the L4 classify engine, PDSP3 examines the UDP header and submits it to its lookup table, where it is compared against the UDP entries that have been added to the table. Again, if the headers match, the packet is routed according to the match route that was specified when the entry was added to the lookup table. If the lookup fails, the packet is routed according to the failure route that was specified when the entry was added. For this example, let's assume that the addresses match and that the entry was added with the pa_DEST_HOST destination and with the queue specified as queue 900.

32 PA Rx Hardware Processing
Step 5: The packet is routed from the L4 Classify Engine, through the packet streaming switch, to the PKTDMA controller. The PKTDMA then transfers the packet from the NETCP to host queue 900. Q640: PDSP0 Q641: PDSP1 Q642: PDSP2 Q643: PDSP3 Q644: PDSP4 Q645: PDSP5 Q646: SA0 Q647: SA1 Q648: GbE SW Q900: RXQUEUE In step 5, the packet is routed through the packet streaming switch to the Packet DMA controller. The Packet DMA then transfers the packet from the network coprocessor to the host, using the specified receive flow to deliver the packet to queue 900. This completes the hardware receive processing example. It is important to note that once the configuration has been completed, which we assumed was done before starting this example, this entire process requires no interaction from the host. It is also important to note that the packet could have come from another location and been formatted with different headers. For example, a packet could have arrived on the transmit queue for the L3 classify engine and been checked against custom L3 entries that had been added to its lookup table.

33 PA LLD: Receive Packets from Ethernet
Steps 1-2 are repeated for all Rx data packets.
Step 1: Once the data has been successfully transferred to host memory by the PKTDMA, the Rx flow pushes the descriptor pointer onto the queue specified in the paRouteInfo_t structure.
Step 2: Pop the descriptor to process the packet: QMSS_queuePop()
Now let's take a look at what receiving packets from the Ethernet looks like from the perspective of the host. In step 1, after the packet has matched all the packet headers in the PA, it is transferred from the network coprocessor to a queue. At this point, if the host is watching the destination queue, it can immediately take action. Alternatively, if the queues are set up to be monitored through an interrupt service routine, the host can be notified through an interrupt. In either case, once the host is ready to process the packet, in step 2 it can pop the descriptor off of the receive queue using the QMSS queue pop function. Steps 1 and 2 can be repeated for all received data packets.
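A minimal polling sketch for steps 1 and 2 (an interrupt-driven approach via QMSS accumulation is also possible but not shown). rxDataQ and rxFreeQ are hypothetical handles for the match-route destination queue and the free-descriptor queue, and the Qmss_/Cppi_ call names are assumptions to verify against the PDK headers.

while (Qmss_getQueueEntryCount (rxDataQ) > 0) {
    /* Step 2: pop the descriptor for a received data packet (strip the size-hint bits) */
    Cppi_HostDesc *d = (Cppi_HostDesc *) (((uint32_t) Qmss_queuePop (rxDataQ)) & ~0xFU);
    uint8_t  *payload;
    uint32_t  payloadLen;

    Cppi_getData (Cppi_DescType_HOST, (Cppi_Desc *) d, &payload, &payloadLen);
    /* ...process the received packet here... */

    /* recycle the descriptor once the payload has been consumed */
    Qmss_queuePushDesc (rxFreeQ, (void *) d);
}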

34 PA LLD: Send Transmit Packet
Repeat steps 1-4 to send more Tx data packets.
Step 1: Call the PA LLD API: Pa_formatTxRoute()
Step 2: Receive the formatted Tx command, to be used as PSData.
Step 3: Set the Tx route, link the packet, and push onto the Tx queue: /* Put Tx route info into PSData */ Cppi_setPSData() /* Link packet */ Cppi_setData() /* Send command to PA */ QMSS_queuePush()
Step 4: The PKTDMA automatically pops the descriptor from the Tx queue and sends the packet to NETCP. After the PKTDMA finishes the data transfer, the Tx descriptor is returned to the specified packet completion queue.
Now that we have completed the receive processing example, let's take a look at a transmit processing example. For the sake of this example, let's assume that we are transmitting the packet out over the Ethernet. We begin step 1 by calling the format transmit route function with the pseudo checksums for the desired header checksums. Then, in step 2, we receive a command back from the PA LLD, to be placed in the protocol specific data section of the transmit descriptor. In step 3, we write the protocol specific data into the descriptor using the setPSData function. Then we link the descriptor to the data buffer and push the packet onto the transmit queue. At this point, there are no more actions required from the host. In step 4, the PKTDMA automatically transfers the packet to the network coprocessor. After the transfer has completed, the transmit descriptor is written to the transmit completion queue by the Packet DMA. Steps 1-4 should be repeated for all transmit packets.
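A sketch of steps 1 through 3, reusing txCmdBuf and txCmdSize from the Pa_formatTxRoute sketch. The frame buffer, its length, the free-descriptor helper, and the PDSP4 transmit queue handle (queue 644 in the diagrams) are assumptions, as are the exact Cppi_/Qmss_ signatures.

uint8_t       *frameBuf = myTxFrame;       /* fully formed Ethernet/IP/UDP frame (hypothetical buffer) */
uint32_t       frameLen = myTxFrameLen;    /* frame length in bytes (hypothetical) */
Cppi_HostDesc *txDesc   = myPopFreeHostDescriptor ();    /* hypothetical helper */
Qmss_QueueHnd  pdsp4Q   = myPdsp4TxQueue;                /* hypothetical handle for queue 644 */

/* Step 3: route/checksum command into PS data, frame linked as the payload, then push to PDSP4 */
Cppi_setPSData    (Cppi_DescType_HOST, (Cppi_Desc *) txDesc, txCmdBuf, (uint32_t) txCmdSize);
Cppi_setData      (Cppi_DescType_HOST, (Cppi_Desc *) txDesc, frameBuf, frameLen);
Cppi_setPacketLen (Cppi_DescType_HOST, (Cppi_Desc *) txDesc, frameLen);
Qmss_queuePush    (pdsp4Q, (void *) txDesc, frameLen, MY_HOST_DESC_SIZE, Qmss_Location_TAIL);

/* Step 4: the PKTDMA moves the packet into NETCP; the descriptor returns to the completion queue */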

35 PA Tx Hardware Processing
Step 1: A packet is placed in the Tx queue for PDSP4. The PKTDMA transfers the packet to Modify/Multi-route Engine 0. Q640: PDSP0 Q641: PDSP1 Q642: PDSP2 Q643: PDSP3 Q644: PDSP4 Q645: PDSP5 Q646: SA0 Q647: SA1 Q648: GbE SW Q900: RXQUEUE Now that we have shown the transmit process from the perspective of the host, let's take a look at what happens in the PA hardware. In the first step, a packet is placed in the transmit queue for PDSP4, which lies in Modify/Multi-route Engine 0. The Packet DMA transfers the packet from the queue to the Packet DMA controller in the network coprocessor. The packet is then routed to Modify/Multi-route Engine 0 via the packet streaming switch.

36 PA TX Hardware Processing Example
Step 2: PDSP4 generates the checksums, writes them into the packet, and transmits the packet to the queue for the GbE switch via the PKTDMA. PDSP4 generates the checksums in the packet. Q640: PDSP0 Q641: PDSP1 Q642: PDSP2 Q643: PDSP3 Q644: PDSP4 Q645: PDSP5 Q646: SA0 Q647: SA1 Q648: GbE SW Q900: RXQUEUE In step 2, once the packet arrives in Modify/Multi-route Engine 0, PDSP4 performs the checksum operation on the packet. PDSP4 can compute up to two checksums simultaneously; for example, it could compute an IP and a UDP checksum at the same time. Once PDSP4 finishes the checksums, the packet is routed through the packet streaming switch to the Packet DMA controller, where the Packet DMA transfers the packet to the transmit queue for the gigabit Ethernet switch subsystem. From there, the packet is automatically transferred back into the network coprocessor, through the Packet DMA controller and the packet streaming switch, to the gigabit Ethernet switch subsystem for transmission over the network.

37 PA TX Hardware Processing Example
Step 3: The PKTDMA will transfer the packet from the GbE switch transmit queue, through the PKTDMA controller and the packet streaming switch in the NETCP, to the GbE switch subsystem for transmission over the network. Q640: PDSP0 Q641: PDSP1 Q642: PDSP2 Q643: PDSP3 Q644: PDSP4 Q645: PDSP5 Q646: SA0 Q647: SA1 Q648: GbE SW Q900: RXQUEUE In step 3, the Packet DMA transfers the packet from the transmit queue for the gigabit Ethernet switch to the Packet DMA controller in the NETCP. From there the packet is transferred over the packet streaming switch to the gigabit Ethernet switch subsystem for transmission over the Ethernet. It is important to note that the data transfer from Modify/Multi-route Engine 0 to the queue for the gigabit Ethernet switch, and the transfer from that transmit queue to the gigabit Ethernet switch subsystem, happen without any interaction from the host.

38 For More Information For more information, refer to the following KeyStone device documents: Network Coprocessor (NETCP) User Guide and Packet Accelerator (PA) User Guide. For questions regarding topics covered in this training, visit the support forums at the TI E2E Community website. Thank you for completing the KeyStone training for the Network Coprocessor Packet Accelerator. For more information about the packet accelerator, please see the resources referenced on this slide.

39 Back-Up

40 Management Data IO The MDIO module manages up to 32 physical layer (PHY) devices connected to the Ethernet Media Access Controller (EMAC). The MDIO module allows almost transparent operation of the MDIO interface with little maintenance from the DSP. The MDIO module enumerates all PHY devices in the system by continuously polling 32 MDIO addresses. Once it detects a PHY device, the MDIO module reads the PHY status register to monitor the PHY link state. The MDIO module stores link change events that can interrupt the CPU. The event storage allows the DSP to poll the link status of the PHY device without continuously performing MDIO module accesses. When the system must access the MDIO module for configuration and negotiation, the MDIO module performs the MDIO read or write operation independent of the DSP. This independent operation allows the DSP to poll for completion or interrupt the CPU once the operation has completed.

41 Network Layers Network layers define a hierarchy of services delineated by functionality. Each layer can use the functionality of the next layer below, and offers services to the next layer above. The packet accelerator sub-system examines and routes packets based on fields in up to three layers of the Ethernet packets or the L0-L2 header of the SRIO packets. In layer 2, the MAC (Media Access Control) layer, the sub-system classifies IEEE packets based (optionally) on the destination MAC, source MAC, Ethertype, and VLAN tags. In layer 3, the network layer, IPv4 (Internet Protocol Version 4) and IPv6 (Internet Protocol Version 6) packets are routed based (optionally) on source IP address, destination IP address, IPv4 protocol, IPv6 next header, IPv4 Type of Service (recently changed to IPv4 differentiated service in RFC 2474), IPv6 traffic class, and IPv6 flow label. For IP packets with security services, the SPI (Security Parameters Index) is also included in the classification information. For IP packets with SCTP (Stream Control Transmission Protocol), the SCTP destination port is also included in the classification information. In layer 4, the transport layer, UDP (User Datagram Protocol) and TCP (Transmission Control Protocol) packets are routed based on the destination port. However, GTP-U (GPRS Tunnelling Protocol User Plane) over UDP packets are routed based on the 32-bit TEID (Tunnel ID). For SRIO (Serial RapidIO) L0-L2 header information, the sub-system classifies SRIO packets based (optionally) on the source ID, destination ID, transport type, priority, message type, SRIO type 11 mailbox and letter, and SRIO type 9 stream ID and class of service.

42 Receive Path
All received packets from Ethernet and/or SRIO are routed to PDSP0.
PDSP0 does the L0-L2 (MAC/SRIO) lookup using LUT1-0. If the packet is IP, it is forwarded to PDSP1.
PDSP1 does the outer IP or custom LUT1 lookup using LUT1-1.
PDSP2 does any subsequent IP or custom LUT1 lookup using LUT1-2.
PDSP3 does all TCP/UDP and custom LUT2 lookups using LUT2.
PDSP4 is used for post-lookup processes such as checksum/CRC result verification.
PDSP4/5 can be used for pre-transmission operations such as transmit checksum generation.

43 Receive Path (Continued)
With the exception of some initial setup functions, the module does not communicate directly with the sub-system. The output of the module is a formatted data block along with a destination address. The module user must send the formatted data to the sub-system. This is typically done by linking the created data block to a host packet descriptor, and then using the addressing information to send the created packet to the sub-system through the queue manager and PKTDMA.

44 Tx path For packets to the network, the sub-system provides ones-complement checksum or CRC generation over a range provided by the module user. The range is not determined by the sub-system by parsing the to-network packet, since it is assumed that the creator of the packet already knows the start offset, length, initial checksum value, etc.

45 Tables look-up The low level driver maintains two tables of layer 2 and layer 3 configuration information. The memory for these tables is provided by the module user at run time. The module maintains ownership of these tables, and the module user must not write to the memory once it has been provided to the module. In multi-core devices the module can be used in two different configurations. In independent core mode, each core in a device has a unique set of tables. Although it is legal for any core to reference handles from other cores, this is not typically done; in this case, cache coherency and cross-core semaphores do not need to be implemented by the module user. In common core mode, there is only one set of tables, shared by all cores. Each core that uses the module must initialize it, but each core must provide exactly the same buffers to the module. The module user should have the first core that initializes the module also initialize the tables; the other cores initialize their internal state but not the tables. In this mode, cache coherency and cross-core semaphores must be implemented by the module user to ensure the integrity of the tables.

