Exploiting Graphics Processors for High-performance IP Lookup in Software Routers
Jin Zhao, Xinya Zhang, Xin Wang, Yangdong Deng, Xiaoming Fu
IEEE INFOCOM 2011

Contents
– Introduction
– GALE functionalities
  – Lookup
  – Update
– Evaluation
– Conclusion

Introduction
– A core function of a router is to determine the next-hop port for each packet: the router looks up the packet's destination IP address in its routing table, which maps address prefixes to next hops.
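To make the lookup semantics concrete, here is a minimal Python sketch of longest-prefix matching by a linear scan over the routing table. The route tuple layout and the names `mask` and `lpm_linear` are illustrative assumptions, not from the slides; addresses are plain 32-bit integers.

```python
from typing import List, Optional, Tuple

# Hypothetical route format: (network address, prefix length, next hop)
Route = Tuple[int, int, str]


def mask(plen: int) -> int:
    """Netmask with the leftmost plen bits set (plen = 0..32)."""
    return (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF


def lpm_linear(table: List[Route], ipaddr: int) -> Optional[str]:
    """Return the next hop of the longest prefix matching ipaddr, or None."""
    best_len, best_hop = -1, None
    for net, plen, hop in table:
        if (ipaddr & mask(plen)) == net and plen > best_len:
            best_len, best_hop = plen, hop
    return best_hop


routes = [
    (0x93000000, 8, "eth4"),   # 147.0.0.0/8
    (0x932E0000, 16, "eth1"),  # 147.46.0.0/16
    (0x932ED800, 24, "eth2"),  # 147.46.216.0/24
]
print(lpm_linear(routes, 0x932ED86F))  # 147.46.216.111 -> eth2
```

Every query scans the whole table, so the cost grows with the number of routes; this is what trie-based structures and GALE's direct table avoid.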

Introduction
Two challenges:
– Lookup: a large number of queries per unit time, made harder by high link speeds and large routing tables
– Update:
  – addition and deletion of mapping entries
  – modification of the next-hop information in existing entries

Introduction
Existing solutions:
– Hardware-based: specialized hardware such as TCAM
– Software-based:
  – optimizations of longest-prefix matching
  – modifications or extensions of the lookup data structure
– Software routers using GPUs, e.g. "PacketShader", which assume that routing tables are static

GPU-Accelerated Lookup Engine (GALE)
– Leverages the CUDA programming model to perform IP routing-table lookups in parallel
– O(1) time complexity per IP lookup, using a direct table stored in GPU memory
– Route updates also exploit GPU parallelism
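The data-parallel structure of lookup can be sketched in plain Python: each query is handled by one "thread" whose global index selects the query, and no thread writes to state another thread reads. This is only a sequential emulation under assumptions, not GALE's CUDA code: a dict stands in for the direct table in GPU memory, `launch` runs the threads one after another, and `lookup_kernel`, `launch`, and the `"drop"` no-route marker are hypothetical.

```python
from typing import Dict, List, Optional


def lookup_kernel(tid: int, queries: List[int], dtable: Dict[int, str],
                  results: List[Optional[str]]) -> None:
    """Work of one GPU thread: a single independent direct-table lookup."""
    if tid < len(queries):  # guard for surplus threads in the last block
        # Index = leftmost 24 bits; "drop" is a hypothetical no-route marker.
        results[tid] = dtable.get(queries[tid] >> 8, "drop")


def launch(n_blocks: int, block_size: int, kernel, *args) -> None:
    """Sequential stand-in for a CUDA grid launch (no real parallelism)."""
    for block in range(n_blocks):
        for thread in range(block_size):
            # Same global index a CUDA kernel computes as
            # blockIdx.x * blockDim.x + threadIdx.x
            kernel(block * block_size + thread, *args)


queries = [0x932ED86F, 0x932ED801, 0x01020304]  # 147.46.216.111, 147.46.216.1, 1.2.3.4
dtable = {0x932ED8: "eth2"}                     # sparse stand-in for the 2**24-entry table
results: List[Optional[str]] = [None] * len(queries)
block_size = 4
launch((len(queries) + block_size - 1) // block_size, block_size,
       lookup_kernel, queries, dtable, results)
print(results)  # ['eth2', 'eth2', 'drop']
```

Because each lookup reads the shared table and writes only its own result slot, the queries need no synchronization, which is what makes lookup a natural fit for one GPU thread per packet.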

GPU-Accelerated Lookup Engine Architecture

GPU-Accelerated Lookup Engine Architecture (Cont’d)
– Two tables:
  – a traditional trie-based routing table
  – a "direct table"
– Roles and control are split between the two tables:
  – the direct table serves fast lookups and is managed by the GPU through CUDA
  – update-related tasks on the trie are handled by CPU code
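A minimal sketch of the CPU-side trie follows. It assumes a plain unibit binary trie, since the slides only say "traditional trie-based"; `Node`, `BinaryTrie`, and its methods are hypothetical names. `delete` also returns the closest remaining ancestor prefix, i.e. the parent information that the CPU obtains while deleting from the trie.

```python
from typing import List, Optional, Tuple


class Node:
    __slots__ = ("children", "next_hop")

    def __init__(self) -> None:
        self.children: List[Optional["Node"]] = [None, None]
        self.next_hop: Optional[str] = None


class BinaryTrie:
    """Unibit trie keyed on the leading bits of 32-bit IPv4 addresses."""

    def __init__(self) -> None:
        self.root = Node()

    @staticmethod
    def _bits(ipaddr: int, plen: int):
        # Leftmost plen bits of the address, most significant first.
        for k in range(plen):
            yield (ipaddr >> (31 - k)) & 1

    def insert(self, ipaddr: int, plen: int, next_hop: str) -> None:
        node = self.root
        for bit in self._bits(ipaddr, plen):
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        node.next_hop = next_hop  # insertion and modification coincide

    def lookup(self, ipaddr: int) -> Optional[str]:
        # Walk down, remembering the deepest prefix seen (longest match).
        node, best = self.root, self.root.next_hop
        for bit in self._bits(ipaddr, 32):
            node = node.children[bit]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best

    def delete(self, ipaddr: int, plen: int) -> Tuple[Optional[str], int]:
        """Remove a prefix; return (next hop, length) of its closest ancestor prefix.

        Empty nodes are not pruned, to keep the sketch short.
        """
        node, parent = self.root, (None, 0)
        for depth, bit in enumerate(self._bits(ipaddr, plen)):
            if node.next_hop is not None:
                parent = (node.next_hop, depth)
            node = node.children[bit]
            if node is None:
                raise KeyError("prefix not in trie")
        if node.next_hop is None:
            raise KeyError("prefix not in trie")
        node.next_hop = None
        return parent


trie = BinaryTrie()
trie.insert(0x93000000, 8, "eth4")   # 147.0.0.0/8
trie.insert(0x932E0000, 16, "eth1")  # 147.46.0.0/16
trie.insert(0x932ED800, 24, "eth2")  # 147.46.216.0/24
print(trie.lookup(0x932ED86F))       # 147.46.216.111 -> eth2
print(trie.delete(0x932ED800, 24))   # parent of the /24 -> ('eth1', 16)
```

A trie lookup costs up to one node visit per address bit, which is why GALE keeps the trie for updates and serves lookups from the direct table instead.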

Lookup in GALE
IP lookup operation:
– Far more frequent than updates, which makes it a good target for the parallelism of graphics processors
– Uses a "direct table":
  – next-hop information for all possible IP prefixes in a single table
  – direct translation of an IP address into a memory address
  – O(1) memory accesses and O(1) computational complexity
– Number of entries:
  – covering every 32-bit address would need $2^{32}$ = 4G entries
  – 99% of IP prefixes are 24 bits long or shorter
  – so space for $2^{24}$ = 16M prefix entries is required
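A minimal sketch of the direct table, assuming (this is not stated on the slides) that next hops are encoded as 1-byte port IDs with 0 meaning "no route"; under that assumption the $2^{24}$-slot table occupies 16 MiB. `make_dtable`, `dtable_index`, and `lookup` are hypothetical names.

```python
DTABLE_SLOTS = 1 << 24  # one slot per 24-bit prefix: 16,777,216 entries


def make_dtable(default_port: int = 0) -> bytearray:
    """Allocate the direct table; assumed 1-byte port IDs, 0 = no route."""
    return bytearray([default_port]) * DTABLE_SLOTS


def dtable_index(ipaddr: int) -> int:
    """Direct translation: the leftmost 24 bits of the address are the index."""
    return ipaddr >> 8


def lookup(dtable: bytearray, ipaddr: int) -> int:
    """O(1): a single memory access, however many routes exist."""
    return dtable[dtable_index(ipaddr)]


dtable = make_dtable()
dtable[dtable_index(0x932ED800)] = 2  # 147.46.216.0/24 -> port 2
print(len(dtable))                    # 16777216 slots (16 MiB)
print(lookup(dtable, 0x932ED86F))     # 147.46.216.111 -> 2
print(lookup(dtable, 0x932ED900))     # 147.46.217.0 -> 0 (no route)
```

All 256 addresses of a /24 share one slot, so the lookup cost does not depend on the size of the routing table.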

Lookup in GALE
IP lookup operation (cont’d):
– The direct table dtable is stored in GPU memory
– An IP address a.b.c.d is translated into a single integer: $ipaddr = a \cdot 2^{24} + b \cdot 2^{16} + c \cdot 2^{8} + d$

Lookup in GALE
Example:
– A packet is sent to destination 147.46.216.111
– $ipaddr = 147 \cdot 2^{24} + 46 \cdot 2^{16} + 216 \cdot 2^{8} + 111 = 2{,}469{,}320{,}815$
– The dtable[] index is the leftmost 24 bits of ipaddr: 9,645,784 (dtable[] spans indices 0 to 16,777,215)
– dtable[9,645,784] holds next hop eth2, so the packet is forwarded on interface eth2
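The arithmetic of this example can be checked with a short Python sketch. `ip_to_int` is a hypothetical helper implementing the translation formula, and the one-entry dict is only a stand-in for dtable.

```python
def ip_to_int(dotted: str) -> int:
    """ipaddr = a*2^24 + b*2^16 + c*2^8 + d for address a.b.c.d."""
    a, b, c, d = (int(x) for x in dotted.split("."))
    return a * 2**24 + b * 2**16 + c * 2**8 + d


ipaddr = ip_to_int("147.46.216.111")
index = ipaddr >> 8           # leftmost 24 bits, same as ipaddr // 2**8
dtable = {9_645_784: "eth2"}  # stand-in holding only the entry used here

print(f"{ipaddr:,}")  # 2,469,320,815
print(f"{index:,}")   # 9,645,784
print(dtable[index])  # eth2
```

Shifting right by 8 bits drops the last octet, so the index equals $147 \cdot 2^{16} + 46 \cdot 2^{8} + 216$.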

Update in GALE
Routing-table update operations:
– insertion of a new routing-table entry
– modification of the next hop of an existing entry
– removal of an existing entry
Trie-specific operations involve traversing the trie; many algorithms exist for them.
"Direct table" operations:
– no memory allocation or de-allocation is needed
– a single prefix may be represented as a set of multiple 24-bit prefixes in the direct table, e.g. 147.46.0.0/16 → 147.46.0.0/24 ~ 147.46.255.0/24
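The expansion of a shorter prefix into 24-bit direct-table entries can be sketched as follows, assuming prefixes of at most 24 bits as on the lookup slides. `ip_to_int`, `index_range`, and `as_prefix24` are hypothetical helper names.

```python
from typing import Tuple


def ip_to_int(dotted: str) -> int:
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def index_range(prefix: str, plen: int) -> Tuple[int, int]:
    """Inclusive range of direct-table indices covered by prefix/plen (plen <= 24)."""
    if not 0 <= plen <= 24:
        raise ValueError("only prefixes of length 0..24 map onto the direct table")
    netmask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
    start = (ip_to_int(prefix) & netmask) >> 8    # network address, top 24 bits
    return start, start + (1 << (24 - plen)) - 1  # 2^(24 - plen) entries


def as_prefix24(index: int) -> str:
    """Write a direct-table index back as the /24 prefix it represents."""
    return f"{index >> 16}.{(index >> 8) & 0xFF}.{index & 0xFF}.0/24"


start, end = index_range("147.46.0.0", 16)
print(start, end, end - start + 1)                # 9645568 9645823 256
print(as_prefix24(start), "~", as_prefix24(end))  # 147.46.0.0/24 ~ 147.46.255.0/24
```

Because host bits are masked off first, 147.46.216.0/16 yields the same range as 147.46.0.0/16.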

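Update in GALE
Direct table operations: insertion / modification
– The two are the same operation: write the new next-hop information to the corresponding IP prefix(es)
– ltable stores, for each index, the length of the prefix that currently owns that entry, so an update can leave entries of longer (more specific) prefixes untouched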
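A sketch of insertion/modification over the direct table, under stated assumptions: dicts stand in for the GPU-resident dtable and ltable, ltable entries default to 0, and an entry is rewritten only when its recorded prefix length is not longer than the new prefix's length (our reading of ltable's role). The loop body stands in for one GPU thread per entry; `index_range` and `insert_or_modify` are hypothetical names.

```python
from typing import Dict, Tuple


def index_range(prefix: str, plen: int) -> Tuple[int, int]:
    """Inclusive direct-table index range of prefix/plen (plen <= 24)."""
    a, b, c, d = (int(x) for x in prefix.split("."))
    ipaddr = (a << 24) | (b << 16) | (c << 8) | d
    netmask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
    start = (ipaddr & netmask) >> 8
    return start, start + (1 << (24 - plen)) - 1


def insert_or_modify(dtable: Dict[int, str], ltable: Dict[int, int],
                     prefix: str, plen: int, next_hop: str) -> int:
    """Write next_hop into every covered entry not owned by a longer prefix.

    Apart from the returned counter, each iteration touches only its own
    index, so on a GPU every entry could be handled by an independent thread.
    """
    start, end = index_range(prefix, plen)
    written = 0
    for i in range(start, end + 1):
        if ltable.get(i, 0) <= plen:  # keep more specific routes intact
            dtable[i] = next_hop
            ltable[i] = plen
            written += 1
    return written


dtable: Dict[int, str] = {}
ltable: Dict[int, int] = {}
print(insert_or_modify(dtable, ltable, "147.46.216.0", 24, "eth2"))  # 1
print(insert_or_modify(dtable, ltable, "147.46.0.0", 16, "eth1"))    # 255
print(dtable[9_645_784], dtable[9_645_568])                          # eth2 eth1
```

Inserting the /16 after the /24 writes only 255 of its 256 entries: the entry owned by 147.46.216.0/24 keeps eth2.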

Update in GALE
Direct table operations: deletion
– The next-hop information is replaced with the next-hop information of the parent node's prefix
– The parent node is found by the CPU while it deletes the prefix from the trie
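A sketch of deletion over the direct table, assuming dicts stand in for dtable and ltable, that ltable records the length of the prefix owning each entry, and that the parent's next hop and length are supplied by the CPU-side trie deletion. Only entries whose recorded length equals the deleted prefix's length are handed back to the parent; `index_range` and `delete` are hypothetical names.

```python
from typing import Dict, Optional, Tuple


def index_range(prefix: str, plen: int) -> Tuple[int, int]:
    """Inclusive direct-table index range of prefix/plen (plen <= 24)."""
    a, b, c, d = (int(x) for x in prefix.split("."))
    ipaddr = (a << 24) | (b << 16) | (c << 8) | d
    netmask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
    start = (ipaddr & netmask) >> 8
    return start, start + (1 << (24 - plen)) - 1


def delete(dtable: Dict[int, str], ltable: Dict[int, int], prefix: str, plen: int,
           parent_hop: Optional[str], parent_len: int) -> int:
    """Hand entries owned by the deleted prefix back to its parent prefix.

    parent_hop None means no covering route remains, so entries are cleared.
    Returns the number of entries changed.
    """
    start, end = index_range(prefix, plen)
    changed = 0
    for i in range(start, end + 1):
        if i in ltable and ltable[i] == plen:  # owned by the deleted prefix
            if parent_hop is None:
                del dtable[i], ltable[i]
            else:
                dtable[i], ltable[i] = parent_hop, parent_len
            changed += 1
    return changed


# 147.46.0.0/16 -> eth1 and 147.46.216.0/24 -> eth2 (only 147.46.x.x modelled)
dtable = {i: "eth1" for i in range(9_645_568, 9_645_824)}
ltable = {i: 16 for i in range(9_645_568, 9_645_824)}
dtable[9_645_784], ltable[9_645_784] = "eth2", 24
print(delete(dtable, ltable, "147.46.0.0", 16, "eth4", 8))  # 255
print(dtable[9_645_568], dtable[9_645_784])                 # eth4 eth2
```

Deleting the /16 falls back to its parent (here assumed to be 147.0.0.0/8 with eth4) everywhere except the entry still owned by the more specific /24.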

Update in GALE
Example:
– Update the next hop of 147.46.216.0/16 (network 147.46.0.0/16) to eth1
– A /16 prefix covers $2^{24-16} = 2^{8} = 256$ entries of dtable[]:
  – $start = 147 \cdot 2^{16} + 46 \cdot 2^{8} = 9{,}645{,}568$
  – $end = start + 256 - 1 = 9{,}645{,}823$
– The entries from start to end are updated to eth1

Experiments
Setup:
– Routing datasets from FUNET and RIS
– Implementation on a desktop PC ($1,122):
  – 2.66 GHz Intel Core i5 750 (quad-core)
  – 4 GB DDR3 SDRAM
  – NVIDIA GeForce GTX 470 ($428): 1.2 GB of global memory, 448 stream processors (GPU cores)

Experiments
Performance: insert / modify / delete
– Updates are fast because updates to the direct table have few dependencies on one another

Experiments
Performance: lookup, compared with trie-based lookup

Conclusion
– GALE exploits massive GPU parallelism to speed up routing-table lookups
– A direct table gives O(1) lookup complexity for the routing table
– Routing-table updates can also exploit the GPU's parallelism
