Author : Guangdeng Liao, Heeyeol Yu, Laxmi Bhuyan Publisher : DAC'10 Presenter : Jo-Ning Yu Date : 2010/10/06



Outline
- Introduction
- Background and Related Work
  - IP Lookup Schemes
- New IP Cache Architecture
  - IP Cache Organization
  - Bit Selection and Hash Implementation
  - Progressive Cache Replacement Policy
- Experimental Results
  - IP Cache Performance
  - Comparison with other IP lookup schemes

Introduction
- Fast packet forwarding is a router's critical data path, so several lookup schemes have been developed based on three major techniques:
  1. Ternary Content Addressable Memory (TCAM)
  2. Trie-based schemes
  3. Hash-based schemes
- IP traces exhibit strong temporal locality: the same destination addresses recur frequently.

Background and Related Work: IP Lookup Schemes
- The three representative IP lookup schemes (trie, hash, and TCAM) have lookup complexities of O(W), O(1), and 1, respectively, measured in clock cycles, where W is the number of IP address bits.
- Because TCAMs inherently suffer from prohibitively large power consumption, researchers have proposed hash-based schemes that use on-chip Bloom filters.

Background and Related Work: IP Lookup Schemes
- In our example, the W address bits are partitioned into groups of 2 bits, giving W/2 Bloom filters, one per group.
* For example: A packet with IP address
- Bloom filters typically use 2 Mbits of on-chip memory. An IP cache of the same size can provide a fast on-chip IP lookup solution because the IP cache miss ratio is small.

New IP Cache Architecture: IP Cache Organization
- We studied the performance of various hash functions and observed that 2-Universal hashing achieves the best performance.
- We therefore split the IP cache into two cache banks, each indexed by a separate universal hash function.

New IP Cache Architecture: Bit Selection and Hash Implementation
- Let $A = \{0, 1, \dots, 2^i - 1\}$ and $B = \{0, 1, \dots, 2^j - 1\}$, where i is the number of bits in the key space (the IP address bits) and j is the number of bits in the cache set space.
- Q denotes the set of all i × j Boolean matrices.
- For a given $q \in Q$ and $x \in A$, let $q(k)$ be the k-th row of the matrix q and $x_k$ the k-th bit of x.
- Then one of the $H_3$ hash functions, $h_q : A \to B$, is defined as
  $$h_q(x) = x_1 \cdot q(1) \oplus x_2 \cdot q(2) \oplus \dots \oplus x_i \cdot q(i) \qquad (1)$$
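
Equation (1) can be sketched in a few lines of Python. This is only an illustration, under two assumptions the slide does not state: each row q(k) is stored as a j-bit integer, and x_1 is taken to be the least significant bit of x. The names `h3_hash` and `random_q` are hypothetical.

```python
import random


def random_q(i, j, seed=0):
    """Draw a random i x j Boolean matrix, one j-bit integer per row (illustrative helper)."""
    rng = random.Random(seed)
    return [rng.getrandbits(j) for _ in range(i)]


def h3_hash(x, q_rows):
    """XOR together the rows q(k) whose corresponding key bit x_k is 1.

    Assumption: bit k of x (counting from the LSB) selects q_rows[k].
    """
    h = 0
    for k, row in enumerate(q_rows):
        if (x >> k) & 1:
            h ^= row
    return h


# Small case with i = 3 key bits and j = 2 index bits.
q = [0b01, 0b10, 0b11]
print(h3_hash(0b101, q))  # rows 0 and 2 are selected: 0b01 ^ 0b11 = 2
```

Because each output bit is an XOR of selected key bits, the function is linear over XOR: h(x ⊕ y) = h(x) ⊕ h(y). In hardware this corresponds to one XOR tree per output bit.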

New IP Cache Architecture: Bit Selection and Hash Implementation
- Fig. 4 shows an example circuit implementation of an H3-class hash function with i = 3 and j = 2. For a hash function in our IP cache, i = 32, since an IP address has 32 bits, and j = log2(number of cache sets).
※ Q denotes the set of all i × j Boolean matrices.
※ For a given q ∈ Q, let q(k) be the k-th row of the matrix q.

New IP Cache Architecture: Bit Selection and Hash Implementation
- We measure the average value of each bit position across IP addresses and show the results in Fig. 5(a) (the first bit is the MSB).
- The best key bits (the important bits) are those with an average value of 0.5, meaning they are set 50% of the time over a large series of IP addresses.
- Bits in the range 8 to 24 are more important than the rest, so we choose these 16 bits as our key bits.
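
The per-bit measurement described above could be reproduced roughly as follows. This is a sketch, not the authors' tooling: the function names are hypothetical, index 0 is taken to be the MSB, and `key_bits` reflects one reading of "bits 8 to 24" (skip the 8 most significant and 8 least significant bits).

```python
import ipaddress


def bit_averages(addresses):
    """Return 32 averages, one per bit position, with index 0 being the MSB."""
    addrs = [int(ipaddress.IPv4Address(a)) for a in addresses]
    n = len(addrs)
    return [sum((a >> (31 - b)) & 1 for a in addrs) / n for b in range(32)]


def key_bits(addr):
    """Keep the 16 middle bits of a 32-bit address (assumed reading of 'bits 8 to 24')."""
    return (addr >> 8) & 0xFFFF


avgs = bit_averages(["128.0.0.1", "0.0.0.1"])
print(avgs[0], avgs[31])  # MSB is set in half the addresses, LSB in all of them
```

Positions whose average is close to 0.5 carry the most information for indexing; positions near 0 or 1 are almost constant and contribute little to the hash.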

New IP Cache Architecture: Bit Selection and Hash Implementation
- We ignore the 8 least significant bits and compare the miss ratios obtained with n least significant bits as key bits in Fig. 5(b), where all miss ratios are normalized to the miss ratio of the 32-bit hash.
- The figure shows that our tailored key bits achieve the same performance as the 32-bit hash with the least hardware complexity.

New IP Cache Architecture: Progressive Cache Replacement Policy
1. LRU cannot take the highly skewed flow popularity into account.
2. In addition, it is difficult to implement LRU over multiple cache banks at a reasonable hardware cost.
- Add a timestamp to each cache line.

New IP Cache Architecture: Progressive Cache Replacement Policy
- Each time a cache line is accessed, its timestamp is incremented by 1 until it saturates.
- When a miss occurs, the cache line with the smallest timestamp is replaced.
- In our experiments, 8 timestamp bits are needed for our IP cache to achieve comparable cache performance, which increases the storage overhead by 22%.
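
The timestamp scheme above amounts to a saturating counter per line plus a minimum search on a miss. A minimal sketch, assuming one counter per way held in a plain list; the function names and the tie-break (lowest way index) are illustrative choices, not taken from the slides.

```python
def touch(counters, way, bits=8):
    """Increment the accessed line's timestamp, saturating at 2**bits - 1."""
    counters[way] = min(counters[way] + 1, (1 << bits) - 1)


def victim(counters):
    """Pick the way with the smallest timestamp (ties go to the lowest index)."""
    return min(range(len(counters)), key=counters.__getitem__)


counters = [254, 3, 7]
touch(counters, 0)   # 254 -> 255
touch(counters, 0)   # already saturated, stays at 255
print(counters, victim(counters))
```

The minimum search across all ways of both banks is the part that costs hardware, in addition to the extra storage for the 8-bit timestamps.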

New IP Cache Architecture: Progressive Cache Replacement Policy
- We add one extra bit to each cache line and set the bit once the line is reused.
- We count the number of unpopular flows (lines whose bit is not set) in the two corresponding cache sets and use it as the metric for choosing a cache set for a new incoming flow.
- The cache set with the larger number of unpopular flows is selected; in case of a tie, we choose the left bank for simplicity.
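
The bank-selection rule can be written down directly. In this sketch each candidate set is represented by a list of its lines' reuse bits; that representation and the name `choose_bank` are assumptions made for illustration.

```python
def choose_bank(left_reuse_bits, right_reuse_bits):
    """Return the bank whose candidate set holds more unpopular (never reused) lines.

    A line counts as unpopular when its reuse bit is 0. Ties go to the left bank.
    """
    left_unpopular = sum(1 for b in left_reuse_bits if not b)
    right_unpopular = sum(1 for b in right_reuse_bits if not b)
    return "right" if right_unpopular > left_unpopular else "left"


print(choose_bank([1, 0, 0, 1], [1, 1, 0, 1]))  # left set has 2 unpopular lines, right has 1
```

Steering new flows toward the set with more unpopular lines makes it more likely that the evicted line belongs to a flow that was never reused.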

New IP Cache Architecture: Progressive Cache Replacement Policy
- Inside each cache set, we replace the cache line in the LRU position, as LRU does, but insert the new flow into the LRU position instead of the MRU position.
- Each time a cache line is accessed, the policy moves it up one step by swapping it with the line above, so a line reaches the MRU position only progressively, after repeated accesses.

New IP Cache Architecture: Progressive Cache Replacement Policy
- Given an access sequence of IP1, IP2, IP1, IP3.
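
The progressive policy within one cache set can be sketched as below and run on the access sequence IP1, IP2, IP1, IP3. Two things here are assumptions rather than facts from the slides: the class name `ProgressiveSet`, and the handling of a set that is not yet full (new flows fill empty ways before anything is evicted).

```python
class ProgressiveSet:
    """One cache set ordered from MRU (index 0) to LRU (last index)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = []

    def access(self, key):
        """Return True on a hit, False on a miss."""
        if key in self.lines:
            pos = self.lines.index(key)
            if pos > 0:
                # Hit: move up by exactly one position, not straight to MRU.
                self.lines[pos - 1], self.lines[pos] = self.lines[pos], self.lines[pos - 1]
            return True
        if len(self.lines) == self.ways:
            self.lines.pop()        # evict the line in the LRU position
        self.lines.append(key)      # the new flow enters at the LRU position
        return False


s = ProgressiveSet(ways=2)
hits = [s.access(ip) for ip in ["IP1", "IP2", "IP1", "IP3"]]
print(hits, s.lines)
```

In this two-way example IP3 displaces IP2, which was never reused, while the reused IP1 stays resident. A flow that is seen only once never climbs above the LRU position, so it cannot push popular flows out.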

Experimental Results: IP Cache Performance

Experimental Results: Comparison with other IP lookup schemes
1. Lookup throughput comparison with hash-based schemes
- Define the average mean access time (AMAT) as follows: AMAT = Hit Time + Miss Rate × Miss Penalty
- An on-chip cache access takes 2 clock cycles.
- An off-chip memory access takes 100 clock cycles.
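
The AMAT formula is easy to evaluate with the two latencies given above. In this sketch the function name `amat` is illustrative, and the 5% miss rate in the usage line is a hypothetical value chosen for the arithmetic, not a result from the experiments.

```python
def amat(miss_rate, hit_time=2, miss_penalty=100):
    """AMAT = Hit Time + Miss Rate x Miss Penalty, in clock cycles.

    Defaults follow the slide: 2 cycles on-chip, 100 cycles off-chip.
    """
    return hit_time + miss_rate * miss_penalty


# Hypothetical 5% miss rate: 2 + 0.05 * 100, about 7 cycles per lookup.
print(amat(0.05))
```

With a 100-cycle miss penalty, every percentage point of miss rate adds one cycle to the average lookup, which is why the cache miss ratio dominates the throughput comparison.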

 

Experimental Results: Comparison with other IP lookup schemes
2. Power comparison with TCAM-based schemes
- Define the average mean lookup power (AMLP) as follows: AMLP = Cache Power + Miss Rate × TCAM Power
