1 ECE 526 – Network Processing Systems Design System Implementation Principles II Varghese Chapter 3.

2 Outline Review Principles 1-7 Implementation principles ─ Reflect what we learned Example: TCAM updating Cautionary Questions

3 Reviews P1: Avoid Obvious Waste ─ Example: copy packet pointer instead of packet P2: Shift Computation in Time ─ precompute (table lookup), ─ evaluate lazily (network forensics) ─ Share Expenses (batch processing) P3: Relax Subsystem Requirements ─ Trade certainty for time (random sampling); ─ Trade accuracy for time (hashing, bloom filter); ─ Shift computation in space (fast path/slow path)

4 Reviews P4: Leverage Off-System Components ─ Examples: Onboard Address Recognition & Filtering, cache P5: Add Hardware to Improve Performance ─ Use memory interleaving, pipelining (= parallelism); ─ Use wide-word parallelism (save memory accesses) ─ Combine SRAM and DRAM (keep the low-order bits of each counter in SRAM when there is a large number of counters) P6: Replace inefficient general routines with efficient specialized ones ─ Examples: NAT using forwarding and reversing tables P7: Avoid Unnecessary Generality ─ Examples: RISC, microengine

5 P8: Don't be tied to reference implementations Key Concept: ─ Implementations are sometimes given (e.g. by manufacturers) as a way to make the specification of an interface precise, or show how to use a device ─ These do not necessarily show the right way to think about the problem—they are chosen for conceptual clarity! Examples: ─ Using parallel packet classification instead of sequential demultiplexing in TCP/IP protocols

6 P9: Pass hints across interfaces Key Concept: if the caller knows something the callee will have to compute, pass it (or something that makes it easier to compute) as an argument! ─ "hint" = something that makes the recipient's life easier, but may not be correct ─ "tip" = hint that is guaranteed to be correct ─ Caveat: callee must either trust caller, or verify (probably should do both) Example ─ Active messages: the message carries the address of the interrupt handler for fast dispatching
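A minimal sketch of the hint idea on this slide, assuming a small table of (key, value) pairs. The function name lookup and the hint parameter are hypothetical, chosen only for illustration. The callee verifies the hint before trusting it and falls back to a slow scan when the hint is wrong.

```python
def lookup(table, key, hint=None):
    """Return the value stored for key, or None if key is absent.

    hint is a guessed index into table supplied by the caller.
    It may be wrong or stale, so it is verified before use.
    """
    # Fast path: the hint is valid and points at the right entry.
    if hint is not None and 0 <= hint < len(table) and table[hint][0] == key:
        return table[hint][1]
    # Slow path: the hint was missing or wrong, so scan the whole table.
    for k, v in table:
        if k == key:
            return v
    return None


table = [("a", 1), ("b", 2), ("c", 3)]
good = lookup(table, "b", hint=1)   # hint is right: one comparison
stale = lookup(table, "b", hint=0)  # hint is wrong: falls back to the scan
```

A wrong hint costs one extra comparison here, while a correct hint skips the scan entirely.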

7 P10: Pass hints in protocol headers Key Concept: If the sender knows something the receiver will have to compute, pass it in the header Example: ─ Tag switching: the packet contains extra information besides the destination address for fast lookup

8 P11: Optimize the Expected Case Key Concept: If 80% of the cases can be handled similarly, optimize for those cases P11a: Use Caches ─ A form of using state to improve performance Example: ─ TCP input "header prediction": if an incoming packet is in order and does what is expected, it can be processed in a small number of instructions

9 P12: Add or Exploit State to Gain Speed Key Concept: Remember things to make it easier to compute them later P12a: Compute incrementally ─ Here the idea is to "accumulate" as you go, rather than computing all-at-once at the end Example: ─ Incremental computation of IP checksum
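A sketch of the incremental checksum example, assuming the header is given as a list of 16-bit words and that one word changes (for instance the word holding the TTL). The update rule used is the one from RFC 1624, HC' = ~(~HC + ~m + m'). The function names and the sample header values are illustrative assumptions.

```python
def ones_complement_sum(words):
    """16-bit one's-complement sum with end-around carry."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def checksum(words):
    """Full recomputation: complement of the one's-complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF


def incremental_update(old_checksum, old_word, new_word):
    """Update a checksum after one 16-bit word changes (RFC 1624, eqn. 3)."""
    return ~ones_complement_sum(
        [~old_checksum & 0xFFFF, ~old_word & 0xFFFF, new_word]
    ) & 0xFFFF


# Sample header words with the checksum field (index 5) set to zero.
header = [0x4500, 0x0054, 0x0000, 0x4000, 0x4001,
          0x0000, 0xC0A8, 0x0001, 0xC0A8, 0x00C7]
old = checksum(header)

# Change the TTL/protocol word from 0x4001 to 0x3F01 (TTL decremented).
updated = incremental_update(old, 0x4001, 0x3F01)
```

The incremental path touches three words regardless of header length, whereas the full recomputation touches every word.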

10 P13: Optimize Degrees of Freedom Key Concept: be aware of the variables under one’s control and the evaluation criteria used to determine good performance Example: memory-based string matching algorithm ─ the number of possible transitions from each state for a character is 256 (2^8, ASCII coding using 8 bits); ─ Bit-split algorithm using 8 machines: each machine checks only one bit, so the total number of possible transitions for a character is 16 (2^1 * 8)

11 P14: Use special techniques for finite universes (e.g. small integers) Key Concept: when the domain of a function is small, techniques like bucket sorting, bitmaps, etc. become feasible. Example: ─ bucket sorting for NAT table lookup: the NAT table is very sparse; each bucket is accessed by hashing ─ Bucket sort: partition an array into a finite number of buckets; each bucket is sorted individually
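The bucket sort described above can be sketched as follows, assuming the values are integers drawn from a known finite universe 0 .. universe_size - 1. The function name and the default of four buckets are illustrative assumptions.

```python
def bucket_sort(values, universe_size, num_buckets=4):
    """Sort integers in range(universe_size) using a fixed number of buckets."""
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        # Map the value to a bucket; works because the universe is finite.
        buckets[v * num_buckets // universe_size].append(v)
    result = []
    for b in buckets:
        # Each bucket is sorted individually, then buckets are concatenated.
        result.extend(sorted(b))
    return result


ordered = bucket_sort([7, 3, 15, 0, 8, 3], universe_size=16)
```

Because bucket boundaries follow the value order, concatenating the sorted buckets yields a fully sorted list.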

12 P15: Use algorithmic techniques to create efficient data structures Key Concept: once P1-P14 have been applied, think about how to build an ingenious data structure that exploits what you know Examples ─ IP forwarding lookups: PATRICIA trees (a data structure) came first – a special trie in which each edge is labeled with a sequence of characters. Many other, more efficient approaches followed
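As a rough illustration of trie-based forwarding lookup, here is a plain one-bit-per-level binary trie. It is not a PATRICIA tree: it has no path compression, so each edge carries a single bit rather than a sequence. Prefixes and addresses are assumed to be strings of '0' and '1', and all names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps '0' or '1' to a child TrieNode
        self.next_hop = None  # set only if a prefix ends at this node


def insert(root, prefix, next_hop):
    """Add a prefix (bit string) with its next hop."""
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.next_hop = next_hop


def longest_prefix_match(root, address_bits):
    """Walk the trie, remembering the last prefix seen on the way down."""
    node, best = root, root.next_hop
    for bit in address_bits:
        node = node.children.get(bit)
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best


root = TrieNode()
insert(root, "0", "A")
insert(root, "01", "B")
insert(root, "0100", "C")
hop = longest_prefix_match(root, "0101")  # "01" is the longest match
```

The lookup cost grows with the address length, which is one reason later schemes try to reduce the number of memory accesses per lookup.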

13 TCAM Ternary: 0, 1 and * (wildcard) TCAM: keys of a specified length with associated actions TCAM lookup: compare the query with all keys in parallel, output (in one cycle) the lowest memory location whose key matches the input IP forwarding uses longest-prefix matching ─ DIP matches both * and 01* Using a TCAM for IP forwarding requires that all longer prefixes occur before any shorter ones.
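A software sketch of the lookup rule on this slide, assuming keys are fixed-length strings over '0', '1' and '*'. A real TCAM compares all keys in parallel in hardware; the loop here only models the "lowest matching location wins" behaviour. The function names, keys and action labels are hypothetical.

```python
def tcam_match(key, query):
    """A ternary key matches if every position is '*' or equals the query bit."""
    return len(key) == len(query) and all(
        k in ("*", q) for k, q in zip(key, query)
    )


def tcam_lookup(entries, query):
    """Return the action of the lowest-indexed matching entry, or None."""
    for key, action in entries:  # index 0 models the lowest memory location
        if tcam_match(key, query):
            return action
    return None


# Longer prefixes are stored first so that the first match is the longest one.
entries = [("0100", "hop_long"), ("01**", "hop_mid"), ("0***", "hop_short")]
action = tcam_lookup(entries, "0100")
```

If the same entries were stored shortest-first, the query 0100 would return the action for 0***, which is why the ordering constraint matters.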

14 IP Lookup All prefixes with the same length are grouped together; the shortest prefix, 0*, is at the highest memory address The packet's DIP matches the prefixes of both P3 and P5 P5 is chosen because of longest-prefix matching

15 Routing Table Update 11* with P1 needs to be inserted into the routing table Naïve: create space in the group of length-2 prefixes by pushing up one position all prefixes of length 2 and higher A core routing table has 100,000 entries → 100,000 memory accesses

16 Routing Table Update P13: understand and exploit degrees of freedom -- we can add 11* at any position in group 2; it is not required to come after 10*. We can add it at the boundary of group 2 and group 3.

17 Clever Routing Table Updating The maximum number of memory accesses is 32 – i, where i is the length of the inserted prefix.
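The two update strategies can be compared with a small simulation. This is a sketch under stated assumptions: the table is a Python list with index 0 as the lowest address, prefixes are bit strings without the trailing *, entries are stored longest first, and free space sits just below the longest group. The scan that finds each group's end stands in for boundary pointers that a real implementation would presumably keep in software. All names are hypothetical.

```python
def insert_naive(table, prefix):
    """Shift every longer entry by one slot; returns (new_table, moves)."""
    pos = 0
    while pos < len(table) and len(table[pos]) > len(prefix):
        pos += 1
    # All pos entries ahead of the insertion point move toward free space.
    return table[:pos] + [prefix] + table[pos:], pos


def insert_clever(table, prefix):
    """Move one entry per longer group; returns (new_table, moves)."""
    table = [None] + list(table)  # slot 0 is the free hole
    hole, moves, i, n = 0, 0, 1, len(table)
    while i < n and len(table[i]) > len(prefix):
        j = i  # find the last entry of the group that starts at i
        while j + 1 < n and len(table[j + 1]) == len(table[i]):
            j += 1
        table[hole] = table[j]  # order inside a group does not matter
        hole, moves, i = j, moves + 1, j + 1
    table[hole] = prefix  # hole is now at the boundary of the target group
    return table, moves


table = ["1010", "0110", "111", "01", "10", "0"]
naive_table, naive_moves = insert_naive(table, "11")
clever_table, clever_moves = insert_clever(table, "11")
```

In this sketch the clever insert moves one entry per group of longer prefixes, so for 32-bit addresses and a new prefix of length i the number of moved entries is at most 32 - i, independent of the table size.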

18 Cautionary Questions Q1: Is improvement really needed? Q2: Is this really the bottleneck? Q3: What impact will change have on rest of system? Q4: Does BoE-analysis indicate significant improvement? Q5: Is it worth adding custom hardware? Q6: Can protocol change be avoided? Q7: Do prototypes confirm the initial promise? Q8: Will performance gains be lost if environment changes?

19 Summary P1-P5: System-oriented Principles ─ These recognize/leverage the fact that a system is made up of components ─ Basic idea: move the problem to somebody else’s subsystem P6-P10: Improve efficiency without destroying modularity ─ “Pushing the envelope” of module specifications ─ Basic engineering: system should satisfy spec but not do more P11-P15: Local optimization techniques ─ Speeding up a key routine ─ Apply these after you have looked at the big picture

20 Reminder