Jason Klaus Supervisor: Duncan Elliott August 2, 2007 (Confidential)


1 Low Power, High Throughput Internet Routing Lookup Tables using SRAM Tries
Jason Klaus
Supervisor: Duncan Elliott
August 2, 2007 (Confidential)

2 Overview
Internet Routing
Previous Work
  Software Solutions
  TCAM Solutions
  Trie Solutions
Proposed Solution
Research Steps
Design
  Routing Table
  Lookup Process
  Addition Process
  Removal Process
  Arbiter
  I/O Signals
Research Results
12/4/2018 University of Alberta (Confidential)

3 Internet Addresses
Internet connects hundreds of millions of computers
Each computer is assigned a unique 4 byte (32 bit) IP address
  Ex:
Computers on the same network are grouped by a common address prefix
  Ex: 12 computers: through
Prefixes range from 8 to 32 bits in length

4 Internet Routing
Computers send data packets, with a destination IP address in their headers, to their local routers
Packets are forwarded from router to router until they arrive at their destination
Routers use lookup tables to determine which port to forward each packet through
Routing lookup tables are often modified to reflect the evolving Internet topology

5 Internet Routing (cont.)

6 Internet Routing (cont.)
Local Internet Service Provider (ISP) Router
  Tens of lookup table entries
  Processes thousands of packets per second
“Backbone” ISP Router
  Hundreds of thousands of lookup table entries
  Processes millions of packets per second
  Tens of updates per second
Challenge: design routers with high capacity and throughput, yet low latency and power

7 Internet Routing (cont.)
Lookup tables contain address prefix entries
  *.* => Port 12
  *.* => Port 14
  * => Port 7
Longest matching prefix given precedence
  routed to Port 7, not Port 12
Entries often added or removed
Lookup tables remain the bottleneck in router performance and power consumption
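The longest-matching-prefix rule can be illustrated with a minimal Python sketch. The prefixes and addresses below are hypothetical stand-ins (the slide's own digits did not survive in this transcript); only the port numbers 12, 14 and 7 echo the slide. A linear scan is used purely for clarity and is not how a router would implement the lookup.

```python
import ipaddress

# Hypothetical routing table: (prefix, port). The prefixes are assumptions
# chosen so that one address matches both a 16 bit and a 24 bit prefix.
TABLE = [
    (ipaddress.ip_network("10.1.0.0/16"), 12),
    (ipaddress.ip_network("10.2.0.0/16"), 14),
    (ipaddress.ip_network("10.1.2.0/24"), 7),
]

def longest_prefix_match(address, table=TABLE):
    """Return the port of the longest prefix containing address, or None."""
    addr = ipaddress.ip_address(address)
    best_length, best_port = -1, None
    for network, port in table:
        # Keep the match only if it is longer than the best one so far.
        if addr in network and network.prefixlen > best_length:
            best_length, best_port = network.prefixlen, port
    return best_port

# 10.1.2.3 matches both 10.1.0.0/16 and 10.1.2.0/24; the longer prefix wins.
print(longest_prefix_match("10.1.2.3"))
```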

8 Software Solutions
Capable of being implemented on generic or slightly modified commodity hardware
  Serial execution on CPU with memory
Utilize advanced data structures and algorithms
Focus on minimizing
  Asymptotic running time
  Memory accesses
  Memory size

9 Software Solutions (cont.)
Advantages
  Flexible and easily implemented
  Low unit cost due to commodity hardware and reduced memory requirements
Disadvantages
  Limited by CPU and memory performance
  Inherently power and area inefficient
  Difficult to parallelize or pipeline
  Often complicated and costly update procedures

10 TCAM Solutions
Utilize a Ternary Content Addressable Memory (TCAM) to store address prefixes
  Each cell stores either a 0, 1 or * (don’t care)
Searches conducted in parallel on all entries
Of all the entries that match, the one with the longest prefix is selected (priority encoding)
Index of selected entry is used to look up routing information stored in a separate memory
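As a rough software model of the TCAM scheme described above, the sketch below compares a key against every ternary pattern, then uses a priority encoder to pick the first match and index a separate port memory. The 8 bit patterns and port values are hypothetical, and the sketch assumes entries are stored longest prefix first so that the lowest matching index is the longest match. In hardware all comparisons happen at once; the list comprehension only imitates that.

```python
def ternary_match(pattern, key):
    """True if every pattern cell is '*' (don't care) or equals the key bit."""
    return len(pattern) == len(key) and all(
        p in ("*", k) for p, k in zip(pattern, key)
    )

def priority_encode(matches):
    """Return the lowest index whose match line is set, or None."""
    for index, hit in enumerate(matches):
        if hit:
            return index
    return None

def tcam_lookup(patterns, ports, key):
    """Search all patterns, then read the port memory at the winning index."""
    index = priority_encode([ternary_match(p, key) for p in patterns])
    return None if index is None else ports[index]

# Hypothetical contents, ordered longest prefix first (an assumption).
PATTERNS = ["0110****", "011*****", "0*******"]
PORTS = [3, 2, 1]

print(tcam_lookup(PATTERNS, PORTS, "01101111"))
```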

11 TCAM Solutions (cont.)

12 TCAM Solutions (cont.)
Advantages
  Flexible cells well suited to pattern matching
  CAM array can be pipelined to increase throughput and save power
Disadvantages
  Parallel searching consumes large amounts of power
  Complex cell is larger and more capacitive, slowing down searches and consuming more power
  CAM array cannot support multiple lookups in parallel

13 Trie Solutions
Multi-bit trie data structure divides an IP address into several parts, called strides
  Ex: 8 bit IP address with strides of 4 bits, 2 bits and 2 bits would be
First stride of address used to index a memory
Memory entry stores either:
  The port number the packet should be routed to
  A pointer to a memory the next stride should index
    In this case the process is repeated until the port number is uniquely determined
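Dividing an address into strides is just a sequence of shifts and masks. The sketch below is illustrative only: the function name `split_strides` and the sample address are assumptions, while the stride sizes 4, 2 and 2 come from the slide.

```python
def split_strides(address, strides, width=8):
    """Split a width-bit integer address into stride fields, most significant first."""
    if sum(strides) != width:
        raise ValueError("strides must cover the whole address")
    fields = []
    remaining = width
    for stride in strides:
        remaining -= stride
        # Shift the wanted field down to bit 0, then mask off the higher bits.
        fields.append((address >> remaining) & ((1 << stride) - 1))
    return fields

# Hypothetical 8 bit address split with strides of 4, 2 and 2 bits.
STRIDES = [4, 2, 2]
parts = split_strides(0b01101111, STRIDES)
print([format(p, "0{}b".format(s)) for p, s in zip(parts, STRIDES)])
```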

14 Trie Solutions (cont.)
Lookup :
  In stride 1, entry for 0110 in bank 1 is a pointer to bank 1
  In stride 2, entry for 11 in bank 1 is a pointer to bank 1
  In stride 3, entry for 11 is port number 2
Prefix Port 0* 011* * * 1 2 3 4 1* 1001* * 110* 11011* 5 6 7 8 9
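A small software model may help make the bank-and-pointer walk concrete. This is a sketch, not the presented design: it uses only the prefixes that remain legible in the table above, the port assigned to each is an assumed reading of the flattened table, and banks are plain Python lists. Shorter prefixes are expanded first and pushed down into child banks when a longer prefix needs a pointer, which is one common way to build a multi-bit trie. Because some prefixes are missing here, the path taken for a given address can differ from the one on the slide.

```python
STRIDES = [4, 2, 2]   # stride sizes from the 8 bit example
WIDTH = sum(STRIDES)

# Legible prefixes only; the port numbers are an assumption.
TABLE = {"0": 1, "011": 2, "1": 5, "1001": 6, "110": 8, "11011": 9}

def build(prefixes):
    """Build nested banks: an entry is a port, None, or a child bank (list)."""
    root = [None] * (1 << STRIDES[0])
    for prefix in sorted(prefixes, key=len):      # shorter first, longer override
        bank, level, bits = root, 0, prefix
        while len(bits) > STRIDES[level]:         # prefix continues past this bank
            index = int(bits[:STRIDES[level]], 2)
            if not isinstance(bank[index], list):
                # Replace the port with a pointer; the child inherits the port.
                bank[index] = [bank[index]] * (1 << STRIDES[level + 1])
            bank, bits, level = bank[index], bits[STRIDES[level]:], level + 1
        spare = STRIDES[level] - len(bits)        # unspecified bits in this stride
        base = (int(bits, 2) << spare) if bits else 0
        for index in range(base, base + (1 << spare)):
            bank[index] = prefixes[prefix]        # expand prefix over its range
    return root

def lookup(root, address):
    """Index one bank per stride until an entry is a port rather than a pointer."""
    bank, shift = root, WIDTH
    for stride in STRIDES:
        shift -= stride
        entry = bank[(address >> shift) & ((1 << stride) - 1)]
        if not isinstance(entry, list):
            return entry
        bank = entry
    return None

root = build(TABLE)
print(lookup(root, 0b01101111))
```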

15 Trie Solutions (cont.)
Advantages
  Lookups involve fast, low power memory accesses
  Can be used for software and hardware solutions
    Software solutions are more flexible
    Hardware solutions can be pipelined
Disadvantages
  Poor choice of strides results in high memory usage
  Updates can involve changing a lot of entries
  Doesn’t support multiple lookups in parallel

16 Proposed Solution
Implement multi-bit trie in hardware using VLSI
Implement each bank as its own SRAM on chip
  Each memory access is to a small, fast SRAM
  Rest can be disabled to save power
Lookups at each stride can be pipelined
Process several lookups in parallel
  Duplicate the first stride memory and reuse the memories in the other strides
Bound the amount of time updates can take
  Add default (*) entry to each memory

17 Proposed Solution (cont.)
Lookup : Port 0
Lookup : Port 2
Lookup : Port 8
If three copies of stride 1 then all three lookups can be done in parallel
Prefix Port 0* 011* * * 1 2 3 4 1* 1001* * 110* 11011* 5 6 7 8 9

18 Research Steps
Design proposed solution generically in VHDL
  Make no assumptions about strides, banks, etc.
  Functionally verify design through simulation
Implement design and interface in an FPGA
  Test to verify design works in hardware
  Compare results against C++ simulator
Use C++ program to analyze real backbone routing tables to compare various stride choices
Use SRAM compiler and logic synthesizer to implement most promising choices as ASICs
  Select best design through simulation

19 Design: Lookup Bank
SRAM is addressed by appropriate IP address stride and returns an entry
  If port number & prefix length then that is the port number to route the packet to
    Prefix length is used for updates only
  If bank pointer then that bank must be accessed similarly in the next stride
Register stores a default port number & prefix length for the bank
  Returned along with SRAM entry
  Used as port number if entry has the special port number “default”
  Bounds update time
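The role of the default register can be sketched in Python. All class and function names here are hypothetical, and the `DEFAULT` marker stands in for the special port number "default". The point the sketch tries to show is why the register bounds update time: changing the bank's default is a single write, however many entries fall back to it.

```python
from dataclasses import dataclass
from typing import Union

DEFAULT = "default"   # stand-in for the special port number "default"

@dataclass
class PortEntry:
    port: Union[int, str]   # a port number, or DEFAULT
    prefix_length: int      # used for updates only

@dataclass
class PointerEntry:
    bank: int               # bank to access in the next stride

class LookupBank:
    def __init__(self, stride, default_port, default_length):
        # Every SRAM entry starts out deferring to the bank default.
        self.sram = [PortEntry(DEFAULT, 0) for _ in range(1 << stride)]
        self.default = PortEntry(default_port, default_length)  # the register

    def read(self, address):
        """Return the SRAM entry together with the bank's default."""
        return self.sram[address], self.default

    def set_default(self, port, length):
        """One register write changes the result of every DEFAULT entry."""
        self.default = PortEntry(port, length)

def resolve(entry, default):
    """Port for a port entry (substituting the default), None for a pointer."""
    if isinstance(entry, PointerEntry):
        return None         # not resolved yet: follow entry.bank next stride
    return default.port if entry.port == DEFAULT else entry.port

bank = LookupBank(stride=2, default_port=5, default_length=1)
bank.sram[3] = PortEntry(9, 5)
print(resolve(*bank.read(0)), resolve(*bank.read(3)))
```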

20 Design: Lookup Node
Each lookup bank must be accessible by each of the “L” lookup agents
Address and Enable signals from each agent are combined
  At most one lookup agent will access a bank at a time
  Bank is disabled if no lookup agent needs it
Single update agent’s signals aren’t shown for clarity

21 Design Aside: Bussing Signals
Multiplexer Tree Tristate Bus Crossbar Ideal for addresses Simple inputs Complex leaves Complex branches Logarithmic scaling Usable in FPGA Ideal for enables Complex inputs No branches Linear scaling Unusable in FPGA Simple leaves Simple branches

22 Design: Lookup Bus
Each lookup agent supplies signals to the bus
  Enable: Indicates if the agent is using the bus
  Bank: The number of the bank to access
  Address: The entry in the bank to access
Enable demultiplexed using the bank number to provide local enable for each of the “B” banks
Bank number is registered to multiplex the correct resolved and default data back to the lookup agent on the following clock cycle
Advantages and disadvantages to using decoder and crossbars for this instead
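The enable demultiplexing can be modelled behaviourally as follows. This is only a sketch under assumptions: the function name and tuple layout are hypothetical, and the check for two agents addressing one bank simply asserts the property the arbiter is meant to guarantee.

```python
def bank_enables(requests, num_banks):
    """Combine per-agent (enable, bank, address) signals into per-bank signals.

    Returns a list of (local_enable, address) with one element per bank.
    A bank that no agent addresses stays disabled, which saves power.
    """
    result = [(False, None)] * num_banks
    for enable, bank, address in requests:
        if not enable:
            continue                      # this agent is not using the bus
        if result[bank][0]:
            raise ValueError("two agents addressed the same bank")
        result[bank] = (True, address)    # demultiplex enable by bank number
    return result

# Three hypothetical agents and B = 4 banks; the second agent is idle.
print(bank_enables([(True, 2, 5), (False, 0, 0), (True, 0, 1)], 4))
```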

23 Design: Lookup Bus (cont.)

24 Design: Routing Stage
Each lookup agent inputs and outputs signals
  Enable: If the agent is activated this stage
  Performed: If the agent performed a lookup
  IpAddress: The address being looked up
  Port: The port to forward the packet to, if known
  DefaultPort: If Port ends up as unknown, use this
An agent continues the lookup of the previous stage (if any) and passes it to the next stage
  If required the agent accesses the lookup bus
One lookup agent also handles all updates

25 Design: Routing Stage (cont.)

26 Design: First Routing Stage
First stage has only one bank
  This bank is replicated so each agent has its own copy
Agents don’t need to supply a bank number
Agents always perform a lookup if enabled
  Only need Enable and IP Address inputs
Update agent applies all changes to all banks in parallel

27 Design: Routing Table
Routing table combines “N” routing stages together
Result logic transforms last stage output
  LookupOut: Indicates if a lookup result is being output
  IpAddressOut: The IP address resolved
  PortOut: The port number the packet should be routed to

28 Design: Lookup Process

29 Design: Lookup Example 1

30 Design: Lookup Example 1 (cont.)

31 Design: Lookup Example 1 (cont.)

32 Design: Lookup Example 2

33 Design: Lookup Example 2 (cont.)

34 Design: Lookup Example 2 (cont.)

35 Design: Updates
Updates can be prefix additions or removals
Whereas lookups are pipelined and done in parallel, only one update is processed at a time
No new lookups are permitted while an update is in progress, but existing lookups will finish
Updates can take many clock cycles to complete (but are bounded) and the routing table favors fast lookups over fast updates
  Update logic should not lower lookup speed
  Updates are infrequent compared to lookups
  Many simple states instead of a few complex ones

36 Design: Update Agent
One lookup agent per stage is also an update agent that has write access to all banks
Like lookup agents, update agents receive partially processed updates from the previous stage, do their own processing, and pass the update on to the next stage
Update agents handle updates by reading and writing their own banks, as well as issuing commands and obtaining results from neighbouring stages

37 Design: Addition Process

38 Design: Addition Example 1

39 Design: Addition Example 1 (cont.)

40 Design: Addition Example 1 (cont.)

41 Design: Addition Example 1 (cont.)

42 Design: Addition Example 1 (cont.)

43 Design: Addition Example 2

44 Design: Addition Example 2 (cont.)

45 Design: Addition Example 2 (cont.)

46 Design: Removal Process

47 Design: Removal Example 1

48 Design: Removal Example 1 (cont.)

49 Design: Removal Example 1 (cont.)

50 Design: Removal Example 1 (cont.)

51 Design: Removal Example 2

52 Design: Removal Example 2 (cont.)

53 Design: Removal Example 2 (cont.)

54 Design: Removal Example 2 (cont.)

55 Design: Removal Example 2 (cont.)

56 Design: Removal Example 2 (cont.)

57 Design: Removal Example 2 (cont.)

58 Design: Removal Example 2 (cont.)

59 Design: Removal Example 2 (cont.)

60 Design: Removal Example 2 (cont.)

61 Design: Removal Example 2 (cont.)

62 Design: Removal Example 2 (cont.)

63 Design: Removal Example 2 (cont.)

64 Design: Removal Example 2 (cont.)

65 Design: Worst Case Updates
Addition or removal involves several steps
  Navigate to the target bank
  Find replacement entry if removal
  Modify each entry to reflect the change
    Port: Modify value
    Pointer: Follow it to target bank and modify its default
  Deallocate banks if removal and if possible
First and last steps are linear work with respect to the number of stages

66 Design: Worst Case Updates (cont.)
Middle steps are linear work with respect to the number of memory entries affected; up to half the total
  Replacement: Search at most half the bank
  Modify: Prefix can cover at most half the bank
At most 32 stages, whereas up to 2^32 entries
  For most designs largest stride determines worst case update time
  Important to consider when determining strides
Remember that Internet prefixes are 8-32 bits long
  Prefix covers at most 1/2^8 = 1/256 of the first stage bank
  2^24 worst case modifications for 1 stride of 32 bits
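The arithmetic behind these bounds can be checked with a short sketch. It covers only the "Modify" step within a single bank, and the function name and the way stages are enumerated are assumptions. A prefix of length p that ends in a stage whose strides sum to `end` bits covers 2^(end - p) entries of that stage's bank, so the worst case in each stage comes from the shortest prefix that can end there.

```python
MIN_PREFIX = 8   # shortest Internet prefix length, per the slide

def worst_case_modifications(strides, min_prefix=MIN_PREFIX):
    """Largest number of entries one prefix can cover in any single bank."""
    worst, consumed = 0, 0
    for stride in strides:
        end = consumed + stride            # address bits used up to this stage
        if end >= min_prefix:              # some legal prefix can end here
            # Shortest prefix that ends in this stage rather than an earlier one.
            shortest = max(consumed + 1, min_prefix)
            worst = max(worst, 1 << (end - shortest))
        consumed = end
    return worst

# One stride of 32 bits: an 8 bit prefix covers 2^(32-8) = 2^24 entries.
print(worst_case_modifications([32]) == 2 ** 24)
# Hypothetical 16/8/8 split: the first stage dominates with 2^(16-8) entries,
# which is 1/256 of its 2^16 entry bank.
print(worst_case_modifications([16, 8, 8]))
```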

67 Design: Arbiter
Ensures that no lookup agents access the same bank at the same time
First stage bank replicated, so conflicts only possible in subsequent stages
To access the same bank two queries must therefore have the same first stride bits, accessing the same first stage entry that’s a pointer
Simple solution is to prevent two lookups with the same first stride bits from executing in parallel

68 Design: Arbiter (cont.)
Assign lookups to agents in series
  If all agents occupied then wait for next cycle
  If lookup conflicts with another agent’s lookup then wait for next cycle
Easy to implement with only minimal overhead, but conflicts reduce throughput
  Assuming random data, the chance of a conflict falls exponentially as the first stride size grows
  Important to consider when determining strides
More complicated schemes exist but have the same worst case performance
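One possible reading of this simple arbitration scheme is sketched below in Python. The names are hypothetical, and the sketch assumes lookups are issued strictly in arrival order, so the first conflicting lookup stalls everything behind it until the next cycle.

```python
from collections import deque

def arbitrate(queue, num_agents, first_stride, width=32):
    """Pick the lookups to issue this clock cycle from the front of the queue."""
    issued, busy = [], set()
    while queue and len(issued) < num_agents:
        key = queue[0] >> (width - first_stride)   # first stride bits
        if key in busy:
            break          # conflict: wait for the next cycle
        busy.add(key)
        issued.append(queue.popleft())             # assign to the next agent
    return issued

# Hypothetical 32 bit addresses with an 8 bit first stride and 4 agents.
pending = deque([0x0A000001, 0x0B000001, 0x0A000002, 0x0C000001])
print([hex(a) for a in arbitrate(pending, 4, 8)])  # third lookup conflicts
print([hex(a) for a in arbitrate(pending, 4, 8)])  # issued one cycle later
```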

69 Design: I/O Signals
Chip clocked at SRAM access speeds
“L” agents means chip must handle
  L 32 bit IP addresses input every clock cycle as lookup queries
  L 32 bit IP addresses and L ~6 bit port numbers output every clock cycle as lookup results
Increasing lookup parallelism quickly exhausts available chip I/O pins
High speed I/O solutions required for this design
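To see how quickly the signal count grows with L, the per-cycle bit counts above can be tallied with a trivial sketch. The function name is hypothetical, 6 bits is the slide's approximate port width, and control signals such as enables are ignored.

```python
def io_bits_per_cycle(agents, address_bits=32, port_bits=6):
    """Return (input_bits, output_bits) transferred every clock cycle."""
    inputs = agents * address_bits                  # lookup queries
    outputs = agents * (address_bits + port_bits)   # addresses plus port numbers
    return inputs, outputs

for agents in (1, 2, 4, 8):
    inputs, outputs = io_bits_per_cycle(agents)
    print(agents, inputs, outputs, inputs + outputs)
```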

70 Research Results
Design fully implemented in VHDL
Design demonstrates full, correct functionality in functional simulation
FPGA implementation, including arbiter and additional interfacing, operates correctly
  4 lookups per clock cycle at 40 MHz in Virtex-II Pro
  Limited memory of FPGA limits size of table
  Logic only 14% utilized
Optimal stride analysis nearly complete
ASIC simulation estimates awaiting completion of the SRAM compiler

71 Questions???

