Presentation is loading. Please wait.

Presentation is loading. Please wait.

IP Lookup: Some subtle concurrency issues

Similar presentations


Presentation on theme: "IP Lookup: Some subtle concurrency issues"— Presentation transcript:

1 IP Lookup: Some subtle concurrency issues
Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2010

2 but first a correction from the last lecture …
February 22, 2010

3 One-Element Pipeline FIFO
!empty !full rdy enab enq deq module FIFO or module mkPipelineFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); RWire#(void) deqEN <- mkRWire(); Bool deqp = isValid (deqEN.wget())); method Action enq(t x) if (!full || deqp); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); method t first() if (full); return (data); method Action clear(); full <= False; endmethod endmodule This actually won’t work! This works correctly in both cases (fifo full and fifo empty). first < enq deq < enq enq < clear deq < clear February 22, 2010

4 One-Element Pipeline FIFO Analysis
!empty !full rdy enab enq deq module FIFO or module mkPipelineFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); RWire#(void) deqEN <- mkRWire(); Bool deqp = isValid (deqEN.wget())); method Action enq(t x) if (!full || deqp); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); ... Rwire allows us to create a combinational path between enq and deq but does not affect the conflict analysis Conflict analysis: Rwire deqEN allows concurrent execution of enq & deq with the functionality deq<enq; However, the conflicts around Register full remain! February 22, 2010

5 Solution- Config registers Lie a little
ConfigReg is a Register (Reg#(a)) Reg#(t) full <- mkConfigRegU; Same HW as Register, but the definition says read and write can happen in either order However, just like a HW register, a read after a write gets the old value Primarily used to fool the compiler analysis to do the right thing February 22, 2010

6 One-Element Pipeline FIFO A correct solution
!empty !full rdy enab enq deq module FIFO or module mkLFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkConfigReg(False); RWire#(void) deqEN <- mkRWire(); Bool deqp = isValid (deqEN.wget())); method Action enq(t x) if (!full || deqp); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); method t first() if (full); return (data); method Action clear(); full <= False; endmethod endmodule No conflicts around full: when both enq and deq happen; if we want deq < enq then full must be set to True in case enq occurs. Scheduling constraint on deqEn forces deq < enq first < enq deq < enq enq < clear deq < clear February 22, 2010

7 An aside Unsafe modules
Bluespec allows you to import Verilog modules by identifying wires that correspond to methods Such modules can be made safe either by asserting the correct scheduling properties of the methods or by wrapping the unsafe modules in appropriate Bluespec code Config Reg is an example of an unsafe module February 22, 2010

8 back to today’s lecture …
February 22, 2010

9 IP Lookup block in a router
Queue Manager Packet Processor Exit functions Control Processor Line Card (LC) IP Lookup SRAM (lookup table) Arbitration Switch LC The design example you’ll be seeing is from an IP switch. We’ll be looking at the IP lookup function that you typically find on the ingress port of a line card. A packet is routed based on the “Longest Prefix Match” (LPM) of it’s IP address with entries in a routing table Line rate and the order of arrival must be maintained line rate  15Mpps for 10GE February 22, 2010

10 Sparse tree representation
F A 7 A B F F * E 5.*.*.* D C * B A 7.14.*.* 14 3 5 E F F 7 F 10 F F 18 255 F F IP address Result M Ref F E C C 5 D 200 2 F 3 4 A In this lecture: Level 1: 16 bits Level 2: 8 bits Level 3: 8 bits  1 to 3 memory accesses 1 4 February 22, 2010

11 “C” version of LPM int lpm (IPA ipa) /* 3 memory lookups */ { int p; /* Level 1: 16 bits */ p = RAM [ipa[31:16]]; if (isLeaf(p)) return value(p); /* Level 2: 8 bits */ p = RAM [ptr(p) + ipa [15:8]]; /* Level 3: 8 bits */ p = RAM [ptr(p) + ipa [7:0]]; return value(p); /* must be a leaf */ } 216 -1 28 -1 Not obvious from the C code how to deal with memory latency - pipelining Here’s a longest prefix match algorithm in software. It’s pretty simple, up to three lookups of different parts of the IP address. (Click now) Can you visualize how to implement this in HW? It’s not obvious. Memory latency ~30ns to 40ns Must process a packet every 1/15 ms or 67 ns Must sustain 3 memory dependent lookups in 67 ns February 22, 2010

12 Longest Prefix Match for IP lookup: 3 possible implementation architectures
Rigid pipeline Inefficient memory usage but simple design Linear pipeline Efficient memory usage through memory port replicator Circular pipeline Efficient memory with most complex control So, let’s look at three different approaches: Rigid pipeline Linear pipeline Circular pipeline They all appear at a high-level to have different advantages, but… 1 2 3 Designer’s Ranking: Which is “best”? February 22, 2010 Arvind, Nikhil, Rosenband & Dave ICCAD 2004

13 Circular pipeline enter? done? RAM yes inQ fifo no outQ The fifo holds the request while the memory access is in progress The architecture has been simplified for the sake of the lecture. Otherwise, a “completion buffer” has to be added at the exit to make sure that packets leave in order. Next lecture February 22, 2010

14 FIFO interface FIFO#(type t);
method Action enq(t x); // enqueue an item method Action deq(); // remove oldest entry method t first(); // inspect oldest item endinterface n enab enq rdy not full n = # of bits needed to represent a value of type t enab rdy deq module FIFO not empty n first not empty rdy February 22, 2010

15 Request-Response Interface for Synchronous Memory
Data Ack Ready req deq peek Addr Ready ctr (ctr > 0) ctr++ ctr-- deq Enable enq Synch Mem Latency N interface Mem#(type addrT, type dataT); method Action req(addrT x); method Action deq(); method dataT peek(); endinterface Making a synchronous component latency- insensitive February 22, 2010

16 Circular Pipeline Code
enter? done? RAM inQ fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule done? Is the same as isLeaf rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule When can enter fire? inQ has an element and ram & fifo each has space February 22, 2010

17 Circular Pipeline Code: discussion
enter? done? RAM inQ fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule When can recirculate fire? ram & fifo each has an element and ram, fifo & outQ each has space Is this possible? February 22, 2010

18 Ordinary FIFO won’t work but a pipeline FIFO would
February 22, 2010

19 Problem solved! PipelineFIFO fifo <- mkPipelineFIFO; // use a Pipeline fifo rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule RWire has been safely encapsulated inside the Pipeline FIFO – users of the fifo need not be aware of RWires February 22, 2010

20 Dead cycles rule enter (True); IP ip = inQ.first();
done? RAM inQ fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule assume simultaneous enq & deq is allowed rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule Can a new request enter the system when an old one is leaving? Is this worth worrying about? February 22, 2010

21 The Effect of Dead Cycles
enter done? RAM yes in fifo no Circular Pipeline RAM takes several cycles to respond to a request Each IP request generates 1-3 RAM requests FIFO entries hold base pointer for next lookup and unprocessed part of the IP address What is the performance loss if “exit” and “enter” don’t ever happen in the same cycle? >33% slowdown! Unacceptable February 22, 2010

22 Scheduling conflicting rules
When two rules conflict on a shared resource, they cannot both execute in the same clock The compiler produces logic that ensures that, when both rules are applicable, only one will fire Which one? source annotations (* descending_urgency = “recirculate, enter” *) February 22, 2010

23 So is there a dead cycle? rule enter (True); IP ip = inQ.first();
done? RAM inQ fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule In general these two rules conflict but when isLeaf(p) is true there is no apparent conflict! February 22, 2010

24 Rule Spliting rule foo (True); if (p) r1 <= 5; else r2 <= 7; endrule rule fooT (p); r1 <= 5; endrule rule fooF (!p); r2 <= 7; rule fooT and fooF can be scheduled independently with some other rule February 22, 2010

25 Spliting the recirculate rule
rule recirculate (!isLeaf(ram.peek())); IP rip = fifo.first(); fifo.enq(rip << 8); ram.req(ram.peek() + rip[15:8]); fifo.deq(); ram.deq(); endrule rule exit (isLeaf(ram.peek())); outQ.enq(ram.peek()); fifo.deq(); ram.deq(); endrule rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule Now rules enter and exit can be scheduled simultaneously, assuming fifo.enq and fifo.deq can be done simultaneously February 22, 2010

26 Packaging a module: Turning a rule into a method
inQ outQ enter? RAM done? fifo rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(p[15:0]); inQ.deq(); endrule method Action enter (IP ip); ram.req(ip[31:16]); fifo.enq(ip[15:0]); endmethod Similarly a method can be written to extract elements from the outQ next lecture- Completion Buffer February 22, 2010


Download ppt "IP Lookup: Some subtle concurrency issues"

Similar presentations


Ads by Google