
A Practical Packet Reordering Mechanism with Flow Granularity for Parallel Exploiting in Network Processors. 13th WPDRTS, April 4, 2005. Beibei Wu, Yang Xu, Bin Liu, Hongbin Lu.



Presentation transcript:

1 A Practical Packet Reordering Mechanism with Flow Granularity for Parallel Exploiting in Network Processors
13th WPDRTS, April 4, 2005
Beibei Wu, Yang Xu, Bin Liu, Hongbin Lu
Department of Computer Science, Tsinghua University, Beijing, P.R. China

2 Background & Problem
Network Processor (NP)
 A special-purpose, programmable hardware device that combines the flexibility of a RISC processor with the speed of an ASIC. NPs are building blocks used to construct network systems.
 Data path: the Processing Engine (PE)
Two Design Goals
 High speed: multiple PEs exploit packet-level parallelism
 High flexibility: versatile processing requirements make the processing time of each packet unpredictable
The Packet Disordering (PD) Problem
 Packets depart in a different order from their arrival
 Network performance may be greatly deteriorated

3 Objective
To design a practical mechanism that preserves packet order in an NP while at the same time ensuring the utilization of the PEs.
NP Model
[Figure: packets flow from a Dispatcher, through PEs (PE0 ... PEn), to an Aggregator.]

4 Contents
Design Issues
 Global-scope vs. within-flow-scope order preserving
 Pre-processing vs. post-processing order scheduling
The Proposed Solution
 Packet chains for all the flow sequence information
 The working process
Simulation

5 Design Issues (1): the scope of packets to preserve order
Global scope
 All packets leave strictly in arrival order
Within-flow scope
 Only packets of the same flow leave in order
Because the processing delay of different flows differs in an NP, within-flow order preserving is quite necessary.
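The difference between the two scopes can be made concrete with a small checker (an illustrative sketch, not from the slides; the function name and departure-log format are my own):

```python
def violates_order(departures, scope="flow"):
    """Count order violations in a departure log.

    departures: list of (flow_id, seq), where seq is the packet's
    global arrival index.
    scope="global": every packet must leave in arrival order.
    scope="flow":   only packets of the same flow must keep arrival order.
    """
    last_seen = {}  # key (None for global, flow_id for flow) -> largest departed seq
    violations = 0
    for flow, seq in departures:
        key = None if scope == "global" else flow
        if last_seen.get(key, -1) > seq:
            violations += 1  # an earlier-arriving packet was overtaken
        last_seen[key] = max(last_seen.get(key, -1), seq)
    return violations

# f1's second packet (seq 1) departs after f2's packet (seq 2):
log = [("f1", 0), ("f2", 2), ("f1", 1), ("f2", 3)]
print(violates_order(log, scope="global"))  # 1: a global-order violation
print(violates_order(log, scope="flow"))    # 0: each flow is internally in order
```

The same log is disordered under the global scope but perfectly ordered within each flow, which is why within-flow preservation tolerates the differing per-flow delays that global preservation cannot.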

6 Design Issues (2): where does order scheduling take place?
Order scheduling location
 Pre-processing scheduling: SPSL (Sequential Processing, Sequential Leaving)
 Post-processing scheduling: UPSL (Un-sequential Processing, Sequential Leaving)

7 Design Issues (2): the shortcoming of pre-processing scheduling
[Figure: the NP model (Dispatcher, PE0 ... PEn, Aggregator) with a Packet Buffer.]

8 Contents
Design Issues
 Global-scope vs. within-flow-scope order preserving
 Pre-processing vs. post-processing order scheduling
The Proposed Solution
 Packet chains for sequence information
 The working process
Simulation

9 The Proposed Solution (1): the methods of reordering
In traditional network devices
 Sequence number or timestamp
 Global order preserving
In our NP
 Packet chains
 Providing the ability to discriminate among flows

10 The Proposed Solution (2): NP system with Packet Data Buffer
[Figure: Dispatcher, PE Complex, and Aggregator, with a Packet Data Buffer of blocks b1, b2, ..., bk; threads t1, t2, ..., tm process packets p1, p2, ..., pn; the blocks buffer disordered packets.]

11 The Proposed Solution (3)
Each packet in the NP occupies a block in the Packet Data Memory. Two questions must be answered:
 When to transmit the packet in which block?
 How to discriminate among flows?

12 The Proposed Solution (4): using packet chains to record sequence information
[Figure: the Dispatcher assigns each packet a FlowID; a Flow Table maps each FlowID, f(1) ... f(j), to the Head Packet and End Packet of its chain, and a Block Table links the blocks of each chain, e.g. p(d), p(x), p(y), p(z) and p(a), p(b), p(e).]
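The flow-table and block-table bookkeeping can be sketched in Python (a sketch of my own, not from the slides; the class and method names are hypothetical, and the real NP uses hardware table memories rather than dictionaries):

```python
class FlowReorderer:
    """Per-flow packet chains: the Flow Table keeps each flow's head and
    end block, the Block Table links blocks of the same flow in arrival
    order, and blocks are marked done when a PE finishes them."""

    def __init__(self):
        self.flow_head = {}   # FlowID -> first in-flight block of the flow
        self.flow_end = {}    # FlowID -> last in-flight block of the flow
        self.next_block = {}  # block -> successor block in the chain
        self.done = {}        # block -> True once a PE finished processing
        self.block_flow = {}  # block -> FlowID (reverse lookup)
        self.departed = []    # transmitted packets, in departure order

    def dispatch(self, flow, block):
        """Dispatcher: append the packet's buffer block to its flow chain."""
        self.done[block] = False
        self.next_block[block] = None
        self.block_flow[block] = flow
        if flow in self.flow_end:              # chain exists: link at the tail
            self.next_block[self.flow_end[flow]] = block
        else:                                  # first in-flight packet of the flow
            self.flow_head[flow] = block
        self.flow_end[flow] = block

    def complete(self, block):
        """Aggregator: a PE finished this block; release whatever is in order."""
        self.done[block] = True
        flow = self.block_flow[block]
        head = self.flow_head.get(flow)
        # Transmit from the chain head for as long as the head block is done;
        # a done block behind an unfinished one simply waits in the buffer.
        while head is not None and self.done[head]:
            self.departed.append((flow, head))
            nxt = self.next_block.pop(head)
            del self.done[head], self.block_flow[head]
            head = nxt
            if head is None:                   # chain drained
                del self.flow_head[flow], self.flow_end[flow]
            else:
                self.flow_head[flow] = head
```

A packet that finishes out of order is held in its block until every earlier packet of the same flow has departed, while packets of other flows are unaffected, which is exactly the within-flow scope from the design issues.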

13 The Proposed Solution (5): the working process
[Figure: a worked example with Packet Data Buffer blocks b1 ... b8, a flow table with flowID, head, and end columns, and two PEs (PE 1, PE 2); the table and chains are updated as packets arrive, finish processing, and depart.]

14 Contents
Design Issues
 Global-scope vs. within-flow-scope order preserving
 Pre-processing vs. post-processing order scheduling
The Proposed Solution
 Packet chains for sequence information
 The working process
Simulation

15 Simulation: throughput vs. flow number, for three traces
Setup: an NP system with 4 PEs, 4 threads each, and in total 8*4 = 32 memory blocks. f1 denotes length-unrelated application packets; f2 denotes length-related application packets.
 Trace1: f1 100%, constant 40 bytes; f2 0%
 Trace2: f1 95%, constant 40 bytes; f2 5%, random from 40 to 60 bytes
 Trace3: f1 90%, constant 40 bytes; f2 10%, random from 80 to 120 bytes
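Traces of this shape can be approximated with a small generator (a sketch under my own assumptions; the paper's actual trace generator is not shown, and `make_trace` is a hypothetical helper):

```python
import random

def make_trace(n_packets, f2_share, f2_len_range, seed=0):
    """Generate (flow, length) pairs: f1 packets are a constant 40 bytes,
    and a fraction f2_share of packets belongs to f2, with lengths drawn
    uniformly from f2_len_range (inclusive)."""
    rng = random.Random(seed)  # seeded for reproducible traces
    trace = []
    for _ in range(n_packets):
        if rng.random() < f2_share:
            trace.append(("f2", rng.randint(*f2_len_range)))
        else:
            trace.append(("f1", 40))
    return trace

trace2 = make_trace(10_000, 0.05, (40, 60))   # like Trace2: 95% f1, 5% f2
trace3 = make_trace(10_000, 0.10, (80, 120))  # like Trace3: 90% f1, 10% f2
```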

16 Simulation: utilization of the 4 PEs under the traces with the fewest flows
[Figure: time fraction of active thread number for each PE, under trace1, trace2, and trace3; all traces have the fewest flows.]

17 Simulation: buffer occupation with the fewest flows
[Figure: time fraction of free block number for each PE, under trace1, trace2, and trace3; all traces have the fewest flows.]

18 A Summary
 A solution to preserve packet order in an NP with multiple PEs for data-plane processing
 Packet chains record the sequence information of different flows so that packet order is preserved
 Within-flow-scope reordering is quite necessary in an NP
 Future work: optimize memory block and PE resources

19 Thank you for your attention!

