1 CSE 58x: Networking Practicum Instructor: Wu-chang Feng TA: Francis Chang

2 About the course ● Prerequisite: CSE 524 or the equivalent ● Implementation-focused course – Intel's IXA network processor platform ● Contents – Brief lecture material on network processors and the IXP – 5 weeks of designed laboratories – 3 weeks of final projects

3 Modern router architectures ● Split into a fast path and a slow path ● Control plane – High-complexity functions – Route table management – Network control and configuration – Exception handling ● Data plane – Low complexity functions – Fast-path forwarding

4 Router functions ● RFC 1812 plus... – Error detection and correction – Traffic measurement and policing – Frame and protocol demultiplexing – Address lookup and packet forwarding – Segmentation, fragmentation, reassembly – Packet classification – Traffic shaping – Timing and scheduling – Queuing – Security

5 Design choices for network products ● General purpose processors ● Embedded RISC processors ● Network processors ● Field-programmable gate arrays (FPGAs) ● Application-specific integrated circuits (ASICs)

6 General purpose processors (GPP) ● Programmable ● Mature development environment ● Typically used to implement control plane ● Too slow to run data plane effectively – Sequential execution – CPU/Network 50x increase over last decade – Memory latencies 2x decrease over last decade ● Gigabit Ethernet: 333 nanosecond per-packet budget ● Cache miss: ~150-200 nanoseconds
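
The budget arithmetic on this slide can be sketched as follows. The packet sizes are assumptions: the slide does not say which size yields 333 ns, although packets of roughly 40 bytes come out close to it (320 ns) at 1 Gbps. The function names are made up for illustration.

```python
def packet_time_ns(packet_bytes, link_gbps, overhead_bytes=0):
    """Time one packet occupies the link, in nanoseconds."""
    bits = (packet_bytes + overhead_bytes) * 8
    # A 1 Gbps link carries one bit per nanosecond.
    return bits / link_gbps


def misses_within_budget(budget_ns, miss_ns):
    """Whole cache misses that fit inside one packet's time budget."""
    return int(budget_ns // miss_ns)


if __name__ == "__main__":
    # Assumed packet sizes, not taken from the slide.
    for size in (40, 64, 1500):
        print(size, "bytes:", packet_time_ns(size, 1.0), "ns")
    # Slide figures: 333 ns budget, 150-200 ns per cache miss.
    print(misses_within_budget(333, 200), "to",
          misses_within_budget(333, 150), "misses per packet")
```

With the slide's own figures, one or two cache misses use up the whole per-packet budget, which is the argument against running the data plane on a GPP.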

7 Embedded RISC processors (ERP) ● Same as GPP, but – Slower – Cheaper – Smaller (require less board space) – Designed specifically for network applications ● Typically used for control plane functions

8 Application-specific integrated circuits (ASIC) ● Custom hardware ● Long time to market ● Expensive ● Difficult to develop and simulate ● Not programmable ● Not reusable ● But, the fastest of the bunch ● Suitable for data plane

9 Field Programmable Gate Arrays (FPGA) ● Flexible re-programmable hardware ● Less dense and slower than ASICs ● Cheaper than ASICs ● Good for providing fast custom functionality ● Suitable for data plane

10 Network processors ● The speed of ASICs/FPGAs ● The programmability and cost of GPPs/ERPs ● Flexible ● Re-usable components ● Lower cost ● Suitable for data plane

11 Network processors ● Common features – Small, fast, on-chip instruction stores (no caching) – Custom network-specific instruction set programmed at assembler level ● What instructions are needed for NPs? Open question. ● Minimality, Generality – Multiple processing elements – Multiple thread contexts per element – Multiple memory interfaces to mask latency – Fast on-chip memory (headers) and slow off-chip memory (payloads) – No OS, hardware-based scheduling and thread switching

12 Why network processors? ● The propaganda ● Take the current vertical network device market ● Commoditize horizontal slices of it ● PC market – Initially, an IBM custom vertical – Now, a commodity market with Intel providing the chip-set ● Network device market – Draw your own conclusions

13 Network processing approaches ● [Figure: design choices plotted by programming/development ease against speed: ASIC, network processor, FPGA, GPP, embedded RISC processor]

14 Network processor architectures ● Packet path – Store and forward ● Packet payload completely stored in and forwarded from off-chip memory ● Allows for large packet buffers ● Re-ordering problems with multiple processing elements ● Intel IXP, Motorola C5 – Cut-through ● Packet held in an on-chip FIFO and forwarded through directly ● Small packet buffers ● Built-in packet ordering ● AMCC

15 Network processor architectures ● Processing architecture – Parallel ● Each element independently performs entire processing function ● Packet re-ordering problems ● Larger instruction store needed per element – Pipelined ● Each element performs one part of larger processing function ● Communicates result to next processing element in pipeline ● Smaller code space ● Packet ordering retained ● Deterministic behavior (no memory thrashing) – Hybrid
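
The ordering difference between the two organizations can be illustrated with a toy timing model. This is a sketch under simplifying assumptions (enough parallel elements that no packet waits, work split evenly across pipeline stages, strictly FIFO stages); the packet timings are invented.

```python
def parallel_order(packets):
    """Each element runs the whole function on one packet, independently.

    packets: list of (seq, arrival_time, work_time).
    Returns sequence numbers in departure order.
    """
    done = [(arrival + work, seq) for seq, arrival, work in packets]
    return [seq for _, seq in sorted(done)]


def pipelined_order(packets, stages=3):
    """Each stage does 1/stages of the work and serves packets FIFO."""
    stage_free = [0.0] * stages  # time at which each stage becomes idle
    done = []
    for seq, arrival, work in packets:
        t = arrival
        for s in range(stages):
            start = max(t, stage_free[s])  # wait for the stage to free up
            t = start + work / stages
            stage_free[s] = t
        done.append((t, seq))
    return [seq for _, seq in sorted(done)]


if __name__ == "__main__":
    # One slow packet followed by two quick ones (hypothetical numbers).
    packets = [(0, 0, 9), (1, 1, 3), (2, 2, 3)]
    print("parallel :", parallel_order(packets))
    print("pipelined:", pipelined_order(packets))
```

In the parallel case the quick packets overtake the slow one, so the output must be re-sequenced; in the pipelined case each packet queues behind its predecessor at every stage, so arrival order is preserved.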

16 Network processor architectures ● Processing hierarchy – ASICs – Embedded RISC processors – Specialized co-processors – See figure 13.7 in book

17 Network processor architectures ● Memory hierarchy – Small on-chip memory ● Control/Instruction store ● Registers ● Cache ● RAM – Large off-chip memory ● Cache ● Static RAM ● Dynamic RAM

18 Network processor architectures ● Internal interconnect – Bus – Cross-bar – FIFO – Transfer registers

19 Network processor architectures ● Concurrency – Hardware support for multiple thread contexts – Operating system support for multiple thread contexts – Pre-emptiveness – Migration support

20 Increasing network processor performance ● Processing hierarchy – Increase clock speed – Increase elements ● Memory hierarchy – Increase size – Decrease latency – Pipelining – Add hierarchies – Add memory bandwidth (parallel stores) – Add functional memory (CAMs)

21 Focus of this class... ● Network processors – Intel IXA

22 IXP 1200 features ● One embedded RISC processor (StrongARM) – Runs control plane (Linux) ● 6 programmable packet processors (μ-engines) – Runs data plane (μ-engine assembler or μ-engine C) ● Central hash unit ● Multiple bus interconnects – IXBus (4.4Gbps) to overcome PCI's 2.2Gbps limit ● Small on-board memory ● Serial interface for control ● External interfaces for memory

24 IXP12xx μ-engine

25 IXP2xxx μ-engine

26 μ-engine functions ● Packet ingress from physical layer interface ● Checksum verification ● Header processing and classification ● Packet buffering in memory ● Table lookup and forwarding ● Header modification ● Checksum computation ● Packet egress to physical layer interface
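
The checksum verification and computation steps can be sketched as below, assuming the checksum in question is the IPv4 header checksum (the 16-bit ones' complement sum described in RFC 1071). The slide does not name a specific checksum, and on the IXP this work would be written in microengine code rather than Python.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def header_is_valid(header: bytes) -> bool:
    """A header that includes a correct checksum field sums to zero."""
    return internet_checksum(header) == 0


if __name__ == "__main__":
    # Sample 20-byte IPv4 header with the checksum field (bytes 10-11) zeroed.
    hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
    csum = internet_checksum(hdr)
    print(hex(csum))
    filled = hdr[:10] + csum.to_bytes(2, "big") + hdr[12:]
    print(header_is_valid(filled))
```

Header modification (for example a TTL decrement) changes the header contents, which is why the slide lists checksum computation again before egress.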

27 μ-engine characteristics ● Programmable microcontroller – Custom RISC instruction set – Private 2048-instruction store per μ-engine (loaded by StrongARM) – 5-stage execution pipeline ● Hardware support for 4 threads and context switching – Each μ-engine has 4 hardware contexts (mask memory latency)
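
Why multiple hardware contexts mask memory latency can be seen with a simple back-of-the-envelope model. It assumes each thread alternates between a fixed burst of computation and a fixed memory wait, and that context switches are free; the cycle counts below are hypothetical, not IXP1200 measurements.

```python
def engine_utilization(contexts, compute_cycles, memory_cycles):
    """Fraction of cycles the engine spends executing instructions.

    While one context waits on memory, the others can run. The engine
    saturates once the combined compute time of all contexts covers one
    full compute-plus-wait period.
    """
    busy = contexts * compute_cycles
    period = compute_cycles + memory_cycles
    return min(1.0, busy / period)


if __name__ == "__main__":
    # Hypothetical: 10 cycles of work per 30-cycle memory reference.
    for n in (1, 2, 4):
        print(n, "contexts:", engine_utilization(n, 10, 30))
```

Under these assumed numbers a single context keeps the engine busy a quarter of the time, while four contexts keep it fully occupied. With longer memory waits, four contexts are no longer enough.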

28 μ-engine characteristics ● 128 general purpose registers – Can be partitioned or shared – Absolute or context-relative ● 128 transfer registers – Staging registers for memory transfers – 4 blocks of 32 registers ● SDRAM or SRAM ● Read or Write ● Local Control and Status Registers (CSRs) – USTORE instructions, CTX, etc. (p. 315)

29 μ-engine characteristics ● FBI unit – Scratchpad memory – Hash unit – FBI CSRs – IXBus control – IXBus FIFOs ● Transmit and Receive FIFOs to external line cards

30 μ-engine opcodes ● ALU instructions – ALU, ALU_SHF, DBL_SHIFT ● Branch/Jump instructions – BR, BR=0, BR!=0, BR_BSET, BR=BYTE, BR=CTX, BR_INP_STATE, BR_!SIGNAL, JUMP, RTN, etc. ● Reference instructions – CSR, FAST_WR, LOCAL_CSR_RD, R_FIFO_RD, PCI_DMA, SCRATCH, SDRAM, SRAM, T_FIFO_WR, etc. ● Local register instructions – FIND_BST, IMMED, LD_FIELD, LOAD_ADDR, LOAD_BSET_RESULT1, etc.

31 μ-engine functions ● Miscellaneous – CTX_ARB – NOP – HASH1_48, HASH1_64, etc.

32 1. Packet received on physical interface (MAC) 2. Ready-bus sequencer polls MAC for mpacket; updates receive-ready upon a full mpacket 3. μ-engine polls for receive-ready 4. μ-engine instructs FBI to move mpacket from MAC to RFIFO 5. μ-engine moves mpacket directly from RFIFO to SDRAM 6. Repeat 1-5 until full packet received 7. μ-engine or StrongARM processing 8. Packet header read from SDRAM or RFIFO into μ-engine and classified (via SRAM tables) 9. Packet headers modified 10. mpackets sent to interface 11. Poll for space on MAC; update transmit-ready if room for mpacket 12. mpackets transferred to MAC
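
Steps 1-6 amount to reassembling a packet from fixed-size mpackets. The loop can be sketched as below. The 64-byte mpacket size and the start-of-packet/end-of-packet flags are assumptions made for illustration (the slide states neither), and a `bytearray` stands in for the SDRAM packet buffer.

```python
MPACKET_BYTES = 64  # assumed mpacket size, not stated on the slide


def split_into_mpackets(frame: bytes):
    """Model the MAC side: cut a frame into (chunk, is_first, is_last)."""
    chunks = [frame[i:i + MPACKET_BYTES]
              for i in range(0, len(frame), MPACKET_BYTES)]
    last = len(chunks) - 1
    return [(chunk, i == 0, i == last) for i, chunk in enumerate(chunks)]


def receive(mpackets):
    """Model steps 3-6: copy mpackets into a buffer until the last one.

    Returns the reassembled packet, or None if the final mpacket has
    not arrived yet.
    """
    sdram = bytearray()
    for chunk, is_first, is_last in mpackets:
        if is_first:
            sdram.clear()  # a new packet starts a fresh buffer
        sdram += chunk     # step 5: RFIFO to SDRAM
        if is_last:
            return bytes(sdram)
    return None


if __name__ == "__main__":
    frame = bytes(range(150))
    mpackets = split_into_mpackets(frame)
    print([len(chunk) for chunk, _, _ in mpackets])
    print(receive(mpackets) == frame)
```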

33 Programming the IXP ● This course focuses on steps 7, 8, and 9 of the packet path ● 2 programming frameworks – Command-line, IXA Active Computing Engine (ACE) framework – Graphical microengine C development environment

34 Programming the IXP ● Command-line, IXA Active Computing Engine (ACE) framework – Re-usable function blocks chained together to build an application (Chapters 22-24) – New functions implemented as new blocks in chain ● Core ACEs (StrongARM) – Written in C ● Microblock ACEs (microengines) – Written in assembler
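
The idea of chaining re-usable function blocks can be illustrated with a small sketch. This is not the ACE API: the block names, the dictionary packet representation, and the convention that a block returns None to drop a packet are all hypothetical. Real core ACEs are written in C and microblocks in assembler.

```python
def chain(*blocks):
    """Build an application by running blocks in order on each packet."""
    def run(packet):
        for block in blocks:
            packet = block(packet)
            if packet is None:  # a block dropped the packet
                return None
        return packet
    return run


def drop_expired(packet):
    """Drop packets whose TTL would reach zero when forwarded."""
    return packet if packet["ttl"] > 1 else None


def decrement_ttl(packet):
    """Return a copy of the packet with its TTL reduced by one."""
    return {**packet, "ttl": packet["ttl"] - 1}


def count_packets(stats):
    """Create a counting block that records packets reaching it."""
    def block(packet):
        stats["seen"] += 1
        return packet
    return block


if __name__ == "__main__":
    stats = {"seen": 0}
    app = chain(drop_expired, decrement_ttl, count_packets(stats))
    print(app({"ttl": 64, "dst": "10.0.0.1"}))
    print(app({"ttl": 1, "dst": "10.0.0.1"}))
    print(stats)
```

Adding a new function means writing one more block and inserting it into the chain, which mirrors the slide's point that new functions are implemented as new blocks.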

36 Programming the IXP ● Graphical microengine C development environment – Monolithic microengine C code (cannot be used on IXP1200 hardware) – Demos forthcoming

