Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute. Network Processors: Building Block for Programmable High-Speed Networks. Introduction to the Intel IXA.

Presentation transcript:

Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute

Slide 1: Network Processors: Building Block for Programmable High-Speed Networks
Introduction to the Intel IXA
- Shiv Kalyanaraman
- Yong Xia (TA)

Slide 2: What do switches/routers look like?
- Access routers, e.g. ISDN, ADSL
- Core router, e.g. OC48c POS
- Core ATM switch

Slide 3: Dimensions, Power Consumption
- Cisco GSR 12416: 6 ft x 19 in x 2 ft; capacity: 160 Gb/s; power: 4.2 kW
- Juniper M160: 3 ft x 19 in x 2.5 ft; capacity: 80 Gb/s; power: 2.6 kW

Slide 4: Where high performance packet switches are used
[Figure labels: The Internet Core; Carrier Class Core Router; ATM Switch; Frame Relay Switch; Edge Router; Enterprise WAN access & Enterprise Campus Switch]

Slide 5: Where are routers? Ans: Points of Presence (POPs)
[Figure labels: sites A to F; POP1 to POP8]

Slide 6: Why the Need for Big/Fast/Large Routers?
[Figure: POP with smaller routers vs. POP with large routers]
- Interfaces: price > $200k, power > 400 W
- Space, power, interface cost economics!
- About 50-60% of interfaces are used for interconnection within the POP.
- Industry trend is towards a large, single router per POP.

Slide 7: What's a Network Processor?
- Router vendors have built speed into their devices by pushing functionality down into hardware (ASICs).
- ASIC: Application-Specific Integrated Circuit
  - Fast but custom-made => expensive
  - Long time-to-market
- Network processors look to avoid these pitfalls by introducing specialized, software-controlled devices that can be customized quickly. But they also process packets at near-wire speeds!

Slide 8: How does the IXA simplify the ASIC-based design?
- A typical ASIC-based design
  - A processor to handle routing information and higher-level processing
  - ASICs to handle each packet
- An IXP 1200 design
  - StrongARM core to handle routing algorithms and higher-level processing
  - Microengines to handle packet processing

Slide 9: Applications of Network Processors
- Fully programmable architecture
  - Implement any packet processing application
- Examples from customers
  - Routing/switching, VPN, DSLAM, multi-service switch, storage, content processing
  - Intrusion detection (IDS) and RMON
- Use as a research platform
  - Experiment with new algorithms, protocols
- Use as a teaching tool
  - Understand architectural issues
  - Gain hands-on experience with networking systems

Slide 10: Intel IXP Network Processors
- Microengines
  - RISC processors optimized for packet processing
  - Hardware support for multi-threading
  - Fast path
- Embedded StrongARM/XScale
  - Runs embedded OS and handles exception tasks
  - Slow path, control plane
[Figure labels: ME 1, ME 2, ..., ME n; StrongARM; SRAM; DRAM; Media/Fabric Interface; Control Processor]

Slide 11: Packet Flow Diagram: IXP 1200

Slide 12: Intel's Gear (1)
- The IXP 1200 product line represents Intel's first attempt in the area (it was actually inherited when Intel acquired Digital's semiconductor business).
- The IXP 1200 is a single-board chip, designed with abstractions in mind.
- Since this is a new area, and the chip is designed to be used with many different types of hardware and software, the documentation is sketchy.
- To achieve wire-fast speeds with software, the goal is to hide latency with parallelism. Packet processing is inherently parallel, and exploiting that parallelism is necessary for fast applications.

Slide 13: Intel's Gear (2)
- IXP2850
  - Designed for use in virtual private networks, secure web services, and storage area networks.
- IXP2800
  - Able to handle line rates ranging from OC-48 to OC-192.
- IXP2400
  - Designed for OC-12 to OC-48 network access and edge applications.

Slide 14: Various forms of Processors
- Embedded processor (run-to-completion)
- Parallel architecture
- Pipelined architecture

Slide 15: Intel Internet Exchange Architecture
- Micro-engine technology: a subsystem of programmable, multi-threaded RISC micro-engines that enable high-performance packet processing in the data plane through Intel® Hyper Task Chaining. This multi-processing technology features software pipelining and low-latency sequence management hardware.
- The Intel IXA Portability Framework: an easy-to-use modular programming framework providing the advantages of software investment protection and faster time-to-market through code portability and reuse between network processor-based projects, in addition to future generations of Intel IXA network processors.
- Intel® XScale™ technology: providing the highest performance-to-power ratio in the industry.

Slide 16: IXP: A Building Block for Network Systems
- Example: IXP2800
  - 16 micro-engines + XScale core
  - Up to 1.4 GHz ME speed
  - 8 HW threads/ME
  - 4K control store per ME
  - Multi-level memory hierarchy
  - Multiple inter-processor communication channels
- NPU vs. GPU tradeoffs
  - Reduce core complexity
  - No hardware caching
  - Simpler instructions => shallow pipelines
  - Multiple cores with HW multi-threading per chip
[Figure: IXP2800 block diagram. Labels: multi-threaded (x8) microengine array (MEv2 1 to MEv2 16); per-engine memory, CAM, signals; interconnect; Intel® XScale™ core; RDRAM controller; QDR SRAM controller; scratch memory; hash unit; media switch fabric I/F; PCI]
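
To put the IXP2800 figures on this slide in perspective, a rough per-packet cycle budget can be worked out. This is only a back-of-the-envelope sketch: the 10 Gb/s line rate (roughly OC-192) and the 40-byte minimum packet size are assumptions that do not come from the slides, framing overhead is ignored, and the function name is hypothetical.

```python
def cycles_per_packet(line_rate_bps, packet_bytes, me_clock_hz, num_mes):
    """Aggregate microengine cycles available per packet at full line rate."""
    # Packets arriving per second if every packet has the given size.
    packets_per_second = line_rate_bps / (packet_bytes * 8)
    # Cycles delivered per second by all microengines together.
    total_cycles_per_second = me_clock_hz * num_mes
    return total_cycles_per_second / packets_per_second


# Assumed workload: 10 Gb/s of back-to-back 40-byte packets,
# processed by 16 MEs at 1.4 GHz (the IXP2800 figures above).
budget = cycles_per_packet(10e9, 40, 1.4e9, 16)
```

Under these assumptions the budget comes to roughly 717 cycles per packet summed over all 16 microengines, which is small compared with the memory latencies involved. That is the motivation for hiding latency with hardware multi-threading.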

Slide 17: XScale Core processor
- Compliant with the ARM V5TE architecture
  - Support for ARM's Thumb instructions
  - Support for Digital Signal Processing (DSP) enhancements to the instruction set
  - Intel's improvements to the internal pipeline to improve the memory-latency hiding abilities of the core
  - Does not implement the floating-point instructions of the ARM V5 instruction set

Slide 18: Microengines: RISC processors
- IXP 2800 has 16 microengines, organized into 4 clusters (4 MEs per cluster)
- ME instruction set specifically tuned for processing network data
  - 40-bit x 4K control store
- Six-stage instruction pipeline
  - On average, an instruction takes one cycle to execute
- Each ME has eight hardware-assisted threads of execution
  - Can be configured to use either all eight threads or only four threads
- The non-preemptive hardware thread arbiter swaps between threads in round-robin order
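
The non-preemptive, round-robin thread arbiter described on this slide can be illustrated with a small software model. This is a simplified sketch, not the hardware's actual behavior: each thread computes for a fixed number of cycles, issues one memory reference, and voluntarily swaps out until the reference returns. The cycle counts and the names `context` and `simulate` are hypothetical.

```python
def context(compute_cycles, mem_latency, packets):
    """One hardware thread: compute, then issue a memory reference and swap out."""
    for _ in range(packets):
        yield compute_cycles, mem_latency


def simulate(num_threads, compute_cycles, mem_latency, packets_per_thread):
    """Return the fraction of cycles the ME spends computing (its utilization)."""
    threads = [context(compute_cycles, mem_latency, packets_per_thread)
               for _ in range(num_threads)]
    ready_at = [0] * num_threads   # cycle at which each thread may run again
    done = [False] * num_threads
    clock = busy = 0
    start = 0                      # round-robin starting position
    while not all(done):
        # Pick the next ready thread in round-robin order.
        chosen = None
        for offset in range(num_threads):
            t = (start + offset) % num_threads
            if not done[t] and ready_at[t] <= clock:
                chosen = t
                break
        if chosen is None:
            # Every live thread is waiting on memory: the ME idles.
            clock = min(r for r, d in zip(ready_at, done) if not d)
            continue
        start = (chosen + 1) % num_threads
        try:
            compute, latency = next(threads[chosen])
        except StopIteration:
            done[chosen] = True
            continue
        # The thread runs without preemption until it swaps out itself.
        clock += compute
        busy += compute
        ready_at[chosen] = clock + latency
    return busy / clock
```

With 10 compute cycles per 70-cycle memory reference and 100 packets per thread (made-up numbers), `simulate(1, 10, 70, 100)` gives a utilization of 0.125, while `simulate(8, 10, 70, 100)` gives more than 0.99, because seven other threads fill the time one thread spends waiting.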

Slide 19: MicroEngine v2
[Figure: MEv2 block diagram. Labels: control store (4K instructions); 128 GPR (x2); local memory (640 words); 128 next-neighbor registers; 128 S Xfer In and 128 D Xfer In; 128 S Xfer Out and 128 D Xfer Out; LM Addr 0 and LM Addr 1; local CSRs; CRC unit and CRC remainder; CAM (tags 0-15, status and LRU logic, lock, entry number); timers and timestamp; pseudo-random number; 32-bit execution data path (multiply, find first bit, add, shift, logical) with A and B operands and ALU output; S-Push, D-Push, S-Pull, and D-Pull buses; to/from next neighbor]

Slide 20: Why Multi-threading?

Slide 21: Packet processing using multi-threading within a MicroEngine

Slide 22: Registers available to each ME
- Four different types of registers
  - General purpose, SRAM transfer, DRAM transfer, next-neighbor (NN)
- 256 32-bit GPRs
  - Can be accessed in thread-local or absolute mode
- 256 32-bit SRAM transfer registers
  - Used to read/write to all functional units on the IXP2xxx except the DRAM
- 256 32-bit DRAM transfer registers
  - Divided equally into read-only and write-only
  - Used exclusively for communication between the MEs and the DRAM
- Benefit of having separate transfer registers and GPRs
  - ME can continue processing with GPRs while other functional units read and write the transfer registers
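
The difference between thread-local and absolute GPR addressing can be sketched as an index mapping. This is an illustration only: it assumes the 256 GPRs are split evenly among the active threads in contiguous blocks, which the slides do not state, and the function names are hypothetical.

```python
TOTAL_GPRS = 256  # 32-bit general-purpose registers per ME (from the slide)


def gprs_per_thread(active_threads):
    """Registers each thread sees in thread-local mode (assumed even split)."""
    if active_threads not in (4, 8):
        raise ValueError("an ME runs either four or eight threads")
    return TOTAL_GPRS // active_threads


def absolute_index(thread, local_index, active_threads=8):
    """Map a thread-local register number to an absolute register number.

    Assumes each thread owns one contiguous block, which is a simplification.
    """
    per_thread = gprs_per_thread(active_threads)
    if not 0 <= thread < active_threads:
        raise ValueError("no such thread")
    if not 0 <= local_index < per_thread:
        raise ValueError("register outside this thread's block")
    return thread * per_thread + local_index
```

Under this assumed layout, each of eight threads sees 32 private registers (64 when only four threads are used), while absolute mode lets code name any of the 256 registers directly, for example to share state between threads.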

Slide 23: Hardware Features to ease packet processing
- Ring buffers
  - For inter-block communication/synchronization
  - Producer-consumer paradigm
- Next-neighbor registers and signaling
  - Allows for single-cycle transfer of context to the next logical micro-engine to dramatically improve performance
  - Simple, easy transfer of state
- Distributed data caching within each micro-engine
  - Allows for all threads to keep processing even when multiple threads are accessing the same data
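
The producer-consumer ring buffer mentioned on this slide can be sketched in software as a fixed-size circular queue. This is a minimal single-threaded illustration with a hypothetical `Ring` class; the hardware rings perform put and get atomically, which this sketch does not model.

```python
class Ring:
    """Fixed-capacity FIFO ring: a producer calls put(), a consumer calls get()."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._head = 0    # index of the next item to read
        self._tail = 0    # index of the next free slot to write
        self._count = 0   # number of items currently stored

    def put(self, item):
        """Append an item; return False (and store nothing) if the ring is full."""
        if self._count == len(self._slots):
            return False
        self._slots[self._tail] = item
        self._tail = (self._tail + 1) % len(self._slots)
        self._count += 1
        return True

    def get(self):
        """Remove and return the oldest item, or None if the ring is empty."""
        if self._count == 0:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return item
```

In a pipeline, one processing stage would put packet handles on the ring and the next stage would get them, so the two stages only need to agree on the ring rather than on each other's timing.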

Slide 24: Different Types of Memory

Type of memory  | Logical width (bytes) | Size (bytes) | Approx. unloaded latency (cycles) | Special notes
Local to ME     | 4                     | 2560         | 3                                 | Indexed addressing, post incr/decr
On-chip scratch | 4                     | 16K          | 60                                | Atomic ops; 16 rings with atomic get/put
SRAM            | 4                     | 256M         | 150                               | Atomic ops; 64-element q-array
DRAM            | 8                     | 2G           | 300                               | Direct path to/from MSF
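
The latencies in this table can be related to the number of hardware threads with a simple rule of thumb. The sketch below assumes an idealized model that is not from the slides: every thread does the same amount of computation between memory references, and a reference is fully hidden if the other threads' computation covers its latency. The function name is hypothetical.

```python
import math

# Approximate unloaded latencies in cycles, taken from the table above.
UNLOADED_LATENCY = {"local": 3, "scratch": 60, "sram": 150, "dram": 300}


def min_compute_cycles(latency_cycles, threads):
    """Smallest per-thread compute burst that hides one memory reference.

    While one thread waits, the other (threads - 1) threads each run one
    burst, so the latency is hidden when (threads - 1) * burst >= latency.
    """
    if threads < 2:
        raise ValueError("latency hiding needs at least two threads")
    return math.ceil(latency_cycles / (threads - 1))


# Compute cycles needed between references on an ME running 8 threads.
needed = {name: min_compute_cycles(cycles, 8)
          for name, cycles in UNLOADED_LATENCY.items()}
```

Under this model, code that touches DRAM needs much longer compute bursts between references than code that stays in scratch or local memory, which is one reason to keep frequently used state in the faster levels of the hierarchy.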

Slide 25: IXA Software Framework
[Figure: software framework block diagram. Labels: external processors; control plane protocol stacks; control plane PDK; XScale™ core (C/C++ language); core components; core component library; resource manager library; microengine pipeline (Microengine C language); microblocks; microblock library; utility library; protocol library; hardware abstraction library]

Slide 26: Micro-engine C Compiler
- C language constructs
  - Basic types, pointers, bit fields
- In-line assembly code support
- Aggregates
  - Structs, unions, arrays

Slide 27: Core Components and Microblocks
[Figure labels: XScale™ core; micro-engines; microblocks; microblock library; core components; core libraries; core component library; resource manager library. Legend: user-written code; Intel/3rd party blocks]

Slide 28: What is a Microblock?
- Data plane packet processing on the microengines is divided into logical functions called microblocks
  - Coarse-grained and stateful
  - Examples: 5-tuple classification, IPv4 forwarding, NAT
- Several microblocks running on a microengine thread can be combined into a microblock group.
  - A microblock group has a dispatch loop that defines the dataflow for packets between microblocks
  - A microblock group runs on each thread of one or more microengines
- Microblocks can send and receive packets to/from an associated XScale core component.
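
The idea of a dispatch loop that routes packets between microblocks can be sketched as follows. This is a conceptual model in Python rather than Microengine C, and it does not use the IXA framework's real API: the block names, packet fields, and the exact-match route table are all hypothetical.

```python
TO_CORE, DROP, TRANSMIT = "to_core", "drop", "transmit"

# Hypothetical exact-match route table: destination -> output port.
ROUTES = {"10.0.0.0": 1, "10.0.1.0": 2}


def classify(pkt):
    """Classifier microblock: IPv4 goes on, everything else is an exception."""
    return "ipv4_forward" if pkt["ethertype"] == 0x0800 else TO_CORE


def ipv4_forward(pkt):
    """Forwarding microblock: decrement TTL and pick an output port."""
    if pkt["ttl"] <= 1:
        return TO_CORE              # expired TTL is handled by the core component
    pkt["ttl"] -= 1
    port = ROUTES.get(pkt["dst"])
    if port is None:
        return DROP
    pkt["out_port"] = port
    return TRANSMIT


MICROBLOCKS = {"classify": classify, "ipv4_forward": ipv4_forward}


def dispatch(pkt, first_block="classify"):
    """Dispatch loop: run microblocks until one returns a terminal outcome."""
    block = first_block
    while block in MICROBLOCKS:
        block = MICROBLOCKS[block](pkt)
    return block
```

Each microblock returns the name of the next block or a terminal outcome, so the dataflow is defined in one place (the loop and the table) rather than inside the blocks. Packets that need exception handling leave the fast path through the `to_core` outcome, mirroring the hand-off to an XScale core component.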

Slide 29: Technical and Business Challenges
- Technical challenges
  - Shift from ASIC-based paradigm to software-based apps
  - Challenges in programming an NPU
  - Trade-off between power, board cost, and number of NPUs
  - How to add co-processors for additional functions?
- Business challenges
  - Reliance on an outside supplier for the key component
  - Preserving intellectual property advantages
  - Add value and differentiation through software algorithms in data plane, control plane, and services plane functionality
  - Must decrease time-to-market (TTM) to be competitive

Slide 30: For more info…
- Jonathan Gunner
- Slide contributions from Kerry Wood and Shruti Gorappa
- OGI IXA course: spring2003/