ECE 526 – Network Processing Systems Design IXP XScale and Microengines Chapter 18 & 19: D. E. Comer.



Ning Weng, ECE 526

Overview
Recall:
─ Packet processing functions (forwarding, queuing…)
─ Traditional network processing systems (CPU + NICs)
─ General network processor architecture and tradeoffs
─ Intel IXP network processors overall architecture
Focus on individual components of Intel IXP chip:
─ Control processor (slow path): XScale core
  Overall architecture
  Typical functions
  Processor features
─ Packet processing processor (fast path): Microengines
  Architecture and features
  Differences to conventional processors
  Pipelining and multi-threading

Purpose of Control Processor
Functions typically executed by the embedded control processor:
─ Bootstrapping
─ Exception handling
─ Higher-layer protocol processing
─ Interactive debugging
─ Diagnostics and logging
─ Memory allocation
─ Application programs (if needed)
─ User interface and/or interface to the GPP
─ Control of packet processors
─ Other administrative functions

XScale Memory Architecture
Memory architecture
─ Uses 32-bit linear address space
─ Configurable endian mode
─ Byte addressable
Memory mapping
─ Allocation of address space (2^32) to different system components
─ An access to memory is translated into an access to the corresponding component
─ Needs to be carefully crafted
XScale assumes byte-addressable memory
─ Underlying memory uses a different access size (SDRAM)
─ How does this work?
Support for virtual memory
─ For demand paging to secondary storage
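
The two ideas on this slide, memory mapping and byte addressing on top of wider memory words, can be sketched in a few lines of Python. This is only an illustrative model: the region boundaries in `MEMORY_MAP`, the 8-byte word width, and the function names `decode` and `read_byte` are assumptions made for the example and are not taken from the slides or from Intel documentation.

```python
# Hypothetical memory map: (start, end, component). Boundaries are made up
# for illustration and are NOT the real IXP2400 address map.
MEMORY_MAP = [
    (0x0000_0000, 0x8000_0000, "DRAM"),
    (0x8000_0000, 0xC000_0000, "SRAM"),
    (0xC000_0000, 0x1_0000_0000, "devices"),
]

WORD_BYTES = 8  # assumed width of one underlying memory word, in bytes


def decode(addr):
    """Translate a 32-bit address into (component, offset within component)."""
    for start, end, name in MEMORY_MAP:
        if start <= addr < end:
            return name, addr - start
    raise ValueError(f"address {addr:#x} is outside the 32-bit address space")


def read_byte(words, byte_addr, big_endian=False):
    """Read one byte from a memory modeled as a list of wide integer words.

    The wide word that contains the byte is fetched first, then the byte
    lane is selected from the low-order address bits.
    """
    word_index, lane = divmod(byte_addr, WORD_BYTES)
    word = words[word_index]
    if big_endian:
        lane = WORD_BYTES - 1 - lane  # lane 0 is the most significant byte
    return (word >> (8 * lane)) & 0xFF
```

For example, `decode(0x8000_0010)` selects the hypothetical SRAM region at offset `0x10`, and `read_byte([0x0807060504030201], 0)` returns `0x01` in little-endian mode.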

Shared Memory Address Issues
Memory is shared between XScale and Microengines
Same data, but different addresses
What impact does this have?
─ Pointers need to be translated
─ Data structures that contain pointers cannot be shared directly
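
A minimal sketch of the pointer translation mentioned above, assuming the simplest possible case: the XScale sees the shared memory at a fixed base inside its linear address space, while a Microengine addresses the same memory by offset from zero. The constants `XSCALE_SRAM_BASE` and `SRAM_SIZE` and both function names are hypothetical values chosen for illustration.

```python
XSCALE_SRAM_BASE = 0x8000_0000  # hypothetical base of SRAM in the XScale view
SRAM_SIZE = 0x0400_0000         # hypothetical size of the shared SRAM (64 MB)


def xscale_to_ue(xscale_addr):
    """Convert an XScale linear address into a Microengine SRAM offset."""
    if not XSCALE_SRAM_BASE <= xscale_addr < XSCALE_SRAM_BASE + SRAM_SIZE:
        raise ValueError(f"{xscale_addr:#x} is not in the shared SRAM window")
    return xscale_addr - XSCALE_SRAM_BASE


def ue_to_xscale(ue_offset):
    """Convert a Microengine SRAM offset into an XScale linear address."""
    if not 0 <= ue_offset < SRAM_SIZE:
        raise ValueError(f"{ue_offset:#x} is not a valid SRAM offset")
    return XSCALE_SRAM_BASE + ue_offset
```

Under this model, a linked-list node written by the XScale would store `next` pointers in XScale form, so a Microengine would have to call something like `xscale_to_ue` on every pointer it follows. That per-pointer cost is why pointer-based structures are awkward to share.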

Microengines
Microengines are data-path packet processors
IXP2400 has 8 Microengines
Simpler than XScale
Low-level device that acts as a micro-sequencer
Optimized for packet processing
More complex to use
Often abbreviated as uE

uE Functions
uEs handle ingress and egress packet processing:
─ Packet ingress from physical layer hardware
─ Checksum verification
─ Header processing and classification
─ Packet buffering in memory
─ Table lookup and forwarding
─ Header modification
─ Checksum computation
─ Packet egress to physical layer hardware
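
The checksum steps in this list can be illustrated with the standard Internet checksum (ones' complement sum of 16-bit words, as used for the IPv4 header). The sketch below is plain Python for clarity; it shows the arithmetic only, not how a Microengine would implement it, and the function name `internet_checksum` is chosen for this example.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data.

    Computation: call it on a header whose checksum field is zero.
    Verification: call it on the header as received; a result of 0
    means the checksum is valid.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

Header modification (for example decrementing the TTL) changes the header bytes, which is why the list has a separate checksum computation step before egress.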

uE Architecture
uE characteristics:
─ Programmable microcontroller
─ RISC design
─ 256 general-purpose registers
─ 512 transfer registers
─ 128 next neighbor registers
─ Hardware support for 8 threads and context switching
─ 640 words of local memory
─ Control of an Arithmetic and Logic Unit
─ Direct access to various functional units
─ A unit to compute a Cyclic Redundancy Check (CRC)

uE as Micro-sequencer
A micro-sequencer does not contain native instructions for all possible operations
─ Instead of using instructions, the uE invokes functional units to perform operations
─ Control unit is much “simpler”
Example 1:
─ uE does not have an ADD R2, R3 instruction
─ Instead: ALU ADD R2, R3
─ “ALU” indicates that the ALU should be used
─ “ADD” is a parameter to the ALU
Example 2:
─ Memory access is not done by a simple LOAD R2, 0xdeadbeef
─ Instead: SRAM LOAD R2, 0xdeadbeef
Altogether similar to a normal processor, but more basic

uE Instruction Set
General
─ ALU, etc.
Branch and Jump
─ BR: branch unconditionally
CAM
─ CAM_CLEAR: clear all entries in local memories
I/O and context swap
─ SCRATCH (read and write)
For details see Figures 19.1 and 19.2, Comer.

uE Memories
uEs view memories differently than the XScale does
─ Do not map memories and I/O devices into a linear address space
─ Do not view memories as a seamless, uniform repository
uE ISA: requires a separate instruction for each type of memory and I/O device
─ SRAM[read, $$x, address1, address2…]
Programmer: must bind each data item permanently to a specific type of memory.

Execution Pipeline
What is a pipeline?
Why is a pipeline employed?
─ One instruction is executed per cycle if the pipeline is properly designed
uEs use a five-stage or six-stage pipeline:

Pipelining

Pipelining Problems
Possible sources of pipelining problems
─ Data dependencies
─ Control dependencies
─ Resource dependencies
─ Memory accesses
How do pipelining problems impact system performance?
How can these impacts be removed or reduced?
─ Remove the sources so that no stall happens
─ Hide the impact of pipeline stalls

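Pipeline Stalls
K: ALU ADD R2, R1, R2
K+1: ALU ADD R3, R2, R3
Instruction K+1 reads R2, which instruction K writes, so K+1 must wait until the result of K is available.
Control dependencies and memory accesses have an even bigger impact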
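
The data dependency above can be made concrete with a toy timing model. The sketch assumes an in-order pipeline that issues at most one instruction per cycle and makes a result readable `result_latency` cycles after its instruction issues; both the model and the default latency of 3 are assumptions for illustration, not measured Microengine behavior. It also assumes the first register of each instruction is the destination.

```python
def count_stalls(program, result_latency=3):
    """Count stall cycles caused by read-after-write dependencies.

    program is a list of (dest, sources) tuples, for example
    ("R2", ("R1", "R2")) for ALU ADD R2, R1, R2.
    """
    ready = {}     # register -> first cycle in which its new value is readable
    next_slot = 0  # earliest cycle in which the next instruction could issue
    stalls = 0
    for dest, sources in program:
        # Wait until the issue slot is free and every source value is ready.
        issue = max([next_slot] + [ready.get(src, 0) for src in sources])
        stalls += issue - next_slot
        ready[dest] = issue + result_latency
        next_slot = issue + 1
    return stalls
```

With the two instructions from the slide, `count_stalls([("R2", ("R1", "R2")), ("R3", ("R2", "R3"))])` reports 2 stall cycles under these assumptions; replacing the second instruction with one that does not read R2 gives 0.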

Threading Illustration

Hardware Threads
uEs support 8 hardware thread contexts
─ Only one thread can execute at any given time
─ When a stall occurs, the uE can switch to another thread (if that thread is not stalled)
Very low overhead for context switch
─ “Zero-cycle context switch”
─ Effectively can take around three cycles due to pipeline flush
Switching rules
─ If a thread stalls, check whether the next one is ready for processing
─ Keep trying until a ready thread is found
─ If none is available, stall the uE and wait for any thread to unblock
Improves overall throughput
Questions:
─ Why not 16 or 32 threads?
─ Why not have 48 uEs with 1 thread each?
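
The switching rules can be written down as a short round-robin search. This is a sketch of the rule as stated on the slide, not of the actual hardware arbiter; the function name `next_ready_thread` and the list-of-booleans representation of thread state are assumptions.

```python
def next_ready_thread(current, ready, num_threads=8):
    """Return the next ready thread after `current`, or None if all are stalled.

    ready[i] is True when thread i can run. Threads are checked in
    round-robin order starting with the one after `current`.
    """
    for step in range(1, num_threads + 1):
        candidate = (current + step) % num_threads
        if ready[candidate]:
            return candidate
    return None  # every thread is blocked: the uE stalls until one unblocks
```

For instance, if thread 0 stalls while only threads 2 and 5 are ready, the search skips thread 1 and returns 2. A return value of `None` corresponds to the case where the whole uE must wait.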

Summary
Control processor (slow path): XScale core
─ Overall architecture
─ Typical functions
─ Processor features
Packet processing processor (fast path): Microengines
─ Architecture and features
─ Differences to conventional processors
─ Pipelining and multi-threading

Lab3 Brief
Intel Reference Systems
SDK Tutorial
Lab 3

Intel Reference Systems
Hardware testbed
─ IXP2400 network processors
─ QDRM-SRAM, Flash ROM and other memories
─ 1G optical Ethernet ports
─ 100M Ethernet management port
─ Serial interface
─ PCI interfaces
SDK (software development kit)
─ Compiler
─ Assembler, linker
─ Simulator
─ Reference code

Lab3: Forwarding, Counting & Classification
Goal: to explore the basic functionality of the IXP2400 software development kit and Microengines.
3 parts:
─ Part I: collect a number of workload statistics from the IXP SDK simulator. Follow the steps in the lab instructions.
─ Part II: add one counting block to count the number of packets.
─ Part III: implement a simple packet classification mechanism.
Tools: All three parts require access to a machine that has the Intel SDK installed. If you want, you can also request an installation CD for your own machine; check with the TA.

Part I: Forwarding Simulation
Run an implementation of IP forwarding on the IXP2400 simulator. All the code is provided to you.
Collect a set of workload statistics that are reported by the simulator.

Part II: Forwarding and Counting
Modify the application above by adding a counter block that stores how many packets are received.

Part III: Classification and Counting
Classify packets based on the packet header information.
There are four types of traffic that are considered in this lab:
─ Web traffic over TCP over IPv4
─ Non-Web traffic over TCP over IPv4
─ UDP over IPv4
─ IPv6
Modify the code to report the number of packets of each type.
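
The classification logic can be prototyped on a host before writing Microengine code. The following Python sketch is not the lab solution and does not use the Intel SDK; it assumes untagged Ethernet II frames, treats "Web" as TCP port 80 on either side, ignores IPv4 fragmentation, and uses class names (`web_tcp_ipv4`, `other`, and so on) invented for this example.

```python
import struct
from collections import Counter

ETH_HEADER_LEN = 14
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD
PROTO_TCP = 6
PROTO_UDP = 17
WEB_PORT = 80  # assumption: "Web traffic" means TCP port 80


def classify(frame: bytes) -> str:
    """Classify one Ethernet frame into one of the four lab traffic types."""
    if len(frame) < ETH_HEADER_LEN:
        return "other"
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_IPV6:
        return "ipv6"
    if ethertype != ETHERTYPE_IPV4 or len(frame) < ETH_HEADER_LEN + 20:
        return "other"
    ip_header_len = (frame[ETH_HEADER_LEN] & 0x0F) * 4  # IHL field, in bytes
    protocol = frame[ETH_HEADER_LEN + 9]
    if protocol == PROTO_UDP:
        return "udp_ipv4"
    if protocol != PROTO_TCP:
        return "other"
    l4_offset = ETH_HEADER_LEN + ip_header_len
    if len(frame) < l4_offset + 4:
        return "other"
    src_port, dst_port = struct.unpack_from("!HH", frame, l4_offset)
    if WEB_PORT in (src_port, dst_port):
        return "web_tcp_ipv4"
    return "nonweb_tcp_ipv4"


def count_by_class(frames):
    """Return a Counter mapping each traffic class to its packet count."""
    return Counter(classify(frame) for frame in frames)
```

The `Counter` returned by `count_by_class` plays the role of the per-type counters that the lab asks for; on the IXP these would live in a memory that the reporting code can read.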

How to do Lab3
Windows machine with SDK installed
Download lab instructions and source code from Blackboard
Start early. Very exciting lab.
Due dates
─ Part I and Part II: 10/13
─ Part III: 10/20