XStream: Rapid Generation of Custom Processors for ASIC Designs Binu Mathew * ASIC: Application Specific Integrated Circuit.


2 Overview
- What is XStream?
- Comparison to Network Processors
- Design Flow
- Design Example: Ethernet Bridge/VLAN Switch

3 What is XStream?
- Software tool to rapidly generate high performance custom stream processors
- Stream Processing: Repeated application of an algorithm kernel to a sequence of packets, subject to throughput specifications
- Resulting custom processors:
  - 40-90% of the performance of a custom ASIC
  - < 5% of the design effort of a custom ASIC
- Rapidly develop your own ultra high performance network processors!
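As a rough illustration of the stream-processing definition above, the Python sketch below applies one kernel function to every packet in a sequence. The names `stream_process` and `decrement_first_byte` are hypothetical and do not come from XStream. The sketch is a software model only and does not model the throughput specification.

```python
from typing import Callable, Iterable, Iterator


def stream_process(packets: Iterable[bytes],
                   kernel: Callable[[bytes], bytes]) -> Iterator[bytes]:
    """Apply the same kernel to each packet, in arrival order."""
    for packet in packets:
        yield kernel(packet)


def decrement_first_byte(packet: bytes) -> bytes:
    """Toy kernel (hypothetical): decrement byte 0, wrapping at zero."""
    if not packet:
        return packet
    return bytes([(packet[0] - 1) % 256]) + packet[1:]


incoming = [bytes([3, 0xAA]), bytes([1, 0xBB]), bytes([0, 0xCC])]
outgoing = list(stream_process(incoming, decrement_first_byte))
print([p.hex() for p in outgoing])
```

The kernel is passed in as a parameter because the slides describe the kernel as the part the designer supplies, while the repeated application over the stream is fixed.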

4 When you use a Network Processor
- What your product looks like
- What your competitor's product looks like

5 XStream vs Network Processor
- What if my application does not look like this?

6 XStream vs Network Processor
- What if my application does not look like this?
- Network Processor: No help
- XStream: Make a system that looks like my app in days

7 XStream vs Network Processor
- What if I want to use cheaper DDR2 instead of RDRAM, or need more bandwidth?

8 XStream vs Network Processor
- What if I want to use cheaper DDR2 instead of RDRAM, or need more bandwidth?
- Network Processor: No help
- XStream: Select a different controller from the GUI and plop it on the chip

9 XStream vs Network Processor
- What if I need:
  - Different type/number of micro-engines
  - More capable control processor
  - Additional high performance processors for value added services
  - More crypto cores
  - Different trie lookup hardware
  - Different DRAM bandwidth
  - Etc, etc, etc
- Network processor: No help
- XStream: Yes

10 Design Flow
- Draw an architecture diagram for your application
- Select processors, interfaces, IP blocks, etc. from a GUI
- Specify parameters, throughput requirements, etc.
- Specify the high level function of any additional custom coprocessors you need
- Press a button and wait...
- XStream generates the hardware for you

11 Design Example
- Objective:
  - Design a platform chip that is shared across different products to save cost
  - Product 1: 16 port Ethernet Bridge
  - Product 2: 16 port VLAN switch with advanced filtering abilities
- Major differences:
  - Wimpy ingress/egress processors are OK on the bridge
  - VLAN Switch needs high performance ingress/egress processors
  - VLAN Switch needs a high performance filter rule engine
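The slides do not show the switching code for either product. As a hedged sketch of the kind of decision involved, the Python model below implements standard learning-bridge behaviour: learn the sender's port, forward to a known destination, flood otherwise. The class `LearningBridge` and its method names are hypothetical. Keying the table on a VLAN ID is one simple way to model the VLAN switch; VLAN port membership and the filter rules are not modelled.

```python
from typing import Dict, List, Tuple

NUM_PORTS = 16  # both example products are 16-port devices
BROADCAST = bytes([0xFF] * 6)


class LearningBridge:
    """Toy model of a switching decision; not XStream output."""

    def __init__(self, num_ports: int = NUM_PORTS) -> None:
        self.num_ports = num_ports
        # (vlan_id, mac) -> port. A plain bridge uses vlan_id 0 throughout.
        self.table: Dict[Tuple[int, bytes], int] = {}

    def decide(self, in_port: int, src: bytes, dst: bytes,
               vlan_id: int = 0) -> List[int]:
        """Return the list of output ports for one frame."""
        self.table[(vlan_id, src)] = in_port  # learn where the sender lives
        out = self.table.get((vlan_id, dst))
        if dst == BROADCAST or out is None:
            # Broadcast or unknown destination: flood to all other ports.
            return [p for p in range(self.num_ports) if p != in_port]
        if out == in_port:
            return []  # destination is on the arrival port: drop
        return [out]


bridge = LearningBridge()
host_a = bytes.fromhex("020000000001")
host_b = bytes.fromhex("020000000002")
first = bridge.decide(0, host_a, host_b)   # host_b unknown: flood
second = bridge.decide(5, host_b, host_a)  # host_a was learned on port 0
print(len(first), second)
```

The first frame floods to the 15 other ports because `host_b` has not been seen yet; the reply goes only to port 0.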

12 XStream: Designing a Platform Chip
[Block diagram. Labels: Link Interface, Port Ingress Processor and Port Egress Processor (repeated per port); Ingress Queue; Egress Queue; Crossbar; Stream Processor for Switching Decisions; Control Processor; External DRAM.]

13 The Streams in XStream
[Block diagram with the same labels as slide 12.]

14 The Streams in XStream
[Block diagram with the same labels as slide 12.]

15 The Streams in XStream
[Block diagram with the same labels as slide 12.]

16 XStream: Mapping the core processor
[Block diagram with the same labels as slide 12.]

17 XStream: Mapping the core processor...
[Diagram labels: Ingress Queue, Egress Queue, Stream Processor for Switching Decisions]
- Imagine a snazzy GUI here
- Designer says:
  - Stream processor, 8 issue
  - Stream 1: Input, 16x1 queue, N deep
  - Stream 2: Output, 16x1 queue, M deep
  - Stream 3: Inout, RISC processor interface
  - Add a CAM: 2 port, 48 bit keys, 1024 entries, 4 way associative, hash=F(…)
- The tool ponders for a while…
- Says: "Yes master"
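To make the CAM parameters above concrete, here is a minimal software model, assuming 1024 entries split into 4 ways gives 256 sets. The slide leaves the hash function F unspecified, so `placeholder_hash` below is purely a stand-in. The FIFO eviction policy and all names are likewise assumptions, and the two access ports are not modelled.

```python
from typing import List, Optional, Tuple

KEY_BITS = 48
ENTRIES = 1024
WAYS = 4
SETS = ENTRIES // WAYS  # 256 sets of 4 entries each


def placeholder_hash(key: int) -> int:
    """XOR-fold the 48-bit key into a set index (stand-in for F)."""
    h = 0
    for shift in range(0, KEY_BITS, 8):
        h ^= (key >> shift) & 0xFF
    return h % SETS


class SetAssociativeCam:
    """Toy 4-way set-associative lookup table keyed by 48-bit values."""

    def __init__(self) -> None:
        self.sets: List[List[Tuple[int, int]]] = [[] for _ in range(SETS)]

    def insert(self, key: int, value: int) -> None:
        if not 0 <= key < (1 << KEY_BITS):
            raise ValueError("key must fit in 48 bits")
        ways = self.sets[placeholder_hash(key)]
        for i, (k, _) in enumerate(ways):
            if k == key:
                ways[i] = (key, value)  # update existing entry in place
                return
        if len(ways) == WAYS:
            ways.pop(0)  # set is full: evict oldest (FIFO, an assumption)
        ways.append((key, value))

    def lookup(self, key: int) -> Optional[int]:
        """Return the stored value, or None on a miss."""
        for k, v in self.sets[placeholder_hash(key)]:
            if k == key:
                return v
        return None


cam = SetAssociativeCam()
cam.insert(0x020000000001, 3)  # e.g. a MAC address mapped to port 3
print(cam.lookup(0x020000000001), cam.lookup(0x020000000002))
```

A 48-bit key matches the width of an Ethernet MAC address, which is presumably why the designer in the example asks for that key size.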

18 XStream: Mapping the core processor...
[Diagram labels: Ingress Queue, Egress Queue, Stream Processor for Switching Decisions]
- Imagine a snazzy GUI here
- Designer writes 15 lines of code for the data plane, say in a subset of C
- Designer says: Schedule and report
- The tool ponders for a while… Says:
  - Compiled 45 instructions
  - Using modulo accelerator
  - Initiation interval = 8 cycles
  - Clock speed: 500 MHz
  - Throughput based on 64 byte (worst case) packet size:
    - (500 MHz / 8 cycles) * 64 bytes * 8 bits/byte = 32 Gb/s
  - Area: 2.5 mm x 2.5 mm
  - Power: 1.2 W
- Single stream at 500 MHz = 32 Gb/s
- Have designed up to 1 GHz processor in 0.13 µm process
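The throughput figure on this slide follows from the initiation interval: the processor accepts one new packet every 8 cycles, so the packet rate is the clock rate divided by 8. The short Python check below reproduces the arithmetic; the function name `throughput_bps` is hypothetical, and the per-port figure is simply the slide's total divided by the 16 ports of the design example.

```python
def throughput_bps(clock_hz: float, initiation_interval: int,
                   packet_bytes: int) -> float:
    """Bits per second when one packet starts every `initiation_interval` cycles."""
    packets_per_second = clock_hz / initiation_interval
    return packets_per_second * packet_bytes * 8


total = throughput_bps(clock_hz=500e6, initiation_interval=8, packet_bytes=64)
per_port = total / 16  # 16-port design example

print(f"{total / 1e9:.0f} Gb/s total, {per_port / 1e9:.0f} Gb/s per port")
# prints "32 Gb/s total, 2 Gb/s per port"
```

The 64 byte case is the worst case because the packet rate is fixed by the initiation interval, so the smallest packets carry the fewest bits per accepted packet.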

19 XStream: Mapping the ingress processor...
[Block diagram with the same labels as slide 12.]

20 XStream: Mapping the ingress processor...
[Diagram labels: Port Ingress Processor, Filter Rule Engine]
- Imagine a snazzy GUI here
- Designer says:
  - RISC processor engine, no-cache
  - 2 issue, scratchpad memory
  - Stream 1: Input, link interface
  - Stream 2: Output, StreamProc:Ingress Queue
  - Add a Filter Rule Engine: Rule complexity = 64 terms, …
- The tool ponders for a while… Says:
  - RISC core and compiler generated
  - Area: 1 mm x 1 mm (i.e. this can be replicated 100x on a 10 mm x 10 mm chip)
  - Power: 250 mW
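The replication claim is simple tiling arithmetic, sketched below in Python. The helper `copies_that_fit` is a hypothetical name, and the estimate ignores routing, pads, and every other block on the chip. The 16-instance totals assume one ingress processor per port of the 16-port design example, which the slides do not state explicitly.

```python
def copies_that_fit(core_w_mm: float, core_h_mm: float,
                    die_w_mm: float, die_h_mm: float) -> int:
    """Whole cores that tile a rectangular die, ignoring routing and pads."""
    return int(die_w_mm // core_w_mm) * int(die_h_mm // core_h_mm)


ingress_copies = copies_that_fit(1, 1, 10, 10)

# Assumed: one ingress processor per port on the 16-port example chip.
ports = 16
ingress_area_mm2 = ports * 1 * 1
ingress_power_mw = ports * 250

print(ingress_copies, ingress_area_mm2, ingress_power_mw)
# prints "100 16 4000"
```

Under these assumptions, sixteen ingress processors would occupy 16 mm² and draw about 4 W in total.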

21 Summary
- Showed network processor design
  - But might as well be multimedia or wireless product design
- Very high performance custom processors replace ASIC modules
  - Reduce design time for stream oriented ASIC modules by 95%
  - Retain 40-90% of ASIC performance
- Software replaces hardware design
- Software prototype already exists
- Flexible, fast bug fixes, feature upgrades
- Share chip across product family