Synthesizable, Application-Specific NOC Generation using CHISEL
Maysam Lavasani†, Eric Chung††, John Davis††
†: The University of Texas at Austin


Synthesizable, Application-Specific NOC Generation using CHISEL
Maysam Lavasani†, Eric Chung††, John Davis††
†: The University of Texas at Austin
††: Microsoft Research
Acknowledgement: Jonathan Bachrach and the rest of the CHISEL team.

Problem/motivation
Goal: flexible, app-specific NOC generation
  - Accuracy
  - Performance
  - Power
  - Design space exploration
  - Support for parametric design
Available solutions
  - C-based software simulation (e.g. Orion): inaccurate
  - RTL: too low-level
  - Bluespec: not free
  - Web-based solutions: closed source
This talk: our experience building NOCs with CHISEL

Chisel Workflow
Chisel: from UC Berkeley; open source; built on top of Scala (object-oriented, functional)
[Figure: workflow diagram. Inputs: hardware in Chisel, test-bench code in Scala. Tool: Chisel compiler. Outputs: Verilog code (for the synthesis flow and Verilog simulation) and C++ simulation code (for C++ simulation, giving functional/performance results). Legend: tool, input/output.]

Network-on-Chip Generator
[Figure: network of routers (R) mixing big routers and small routers]
Customizable features
  - Topology (e.g., mesh, ring, torus)
  - Buffer sizes
  - Link widths
  - Routing
Targeted for
  - FPGA (evaluated)
  - ASIC (future work)
Fully synthesizable
  - Xilinx ISE 13+

Parameterized Router
[Figure: router block diagram. Labeled blocks: input port, state, stored route, route logic, switch, RR arbiter, mediator, output port.]
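The router diagram names an RR (round-robin) arbiter. As a minimal sketch of what round-robin arbitration means, here is a plain Scala software model. It is not Chisel hardware and not the authors' implementation; the names `RoundRobin`, `grant`, `requests`, and `last` are hypothetical.

```scala
// Software model of round-robin arbitration (illustrative only, not Chisel).
object RoundRobin {
  // requests(i) is true when input i is asking for the shared output.
  // last is the index that won the previous round (use -1 before any grant).
  // Returns the index granted this round, or None if nobody is requesting.
  def grant(requests: Seq[Boolean], last: Int): Option[Int] = {
    val n = requests.length
    // Scan starts just after the previous winner and wraps around,
    // so the previous winner has the lowest priority this round.
    (1 to n).map(k => (last + k) % n).find(i => requests(i))
  }
}
```

For example, with inputs 0 and 2 requesting and input 0 the previous winner, the model grants input 2; on the next round it grants input 0 again.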

2D Mesh Example in Chisel

    val routers = Range(0, numRows, 1).map(i =>
      new Range(0, numColumns, 1).map(j =>
        new MyRouter(5, routerID(i, j), XYrouting)))

[Figure: 2D mesh of routers (R)]
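The routers above are built with an `XYrouting` argument, whose definition the slides do not show. As a hedged illustration of dimension-order (XY) routing in general, the plain Scala model below moves a packet along X until the column matches, then along Y, then delivers it to the local port. The coordinate convention (x grows eastward, y grows northward) and the names `XYRoute`, `nextPort`, and `path` are assumptions made for this sketch.

```scala
// Software model of XY (dimension-order) routing; illustrative, not Chisel.
object XYRoute {
  sealed trait Port
  case object North extends Port
  case object South extends Port
  case object East extends Port
  case object West extends Port
  case object Cpu extends Port // local port, taken on arrival

  // Output port chosen at router cur for a packet headed to dst.
  // Coordinates are (x, y); x is corrected first, then y.
  def nextPort(cur: (Int, Int), dst: (Int, Int)): Port = {
    val (cx, cy) = cur
    val (dx, dy) = dst
    if (dx > cx) East
    else if (dx < cx) West
    else if (dy > cy) North
    else if (dy < cy) South
    else Cpu
  }

  // Ports taken hop by hop from src to dst, ending with Cpu.
  def path(src: (Int, Int), dst: (Int, Int)): List[Port] = {
    val (x, y) = src
    nextPort(src, dst) match {
      case East  => East :: path((x + 1, y), dst)
      case West  => West :: path((x - 1, y), dst)
      case North => North :: path((x, y + 1), dst)
      case South => South :: path((x, y - 1), dst)
      case Cpu   => List(Cpu)
    }
  }
}
```

Because every packet finishes its X moves before making any Y move, the route between two routers is fixed, which keeps the per-router route logic small.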

2D Mesh Example in Chisel

    for (i <- 0 until numRows) {
      for (j <- 1 until numColumns) {
        routers(i)(j).io.ins(south)  <> routers(i)(j-1).io.outs(north)
        routers(i)(j).io.outs(south) <> routers(i)(j-1).io.ins(north)}}

[Figure: 2D mesh of routers (R)]

2D Mesh Example in Chisel

    for (j <- 0 until numRows) {
      for (i <- 1 until numColumns) {
        routers(i)(j).io.ins(west)  <> routers(i-1)(j).io.outs(east)
        routers(i)(j).io.outs(west) <> routers(i-1)(j).io.ins(east)}}

[Figure: 2D mesh of routers (R)]

2D Mesh Example in Chisel

    for (i <- 0 until numRows) {
      for (j <- 0 until numColumns) {
        io.tap(routerID(i, j)).deq <> routers(i)(j).io.outs(cpu)
        io.tap(routerID(i, j)).enq <> routers(i)(j).io.ins(cpu)}}

[Figure: 2D mesh of routers (R)]

2D Mesh Example in Chisel

    val routers = Range(0, numRows, 1).map(i =>
      new Range(0, numColumns, 1).map(j =>
        new MyRouter(5, routerID(i, j), XYrouting)))

    for (j <- 0 until numRows) {
      for (i <- 1 until numColumns) {
        routers(i)(j).io.ins(west)  <> routers(i-1)(j).io.outs(east)
        routers(i)(j).io.outs(west) <> routers(i-1)(j).io.ins(east)}}

    for (i <- 0 until numRows) {
      for (j <- 1 until numColumns) {
        routers(i)(j).io.ins(south)  <> routers(i)(j-1).io.outs(north)
        routers(i)(j).io.outs(south) <> routers(i)(j-1).io.ins(north)}}

    for (i <- 0 until numRows) {
      for (j <- 0 until numColumns) {
        io.tap(routerID(i, j)).deq <> routers(i)(j).io.outs(cpu)
        io.tap(routerID(i, j)).enq <> routers(i)(j).io.ins(cpu)}}

Fits on 1 page!
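Read literally, the east/west loop in the listing bounds the first index of `routers(i)(j)` by `numColumns`, although the outer sequence holds `numRows` entries, so the listing appears to assume a square mesh. The plain Scala sketch below is not Chisel and not the authors' code. It enumerates the neighbor links of a rectangular mesh with the first index always the row and the second always the column, which makes the bounds easy to check. The names `MeshLinks`, `Link`, and `links` are hypothetical.

```scala
// Enumerates neighbor links of a numRows x numColumns mesh (software model).
object MeshLinks {
  // A link joins two routers, each identified by (row, column).
  final case class Link(from: (Int, Int), to: (Int, Int))

  def links(numRows: Int, numColumns: Int): Seq[Link] = {
    // Links between neighbors in the same row: column index differs by one.
    val sameRow = for {
      i <- 0 until numRows
      j <- 1 until numColumns
    } yield Link((i, j - 1), (i, j))
    // Links between neighbors in the same column: row index differs by one.
    val sameColumn = for {
      i <- 1 until numRows
      j <- 0 until numColumns
    } yield Link((i - 1, j), (i, j))
    sameRow ++ sameColumn
  }
}
```

A mesh with R rows and C columns has R(C-1) + (R-1)C such links; each would correspond to one pair of `<>` connections in the listing, one per direction.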

Application Case Study: K-means
Cluster N points in D-dimensional space into C clusters (example: N = 12, C = 3, D = 2)
  1. Pick C initial centers
  2. Assign N points to nearest center
  3. Compute new centers
  4. Max iterations reached or converged? If yes, done; if no, return to step 2
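The flowchart steps can be sketched in software as follows. This is a plain sequential Scala model of the standard K-means loop under stated assumptions (squared Euclidean distance, convergence meaning the centers stop changing, an empty cluster keeping its old center); it is not the accelerator design. The names `KMeans`, `dist2`, `nearest`, `mean`, and `run` are hypothetical.

```scala
// Sequential software model of the K-means loop (illustrative only).
object KMeans {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def dist2(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center closest to p (first one wins on ties).
  def nearest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(c => dist2(p, centers(c)))

  // Coordinate-wise mean of a non-empty group of points.
  def mean(ps: Seq[Point]): Point =
    ps.transpose.map(col => col.sum / ps.length).toVector

  // Repeats assign-then-update until the centers stop changing
  // or maxIters iterations have run; returns the final centers.
  def run(points: Seq[Point], init: Seq[Point], maxIters: Int): Seq[Point] = {
    var centers = init
    var iter = 0
    var changed = true
    while (iter < maxIters && changed) {
      // Assign each point to its nearest center.
      val groups = points.groupBy(p => nearest(p, centers))
      // Compute new centers; a center with no points keeps its position.
      val next = centers.indices.map(c => groups.get(c).map(mean).getOrElse(centers(c)))
      changed = next != centers
      centers = next
      iter += 1
    }
    centers
  }
}
```

The nearest-center search is independent for each point, and the center update is a reduction over the points in each cluster.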

Parallel K-means accelerator
[Figure: accelerator block diagram. Labeled blocks: customized Network-on-Chip (routers, R), reduction core, cores (nearest distance), memory banks, streamer, DMA.]

Performance Sensitivity to NOC
[Figure: plot; x-axis: number of cores]

My experience - positives
Chisel (V.1.0) improves productivity
  - Bulk interfaces
  - Parameterized classes
  - Type inference reduces errors
  - Functional features
  - Faster C++ based simulation
Open source (BSD license)
UCB support
Tested on large-scale UCB projects

My experience - negatives
Compiler (V.1.0) not as robust as commercial tools
  - Long compile time
  - Memory leak
  - Long loading time for large circuits
Single clock domain
Cannot mix synthesizable and behavioral code

Thank you
Please come and see my poster