Hardware Design, Synthesis, and Verification of a Multicore Communication API. Ben Meakin, Ganesh Gopalakrishnan.

Presentation transcript:

Hardware Design, Synthesis, and Verification of a Multicore Communication API
Ben Meakin, Ganesh Gopalakrishnan {meakin,
University of Utah School of Computing

Project Objectives

Services provided by modern computer systems
– Computation oriented: fast, low power cost
– Communication oriented: slow, high power cost

Objectives of this project
– Research and implement efficient means of performing on-chip communication
– Evaluate the impact of instruction set extensions enabling explicit data transfer
– Apply these to a modern communication API
– Study the use of semi-formal HW verification tools to verify realistic multicore HW

Implementing Inter-core Communication

Physical transport layer
– Asynchronous network-on-chip
– Dual networks: one for user, one for cache controllers

MIPS instruction set extension
– Enables explicit data transfer
– Reduces some hardware complexity

Multicore Communication API

Multicore Association Communication API (MCAPI)
– Lightweight messaging API designed for embedded multicore systems

Implementation
– Messages and packet channels use pointers to shared memory
– Scalar channels copy data
– Uses in-line assembly code

8-Core MIPS System-on-Chip

8 processor tiles on a Xilinx Virtex5 FPGA
– 16-bit MIPS cores (6-stage pipelines)
– Private 2KB instruction and 2KB data caches
– Shared 4KB slice of L2 data cache
– Network interface unit
– NUCA
– MSI directory-based cache coherence
– Various I/O interfaces

MIPS Core Data-path

Cache Architecture
– Direct mapped, 8 words per block
– L2 physically distributed/logically shared (NUCA)
– L1 private
– MSI directory coherence protocol
– Write invalidate policy
– Simplified form of modern architecture

Custom On-Chip Network Synthesis

Workload driven synthesis of NoC given a model of an MCAPI target application
– Paper under review for HiPEAC '10
– Algorithmic objectives
    – Generate custom topology to minimize average hops / flit for application
    – Synthesize deadlock free routing tables based on shortest path
    – Given approximate node sizes, find a physical placement such that average wire distance is minimized

Results highly encouraging
– From baseline, our algorithms achieved for specific application (> 16 cores)
    – ~50% reduction in avg. hops / flit
    – ~50% reduction in avg. wire distance / flit
    – ~17% increase in throughput
    – Comparable hardware cost
– Performed at least as well as baseline for general purpose
– Better scalability

Hardware Verification

Application of IBM's Sixthsense semi-formal verification tool to complex multicore hardware
– Promises simulator usability with MUCH higher coverage
– Ability to verify large designs due to non-exhaustive state space exploration

[Figure labels: Simulation, Formal Verification, Semi-Formal Verification]

Cache coherence protocol verification at RTL
– Can SXS find bugs not found by simulation?
– Further application to pipeline control
– Work in progress...

Future Work

– Evaluation of SXS and other tools as applied to multicore RTL descriptions
– Extensive benchmarking of MCAPI implementation and interconnect technology
– Research additional applications of proposed ISA extension in parallel programming methods
– Research hardware mechanisms for increasing observability of multicore processors
    – Deterministic replay

More Information

Wiki page with link to read-only SVN checkout:
– Under “MCAPI Hardware Implementation”
Ben Meakin's web-page:
Multicore Association web-page:
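
The "average hops / flit" objective named in the network synthesis section can be illustrated with a small sketch. This is not the authors' synthesis algorithm; it only evaluates the metric for a given topology and traffic model, assuming shortest-path routing. The function names, the ring baseline, and the traffic numbers are all hypothetical and chosen for illustration.

```python
from collections import deque


def shortest_hops(adjacency, source):
    """Breadth-first search: hop count from source to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


def average_hops_per_flit(adjacency, traffic):
    """Flit-weighted mean hop count; traffic maps (src, dst) -> flits sent."""
    total_hops = 0
    total_flits = 0
    for (src, dst), flits in traffic.items():
        hops = shortest_hops(adjacency, src)
        if dst not in hops:
            raise ValueError(f"no route from {src} to {dst}")
        total_hops += hops[dst] * flits
        total_flits += flits
    return total_hops / total_flits


def ring(n):
    """Hypothetical baseline topology: a bidirectional ring of n routers."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


# Hypothetical workload: core 0 talks heavily to core 4, core 1 to core 2.
traffic = {(0, 4): 100, (1, 2): 50}

baseline = ring(8)

# A workload-specific topology adds one direct link for the heaviest flow.
custom = ring(8)
custom[0].append(4)
custom[4].append(0)

print(average_hops_per_flit(baseline, traffic))
print(average_hops_per_flit(custom, traffic))
```

A synthesis tool would search over candidate topologies using a metric like this one, subject to hardware cost and deadlock-freedom constraints that this sketch does not model.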
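
The MSI directory protocol with a write-invalidate policy, listed under the cache architecture, can be sketched as a toy state machine for a single cache block. This is a minimal illustration under simplifying assumptions (no network messages, no transient states, no write-back data), not the RTL described on the poster; the `Directory` class and its methods are hypothetical.

```python
class Directory:
    """Directory entry for one cache block, tracking each core's MSI state."""

    def __init__(self, num_cores):
        # "I" = Invalid, "S" = Shared, "M" = Modified
        self.state = ["I"] * num_cores

    def read(self, core):
        """Core issues a load; a hit in S or M leaves all states unchanged."""
        if self.state[core] == "I":
            # Read miss: a Modified owner (if any) is downgraded to Shared.
            for other, s in enumerate(self.state):
                if s == "M":
                    self.state[other] = "S"
            self.state[core] = "S"

    def write(self, core):
        """Core issues a store; write-invalidate removes every other copy."""
        for other in range(len(self.state)):
            if other != core:
                self.state[other] = "I"
        self.state[core] = "M"


d = Directory(8)
d.read(0)    # core 0 holds the block Shared
d.read(1)    # core 1 also Shared
d.write(2)   # cores 0 and 1 are invalidated, core 2 becomes Modified
d.read(0)    # core 2 is downgraded to Shared, core 0 becomes Shared
print(d.state)
```

The property a verification tool would check on the real design is the one this toy model maintains by construction: at most one core is in M, and no core is in S while another is in M.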