Enhancing Embedded Processors with Specific Instruction Set Extensions for Network Applications A. Chormoviti, N. Vassiliadis, G. Theodoridis, S. Nikolaidis.

Presentation transcript:

Enhancing Embedded Processors with Specific Instruction Set Extensions for Network Applications
A. Chormoviti, N. Vassiliadis, G. Theodoridis, S. Nikolaidis
Section of Electronics and Computers, Department of Physics, Aristotle University of Thessaloniki, Thessaloniki, Greece

Aristotle University of Thessaloniki, IDAACS ‘05
Outline: Motivation, Scope, Benchmarking Suite, Design Flow, Implementation, Experimental Results

Motivation
Great expansion of network applications
High performance demands
Requirements for flexibility to support new protocols and future applications
Fast time-to-market requirements
ASICs lack flexibility and GPPs are prohibitively expensive in terms of energy-performance, whereas ASIPs exploit special characteristics of the application domain to meet the desired specifications

Scope
Design an ASIP for network applications based on low-cost enhancement of an existing processor
Follow a methodology for the implementation of the ASIP from a hardware-software perspective
Use the NetBench benchmarking suite for network applications as a vehicle for representing the application domain

NetBench benchmarking suite
A benchmarking suite for network applications containing a large variety of tasks
Four kernels of the suite were used with a set of typical stimuli as inputs
– CRC: CRC-32 checksum calculation
– DRR: Deficit round-robin (DRR) scheduling
– ROUTE: Table lookup implementation along with Internet checksum
– URL: URL-based switching
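To make the CRC kernel concrete, here is a minimal bit-at-a-time CRC-32 sketch. It is not NetBench's code (which may well be table-driven); it only illustrates why a checksum loop of this kind is dominated by shift and XOR operations. The function name `crc32_bitwise` is an assumption made for this illustration.

```python
def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32 using the reflected polynomial 0xEDB88320."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right by one bit; XOR in the polynomial when the
            # bit shifted out was 1. This is the shift-then-XOR hot loop.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


# "123456789" is the conventional CRC test string.
print(hex(crc32_bitwise(b"123456789")))  # 0xcbf43926
```

The inner loop runs eight times per input byte, so a shift followed by an XOR is executed very frequently in this formulation.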

ASIP Design Flow
A RISC, MIPS-like machine is used as the base processor
The application is described in C/C++
The instruction set is extended by special instructions in order to increase performance and reduce power consumption

ASIP Design Flow – Pruning
Network algorithm
– Huge source code size
– Only a small part of the code is responsible for power and time consumption
Pruning
– Simulation, profiling, and analysis are performed
– The crucial parts of the code are identified
– Undesired parts of the code are hidden
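The pruning step can be pictured as selecting the few functions that account for most of the profiled cycles. The sketch below is one plausible way to do that, not the authors' tool: `hot_spots`, the 90% threshold, and the profile numbers are all hypothetical.

```python
def hot_spots(profile: dict[str, int], coverage_pct: int = 90) -> list[str]:
    """Return the hottest functions that together cover coverage_pct of all cycles."""
    total = sum(profile.values())
    selected, covered = [], 0
    ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
    for name, cycles in ranked:
        # Stop once the selected functions already reach the coverage target.
        if covered * 100 >= coverage_pct * total:
            break
        selected.append(name)
        covered += cycles
    return selected


# Hypothetical profile: function name -> cycles spent (made-up numbers).
profile = {"crc_update": 900, "parse_args": 40, "print_stats": 30, "init_table": 30}
print(hot_spots(profile))  # ['crc_update']
```

Everything outside the returned list would be a candidate for hiding during the later analysis steps.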

ASIP Design Flow – Assembly Generation
GNU GCC is used as the compiler
The compiler is cross-configured for the target architecture (MIPS)
The pruned code is used
Assembly code is generated

ASIP Design Flow – Analysis
The assembly code is analyzed with the GNU tools
Static analysis
– The code is parsed
– Basic blocks are identified
Dynamic analysis
– The code is simulated
– Basic blocks are weighted with execution frequencies
– Frequently executed instructions are identified
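The static and dynamic analysis steps can be sketched as follows. This is a simplified illustration, not the GNU-tool flow used in the talk: it treats a label as the start of a block and a branch or jump as the end of one, ignores MIPS delay slots, and takes the per-block execution counts as given (they would come from simulation). The assembly fragment, the `BRANCHES` set, and the counts are assumptions.

```python
from collections import Counter

BRANCHES = {"beq", "bne", "j", "jr", "jal"}  # simplified, not the full MIPS list


def basic_blocks(lines):
    """Static analysis: split assembly lines into basic blocks."""
    blocks, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):  # a label starts a new block
            if current:
                blocks.append(current)
            current = []
            continue
        current.append(line)
        if line.split()[0] in BRANCHES:  # a branch or jump ends the block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def opcode_frequencies(blocks, exec_counts):
    """Dynamic analysis: weight each opcode by its block's execution count."""
    freq = Counter()
    for block, count in zip(blocks, exec_counts):
        for instr in block:
            freq[instr.split()[0]] += count
    return freq


asm = """
    addiu $t0, $zero, 8
loop:
    srl   $t1, $t1, 1
    xor   $t1, $t1, $t2
    addiu $t0, $t0, -1
    bne   $t0, $zero, loop
    sw    $t1, 0($a0)
""".splitlines()

blocks = basic_blocks(asm)
freq = opcode_frequencies(blocks, [1, 8, 1])  # hypothetical simulation counts
print(len(blocks), freq["addiu"], freq["srl"])
```

With these made-up counts the loop body dominates, which is the kind of signal the dynamic analysis is meant to expose.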

ASIP Design Flow – Instruction Generation
Frequently executed instructions are considered
Instructions are reordered to form patterns
The ISA is extended with new complex instructions
Hardware modifications for the support of the new ISA are performed
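One simple way to look for candidate patterns is to count adjacent opcode pairs inside each basic block, weighted by how often the block executes. The sketch below does only that; it does not reorder instructions as the slide describes, and the blocks and execution counts are hypothetical.

```python
from collections import Counter


def weighted_pairs(blocks, exec_counts):
    """Count adjacent opcode pairs per basic block, weighted by execution count."""
    pairs = Counter()
    for block, count in zip(blocks, exec_counts):
        opcodes = [instr.split()[0] for instr in block]
        for first, second in zip(opcodes, opcodes[1:]):
            pairs[(first, second)] += count
    return pairs


# Hypothetical basic blocks and execution counts.
example_blocks = [
    ["addiu $t0, $zero, 8"],
    ["srl $t1, $t1, 1", "xor $t1, $t1, $t2",
     "addiu $t0, $t0, -1", "bne $t0, $zero, loop"],
    ["sw $t1, 0($a0)"],
]
pair_counts = weighted_pairs(example_blocks, [1, 8, 1])
for pair, weight in pair_counts.most_common():
    print(pair, weight)
```

Pairs with a high weight, such as a shift followed by an XOR or an add followed by a branch in this toy input, are the natural candidates for a fused instruction.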

ASIP Design Flow – Code Generation/Evaluation
Code generation
– Identified patterns are substituted by the newly defined instructions in the application assembly code
A hardware model (VHDL) is constructed and synthesized
Evaluation
– Execution cycles
– Clock speed
– Power consumption
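The substitution step resembles a peephole pass over the assembly. The sketch below works on opcodes only and uses made-up fused mnemonics (`srlxor`, `decbne`); a real pass would also have to check operands and data dependences before replacing a pair.

```python
def fuse_pairs(opcodes, fusions):
    """Replace each adjacent opcode pair found in `fusions` with its fused opcode."""
    out, i = [], 0
    while i < len(opcodes):
        pair = tuple(opcodes[i:i + 2])
        if pair in fusions:
            out.append(fusions[pair])  # two instructions become one
            i += 2
        else:
            out.append(opcodes[i])
            i += 1
    return out


# Hypothetical fused mnemonics, chosen only for this illustration.
fusions = {("srl", "xor"): "srlxor", ("addiu", "bne"): "decbne"}
loop_body = ["srl", "xor", "addiu", "bne"]
print(fuse_pairs(loop_body, fusions))  # ['srlxor', 'decbne']
```

In this toy loop body the instruction count drops from four to two, which is where the cycle savings measured in the evaluation step would come from.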

Instruction Set Extensions
Table columns: New Instruction | Cycle Reduction (%) for CRC, DRR, ROUTE, URL | Avg. (%) (values not preserved)
Table rows: INC;Branch, DEC;Branch, CMP;Branch, LDR;Jump, ADD;Jump, ADD;LW, AND;LW, ADD;SW, Shift+XOR, Delay slots reduction
New instructions
– 5 control flow instructions
– 3 addressing modes
– 1 pure computation
Delay slot reduction mechanism
Cycle reduction up to 18.6%

Hardware Modifications
Addition of a shifter for Shift+ALU operations
Enhancement of the Control Flow Unit with “Increment/Decrement and Branch” capability
Control logic to reduce the delay slots for Branch operations
New addressing modes combining addition/AND with Load/Store operations
Slight hardware overhead and no degradation of performance were introduced
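A behavioral model can help picture what the fused operations compute. The slides do not give exact semantics, so everything below is an assumption: a logical right shift for Shift+XOR, a branch taken while the decremented counter is non-zero, and register-plus-register addressing for ADD;LW. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def shift_xor(rs: int, rt: int, shamt: int) -> int:
    """Assumed Shift+XOR: (rs >> shamt) XOR rt in a single instruction."""
    return ((rs & MASK32) >> shamt) ^ (rt & MASK32)


def dec_and_branch(counter: int) -> tuple[int, bool]:
    """Assumed DEC;Branch: decrement, then branch if the result is non-zero."""
    counter = (counter - 1) & MASK32
    return counter, counter != 0


def add_load(memory: dict[int, int], base: int, index: int) -> int:
    """Assumed ADD;LW: load the word at address base + index."""
    return memory[(base + index) & MASK32]


print(shift_xor(0b1010, 0b0001, 1))         # 4
print(dec_and_branch(3))                    # (2, True)
print(add_load({0x108: 42}, 0x100, 0x8))    # 42
```

Each function stands for work that would otherwise take two base instructions, which matches the added shifter, the extended control flow unit, and the new addressing modes listed on the slide.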

Experimental Results
The ARM7TDMI and MIPS-like processors were considered for comparison with the designed ASIP
The hardware models of MIPS and ASIP were synthesized in an STM 0.13 µm process
For the ARM7TDMI core, information was taken from the datasheets
Table columns: ARM7TDMI | MIPS-like | ASIP
– Pipeline stages: 3 | 5 | 5
– Clock speed (0.13 µm process): 133 MHz, 253 MHz (only two values are listed for the three columns)

Performance Results
Table: cycles (10^6) per application (CRC, URL, DRR, ROUTE) for ARM7, MIPS, and ASIP (values not preserved)
Cycle-accurate simulations were performed
– The ARMulator was used for the ARM core
– The VHDL model was used for the MIPS and ASIP cores
Significant performance improvements are achieved
– 80% avg. compared to ARM7
– 50% avg. compared to MIPS
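Cycle counts and clock speed combine into execution time as cycles divided by clock frequency. The sketch below shows that arithmetic and a percentage-reduction helper. The cycle counts are placeholders, not the talk's measurements, and the slides do not say whether the quoted percentages refer to cycles or to execution time.

```python
def exec_time_s(cycles: float, clock_hz: float) -> float:
    """Execution time in seconds for a given cycle count and clock frequency."""
    return cycles / clock_hz


def reduction_pct(baseline: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Placeholder cycle counts (NOT from the talk); 253 MHz is the clock speed
# quoted for the 0.13 um synthesis.
base_cycles, asip_cycles = 10e6, 8e6
clock_hz = 253e6

print(reduction_pct(base_cycles, asip_cycles))  # 20.0
print(exec_time_s(base_cycles, clock_hz) > exec_time_s(asip_cycles, clock_hz))  # True
```

When two cores run at the same clock, the reduction in cycles and the reduction in execution time are the same number; they differ only when the clocks differ.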

Energy Results
Table: power (µW)* per application (CRC, URL, DRR, ROUTE) for ARM7, MIPS, and ASIP (values not preserved)
Major source of energy consumption for embedded processors: instruction memory accesses
0.13 µm single-port SRAM memory models were used
Significant energy reductions are achieved
– 60% avg. compared to ARM7TDMI
– 50% avg. compared to MIPS
* Models obtained from Dolphin Integration Embedded Memory Generator
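If instruction memory accesses dominate energy, a first-order estimate is the number of instruction fetches times the energy per access. The sketch below uses that simple model; the per-access energy and the fetch counts are placeholders and are not taken from the Dolphin Integration memory models.

```python
def fetch_energy_nj(instr_fetches: int, energy_per_access_pj: float) -> float:
    """First-order instruction-memory energy in nJ: fetches x energy per access."""
    return instr_fetches * energy_per_access_pj / 1000.0


# Placeholder values (NOT from the talk).
energy_per_access_pj = 5.0
base_fetches, fused_fetches = 1000, 800

base_nj = fetch_energy_nj(base_fetches, energy_per_access_pj)
fused_nj = fetch_energy_nj(fused_fetches, energy_per_access_pj)
print(base_nj, fused_nj)
```

Under this model, and with the same memory for both cores, the relative saving depends only on how many fetches the fused instructions remove.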

Conclusions
An ASIP for network applications was designed following a simple design flow
The ASIP was designed with small enhancements of a popular processor
Experimental results show that significant speedups and power savings can be achieved with these small enhancements
– 50% avg. performance and energy consumption improvements compared to the base processor
Improvements come with small development cost

Thank You
Questions?