Mahesh Sukumar Subramanian Srinivasan

Introduction
New embedded system products keep arriving on the market, and demand for more functional and more complex appliances keeps growing. Designing embedded applications is therefore a great challenge: these systems must have enough processing power to handle such tasks.

Java
Java is becoming increasingly popular in embedded environments. More than 721 million devices are shipped with Java each year, and it is predicted that 80% of mobile phones will support Java.

Current Design Goals
Current design goals must include a careful look at embedded Java architectures. Embedded systems must:
- have low power dissipation;
- support a huge software library, to cope with stringent design times.
Architectures are needed that can support all the software development effort currently required.

Java-Compliant Architecture
The architecture has three components:
- a Binary Translation (BT) unit;
- a reconfiguration cache;
- a reconfigurable array.
Together they speed up the system and reduce energy consumption, at the cost of extra area.

Binary Translation Unit
A separate unit is responsible for the dynamic analysis of instructions, finding the sequences that can be executed in the array. The BT unit saves the configuration for each candidate sequence in a reconfiguration cache. Reconfiguration involves a delay, so the performance and energy gains are meaningful only if the sequence of instructions is going to be repeated.
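The trade-off between the reconfiguration delay and repeated execution can be made concrete with a simple break-even model. The sketch below is an illustrative assumption, not the authors' cost model: it treats reconfiguration as a one-time delay and assumes each execution on the array saves a fixed number of cycles compared with the processor. All cycle counts in the usage note are hypothetical.

```python
from typing import Optional


def net_cycle_gain(executions: int, cycles_on_processor: int,
                   cycles_on_array: int, reconfig_delay: int) -> int:
    """Cycles saved by running a sequence on the array `executions` times,
    minus a one-time reconfiguration delay (simplified, assumed model)."""
    saved_per_run = cycles_on_processor - cycles_on_array
    return executions * saved_per_run - reconfig_delay


def break_even_executions(cycles_on_processor: int, cycles_on_array: int,
                          reconfig_delay: int) -> Optional[int]:
    """Smallest number of executions that gives a strictly positive gain,
    or None if the array is never faster for this sequence."""
    saved_per_run = cycles_on_processor - cycles_on_array
    if saved_per_run <= 0:
        return None
    # n * saved_per_run > reconfig_delay  <=>  n > reconfig_delay / saved_per_run
    return reconfig_delay // saved_per_run + 1
```

With hypothetical figures of 12 cycles on the processor, 3 on the array, and a 20-cycle reconfiguration delay, `break_even_executions(12, 3, 20)` returns 3: a sequence executed only once or twice costs more than it saves, which is why repetition matters.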

Reconfiguration Cache List
The list of instructions is built in real time, as instructions are fetched from memory, in a buffer of 20 eight-bit registers. A write command for the reconfiguration cache then saves the contents of the buffer into the cache.
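A minimal sketch of the buffer-then-write behavior, assuming each of the 20 eight-bit registers holds one fetched bytecode and that the cache is keyed by the program counter where the sequence starts. The class name, the `start_pc` key, and the "refuse when full" policy are hypothetical choices for illustration; the slides do not specify them.

```python
from typing import Dict, List, Tuple

BUFFER_REGISTERS = 20  # from the slide: 20 eight-bit registers


class ReconfigurationBuffer:
    """Collects fetched bytecodes until a write command copies them
    into the reconfiguration cache (hypothetical model)."""

    def __init__(self, cache: Dict[int, Tuple[int, ...]]) -> None:
        self.cache = cache
        self.registers: List[int] = []

    def on_fetch(self, bytecode: int) -> bool:
        """Record one fetched bytecode; return False if the buffer is full."""
        if not 0 <= bytecode <= 0xFF:
            raise ValueError("each register holds one 8-bit value")
        if len(self.registers) == BUFFER_REGISTERS:
            return False
        self.registers.append(bytecode)
        return True

    def write_command(self, start_pc: int) -> None:
        """Save the buffer contents to the cache under start_pc, then clear it."""
        self.cache[start_pc] = tuple(self.registers)
        self.registers.clear()
```

For example, fetching `iload_1`, `iload_2`, `iadd`, `istore_2` (opcodes `0x1B`, `0x1C`, `0x60`, `0x3D`) and then calling `write_command(0x40)` leaves the cache holding `{0x40: (0x1B, 0x1C, 0x60, 0x3D)}` and an empty buffer.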

Reconfigurable Array
The coarse-grained reconfigurable array used here is tightly coupled to the processor. The array is divided into blocks called cells. A previously detected operand block (a sequence of Java bytecodes) is fitted into one or more of these cells.

Cell of the Reconfigurable Array
The first part of each cell is composed of three functional units (ALU, shifter, and load/store), followed by six identical parts in sequence. Each cell has just one multiplier and takes exactly one processor cycle to complete execution. Each cell needs 327 reconfiguration bits, so an array of 3 cells requires 3 × 327 = 981 bits in the reconfiguration cache.
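The configuration-size arithmetic, plus one constraint that follows from having a single multiplier per cell, can be sketched as below. The 327 bits per cell comes from the slide; the lower bound on cells is an assumption that considers only multipliers and ignores every other resource in the cell, so a real mapper could need more cells.

```python
BITS_PER_CELL = 327  # reconfiguration bits per cell, from the slide


def config_bits(num_cells: int) -> int:
    """Reconfiguration-cache bits needed for an array of num_cells cells."""
    return BITS_PER_CELL * num_cells


def min_cells_for_block(num_multiplications: int) -> int:
    """Lower bound on the cells an operand block occupies, assuming each
    multiplication needs its own cell's single multiplier (other units ignored)."""
    return max(1, num_multiplications)
```

Here `config_bits(3)` gives 981, and an operand block with two multiplications needs at least `min_cells_for_block(2) == 2` cells, that is, at least 654 configuration bits.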

Run-Time Detection and Analysis
Detection is performed at run time. The next time a detected sequence of instructions appears, it can be executed in the array, which avoids losing execution cycles.
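The detect-once, reuse-later flow can be modeled as a lookup in a configuration cache. This is a simplified sketch under stated assumptions: every detected sequence is treated as mappable to the array, the cache never evicts entries, and the stored configuration is a placeholder string rather than real reconfiguration bits.

```python
from typing import Dict, List, Tuple

BytecodeSeq = Tuple[int, ...]  # a sequence of bytecodes


def run(trace: List[BytecodeSeq]) -> List[str]:
    """Replay a trace of instruction sequences and report where each one
    executes: 'processor' on first sighting, 'array' once it is cached."""
    config_cache: Dict[BytecodeSeq, str] = {}
    where: List[str] = []
    for seq in trace:
        if seq in config_cache:
            where.append("array")  # configuration already cached: reuse it
        else:
            # First sighting: execute normally while the sequence is analyzed
            # and a (placeholder) configuration is stored for next time.
            where.append("processor")
            config_cache[seq] = f"config-{len(config_cache)}"
    return where
```

For a trace `[a, b, a, a]` with two distinct sequences, `run` reports `["processor", "processor", "array", "array"]`: only the first occurrence of each sequence pays the normal execution cost.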

Results
Data on energy consumption, memory usage, and performance were obtained with a configurable, compiled-code, cycle-accurate simulator. The processor coupled with the reconfigurable array is compared against VLIW versions with the same instruction set.

Conclusion
This paper showed the implementation of a Java-compliant architecture that works with a coarse-grained array in a native Java processor. The architecture boosts performance and reduces energy consumption, and the search for candidate instruction sequences is done at run time. Furthermore, we demonstrated that, unlike VLIW and low-power architectures, it does not need a huge amount of available parallelism in the application to achieve good results.

Future Work
More optimization algorithms aimed at reconfigurable arrays can be evaluated. Furthermore, another Java processor could be used to analyze the instructions instead of dedicated hardware.

Thank you.