Paper Review Presentation Paper Title: Hardware Assisted Two Dimensional Ultra Fast Placement Presented by: Mahdi Elghazali Course: Reconfigurable Computing.


Paper Review Presentation Paper Title: Hardware Assisted Two Dimensional Ultra Fast Placement Presented by: Mahdi Elghazali Course: Reconfigurable Computing Systems April 5, 2007 Winter 2007

Resource: Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS’04). Authors: Manish Handa and Ranga Vemuri, Department of ECECS, University of Cincinnati.

OUTLINE: Introduction. Model of a Partially Reconfigurable Dynamic System. Serial Architecture. Parallel Architecture. Serial-Parallel Architecture. Results. Conclusion. Paper Evaluation.

Introduction Placement is one of the most time-consuming steps in a computer-aided design (CAD) environment. In ROS, more than one application may execute on the same FPGA. Each application is divided into tasks that are placed on the FPGA using partial reconfiguration. In a dynamically reconfigurable system, the sequence of tasks is known only at run time, so a placement engine is required to place each subsequent task while the application is running. Such a placement paradigm is called online placement.

Model of a Partially Reconfigurable Dynamic System

The host: Performs online placement. Controls the execution of the tasks on the FPGA. Maintains the clock and other global signals. FPGA: Application area. Operating system (OS) area.

Serial Architecture 1. FPGA modeling: The FPGA surface is modeled as a two-dimensional array called the area matrix. Each cell in the array represents a CLB in the FPGA. Every occupied cell has a weight of 0. Figure 2 shows an example of the area matrix of an FPGA with three tasks placed on it.

FPGA modeling

Serial Architecture 2. Design Details:

Design Details The area matrix is stored in a memory on a row-by-row basis. The height and the width of the task are stored in the macro height and macro width registers. An up-counter is used to address the memory.
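The serial scan described on this slide can be imitated in software. The following is only a rough sketch, not the circuit from the paper: it assumes that any non-zero weight marks an empty cell (the slides only state that occupied cells have weight 0), and the function name, the flat `memory` list, and the example contents are all made up for illustration. The loop variable `address` plays the role of the up-counter.

```python
from typing import List, Optional, Tuple


def serial_find_placement(
    memory: List[int], rows: int, cols: int, macro_height: int, macro_width: int
) -> Optional[Tuple[int, int]]:
    """Scan a row-major area matrix for the first free rectangle.

    Assumption: 0 = occupied cell, any non-zero value = empty cell.
    Returns the (row, col) of the top-left corner, or None if nothing fits.
    """
    for address in range(rows * cols):  # software stand-in for the up-counter
        row, col = divmod(address, cols)
        # Skip corners where the task would extend past the FPGA edge.
        if row + macro_height > rows or col + macro_width > cols:
            continue
        fits = all(
            memory[(row + dr) * cols + (col + dc)] != 0
            for dr in range(macro_height)
            for dc in range(macro_width)
        )
        if fits:
            return row, col
    return None


# Hypothetical 4 x 6 area matrix stored row by row.
example_memory = [
    0, 0, 0, 1, 1, 1,
    0, 0, 0, 1, 1, 1,
    1, 1, 1, 1, 0, 0,
    1, 1, 1, 1, 0, 0,
]
first_fit = serial_find_placement(example_memory, 4, 6, macro_height=2, macro_width=3)
```

Because every candidate corner is visited one address at a time, the run time of such a scan grows with the number of cells in the area matrix, which is the cost the parallel architecture tries to avoid.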

Run-time Performance and Overhead Worst-case run time = [expression missing]. Total memory requirement = [expression missing]. The main overhead of this architecture is the time taken by the host to write the area matrix into the shared memory.

Parallel Architecture 1. FPGA modeling: A weight of 0 for occupied cells and a weight of 1 for empty cells.
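As an illustration of this weighting, the sketch below builds an area matrix with 1 for empty cells and 0 for occupied cells. The task tuple layout `(row, col, height, width)` and the function name are assumptions made for the example; they do not come from the paper.

```python
from typing import List, Tuple

# Hypothetical task description: top-left corner plus size, in CLBs.
Task = Tuple[int, int, int, int]  # (row, col, height, width)


def build_area_matrix(rows: int, cols: int, tasks: List[Task]) -> List[List[int]]:
    """Return a rows x cols matrix: 1 = empty cell, 0 = occupied cell."""
    matrix = [[1] * cols for _ in range(rows)]
    for row, col, height, width in tasks:
        for r in range(row, row + height):
            for c in range(col, col + width):
                matrix[r][c] = 0  # mark every CLB covered by the task
    return matrix


# Two made-up tasks on a 4 x 6 FPGA.
area = build_area_matrix(4, 6, [(0, 0, 2, 3), (2, 4, 2, 2)])
```

With this encoding, the sum of the weights inside any rectangle equals the number of empty cells in it, so a rectangle is free exactly when its sum equals its area.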

Design Details Consists of three main components: Area matrix. Reconfigurable adder. Parallel comparator.

Design Details 1. Area matrix: One data word of the area matrix memory holds one full column of the area matrix. The area matrix memory can easily be implemented using look-up tables (LUTs) in the FPGA. The area matrix memory has a height of 64 CLBs and a width of 2 CLBs.

Design Details 2. Reconfigurable adder: The height of the reconfigurable adder is equal to the height of the FPGA, Rf, and its width is equal to the width of the task, Wt. It consumes Rf * (Wf+1) FFs and Rf * (Wf-1) LUTs.
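To get a feel for these resource counts, the expressions can be evaluated directly. The sketch below plugs in the symbols exactly as written on the slide. It assumes that Wf denotes the FPGA width (the slide does not define it) and borrows Rf = 64 and Wf = 96 from the FPGA size quoted in the Results slide; the function name is made up.

```python
def adder_resources(r_f: int, w_f: int) -> dict:
    """Evaluate the slide's expressions: Rf*(Wf+1) flip-flops, Rf*(Wf-1) LUTs."""
    return {
        "flip_flops": r_f * (w_f + 1),
        "luts": r_f * (w_f - 1),
    }


# Assumed values: 64 CLB high, 96 CLB wide FPGA (taken from the Results slide).
usage = adder_resources(r_f=64, w_f=96)
print(usage)  # {'flip_flops': 6208, 'luts': 6080}
```

If the width in these expressions was instead meant to be the task width Wt, the same function applies with that value substituted.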

Design Details

3. Parallel comparator: Consists of two stages.
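The slides do not spell out what the adder and the two comparator stages compute, so the following is only one plausible software reading, not the authors' circuit. It assumes the adder sums Wt adjacent cells in every row at once, the first comparator stage flags rows whose sum equals Wt (all cells empty), and the second stage looks for Ht consecutive flagged rows. All names and the example matrix are hypothetical.

```python
from typing import List, Optional, Tuple


def parallel_find_placement(
    area: List[List[int]], task_height: int, task_width: int
) -> Optional[Tuple[int, int]]:
    """Find a free task_height x task_width rectangle (1 = empty, 0 = occupied)."""
    rows, cols = len(area), len(area[0])
    for col in range(cols - task_width + 1):
        # "Adder": one sum per row over a window of task_width columns.
        row_sums = [sum(area[r][col:col + task_width]) for r in range(rows)]
        # Assumed stage 1: a row is usable if every cell in the window is empty.
        row_free = [s == task_width for s in row_sums]
        # Assumed stage 2: look for task_height consecutive usable rows.
        run = 0
        for r, free in enumerate(row_free):
            run = run + 1 if free else 0
            if run == task_height:
                return r - task_height + 1, col
    return None


# Hypothetical 4 x 6 area matrix.
example_area = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
spot = parallel_find_placement(example_area, task_height=2, task_width=3)
```

In hardware all rows of one column window would be evaluated in the same step, so the outer loop over columns is the only sequential part of this sketch.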

Run-time Performance and Overhead Worst-case run time = [expression missing]. Total memory requirement = [expression missing]. Partial reconfiguration is the main overhead of the parallel architecture.

Serial-Parallel Architecture 1. Design details: Accumulator

Run-time Performance and Overhead Worst-case run time = [expression missing]. Total memory requirement = [expression missing]. The execution time of the serial-parallel architecture is higher than that of the other two architectures.

Results The circuit was tested on a Xilinx Virtex XCV1000 FPGA. The placement engine was implemented for a 64 CLB high and 96 CLB wide FPGA. 25 x 25 tasks were used.

Conclusion Three hardware architectures for chip-based two dimensional online placement were presented. Table 2 shows a comparison between the three architectures.

Paper Evaluation Pros: Some examples were used to explain the functionality of some parts of the circuit. Good descriptions of the first two architectures were given. Cons: They did not give enough information about the last architecture. Some information was not clear.

Q & A Thank you