HTR: On-Chip Hardware Task Relocation for Partially Reconfigurable FPGAs + Also Affiliated with NSF Center for High- Performance Reconfigurable Computing.

Slides:



Advertisements
Similar presentations
FPGA (Field Programmable Gate Array)
Advertisements

Computer Architecture (EEL4713, Fall 2013) Partial Reconfiguration Not just a half baked job of reconfiguring Rohit Kumar Research Student University of.
Run-Time FPGA Partial Reconfiguration for Image Processing Applications Shaon Yousuf Ph.D. Student NSF CHREC Center, University of Florida Dr. Ann Gordon-Ross.
A self-reconfiguring platform Brandon Blodget,Philip James- Roxby, Eric Keller, Scott McMillan, Prasanna Sundararajan.
Implementation Approaches with FPGAs Compile-time reconfiguration (CTR) CTR is a static implementation strategy where each application consists of one.
1 SECURE-PARTIAL RECONFIGURATION OF FPGAs MSc.Fisnik KRAJA Computer Engineering Department, Faculty Of Information Technology, Polytechnic University of.
ENG6530 Reconfigurable Computing Systems Dynamic Run Time Reconfiguration Operating System Support & Embedded Systems.
1 A Self-Tuning Configurable Cache Ann Gordon-Ross and Frank Vahid* Department of Computer Science and Engineering University of California, Riverside.
QUIZ What does ICAP stand for ? What is its main use ? Why is Partition Pin preferred over Bus Macro? 1.
Committee Members: Annie S. Wu, Jooheung Lee, and Ronald F. DeMara Committee Members: Annie S. Wu, Jooheung Lee, and Ronald F. DeMara Optimizing Dynamic.
Hardwired networks on chip for FPGAs and their applications
Customizing Virtual Networks with Partial FPGA Reconfiguration
Figure 2.8 Compiler phases Compiling. Figure 2.9 Object module Linking.
Configurable System-on-Chip: Xilinx EDK
FPGA Defect Tolerance: Impact of Granularity Anthony YuGuy Lemieux December 14, 2005.
MicroC/OS-II Embedded Systems Design and Implementation.
Virtual Architecture For Partially Reconfigurable Embedded Systems (VAPRES) Architecture for creating partially reconfigurable embedded systems Module.
Using FPGAs with Embedded Processors for Complete Hardware and Software Systems Jonah Weber May 2, 2006.
System Architecture A Reconfigurable and Programmable Gigabit Network Interface Card Jeff Shafer, Hyong-Youb Kim, Paul Willmann, Dr. Scott Rixner Rice.
Bitstream Relocation with Local Clock Domains for Partially Reconfigurable FPGAs Adam Flynn, Ann Gordon-Ross, Alan D. George NSF Center for High-Performance.
1 A survey on Reconfigurable Computing for Signal Processing Applications Anne Pratoomtong Spring2002.
Juanjo Noguera Xilinx Research Labs Dublin, Ireland Ahmed Al-Wattar Irwin O. Irwin O. Kennedy Alcatel-Lucent Dublin, Ireland.
Exploring the Tradeoffs of Configurability and Heterogeneity in Multicore Embedded Systems + Also Affiliated with NSF Center for High- Performance Reconfigurable.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical.
Benefits of Partial Reconfiguration Reducing the size of the FPGA device required to implement a given function, with consequent reductions in cost and.
Partially Reconfigurable System-on-Chips for Adaptive Fault Tolerance Shaon Yousuf Adam Jacobs Ph.D. Students NSF CHREC Center, University of Florida Dr.
CPACT – The Conditional Parameter Adjustment Cache Tuner for Dual-Core Architectures + Also Affiliated with NSF Center for High- Performance Reconfigurable.
Operating Systems for Reconfigurable Systems John Huisman ID:
Heng Tan Ronald Demara A Device-Controlled Dynamic Configuration Framework Supporting Heterogeneous Resource Management.
1 of 20 Phase-based Cache Reconfiguration for a Highly-Configurable Two-Level Cache Hierarchy This work was supported by the U.S. National Science Foundation.
Heterogeneous Multikernel OS Yauhen Klimiankou BSUIR
Hardware Support for Trustworthy Systems Ted Huffmire ACACES 2012 Fiuggi, Italy.
Embedded Runtime Reconfigurable Nodes for wireless sensor networks applications Chris Morales Kaz Onishi 1.
© 2004 Mercury Computer Systems, Inc. FPGAs & Software Components Graham Bardouleau & Jim Kulp Mercury Computer Systems, Inc. High Performance Embedded.
Design Framework for Partial Run-Time FPGA Reconfiguration Chris Conger, Ann Gordon-Ross, and Alan D. George Presented by: Abelardo Jara-Berrocal HCS Research.
Design Space Exploration for Application Specific FPGAs in System-on-a-Chip Designs Mark Hammerquist, Roman Lysecky Department of Electrical and Computer.
A Single-Pass Cache Simulation Methodology for Two-level Unified Caches + Also affiliated with NSF Center for High-Performance Reconfigurable Computing.
Partial Region and Bitstream Cost Models for Hardware Multitasking on Partially Reconfigurable FPGAs + Also Affiliated with NSF Center for High- Performance.
DIPARTIMENTO DI ELETTRONICA E INFORMAZIONE Novel, Emerging Computing System Technologies Smart Technologies for Effective Reconfiguration: The FASTER approach.
Task Graph Scheduling for RTR Paper Review By Gregor Scott.
Moats and Drawbridges: An Isolation Primitive for Reconfigurable Hardware Based Systems Ted Huffmire, Brett Brotherton, Gang Wang, Timothy Sherwood, Ryan.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
1 - CPRE 583 (Reconfigurable Computing): Reconfiguration Management Iowa State University (Ames) CPRE 583 Reconfigurable Computing Lecture 5: Wed 10/14/2009.
A Physical Resource Management Approach to Minimizing FPGA Partial Reconfiguration Overhead Heng Tan and Ronald F. DeMara University of Central Florida.
Reconfigurable Embedded Processor Peripherals Xilinx Aerospace and Defense Applications Brendan Bridgford Brandon Blodget.
FPGA Partial Reconfiguration Presented by: Abelardo Jara-Berrocal HCS Research Laboratory College of Engineering University of Florida April 10 th, 2009.
M. ALSAFRJALANI D. DZENITIS Runtime PR for Software Radio 2/26/2010 UFL ECE Dept 1 PARTIAL RECONFIGURATION (PR)
Analysis of Cache Tuner Architectural Layouts for Multicore Embedded Systems + Also Affiliated with NSF Center for High- Performance Reconfigurable Computing.
Concurrency, Processes, and System calls Benefits and issues of concurrency The basic concept of process System calls.
VAPRES A Virtual Architecture for Partially Reconfigurable Embedded Systems Presented by Joseph Antoon Abelardo Jara-Berrocal, Ann Gordon-Ross NSF Center.
Minimum Effort Design Space Subsetting for Configurable Caches + Also Affiliated with NSF Center for High- Performance Reconfigurable Computing This work.
Chapter 2 Process Management. 2 Objectives After finish this chapter, you will understand: the concept of a process. the process life cycle. process states.
© imec 2003 Designing an Operating System for a Heterogeneous Reconfigurable SoC Vincent Nollet, P. Coene, D. Verkest, S. Vernalde, R. Lauwereins IMEC,
1 Advanced Digital Design Reconfigurable Logic by A. Steininger and M. Delvai Vienna University of Technology.
Physically Aware HW/SW Partitioning for Reconfigurable Architectures with Partial Dynamic Reconfiguration Sudarshan Banarjee, Elaheh Bozorgzadeh, Nikil.
Runtime Reconfigurable Network-on- chips for FPGA-based systems Mugdha Puranik Department of Electrical and Computer Engineering
Runtime Temporal Partitioning Assembly to Reduce FPGA Reconfiguration Time Abelardo Jara-Berrocal, Ann Gordon-Ross HCS Research Laboratory College of Engineering.
An Automated Hardware/Software Co-Design
A Methodology for System-on-a-Programmable-Chip Resources Utilization
Ming Liu, Wolfgang Kuehn, Zhonghai Lu, Axel Jantsch
Reconfigurable Hardware Scheduler for RTS
Abelardo Jara-Berrocal Joseph Antoon Ph.D. Students
Aurelio Morales-Villanueva and Ann Gordon-Ross+
Tosiron Adegbija and Ann Gordon-Ross+
Shaon Yousuf Ph.D. Student NSF CHREC Center, University of Florida
A Self-Tuning Configurable Cache
University of Florida, Gainesville, Florida, USA
Dynamic Partial Reconfiguration of FPGA
Presentation transcript:

HTR: On-Chip Hardware Task Relocation for Partially Reconfigurable FPGAs + Also Affiliated with NSF Center for High- Performance Reconfigurable Computing Aurelio Morales-Villanueva and Ann Gordon-Ross + Department of Electrical and Computer Engineering University of Florida, Gainesville, Florida, USA This work was supported by National Science Foundation (NSF) grants EEC and IIP , and Programa de Ciencia y Tecnología (FINCyT) under contract FINCyT-BDE

2 of 18 Introduction Multitasking in modern systems –Software tasks: implemented in microprocessors –Hardware tasks: implemented in field-programmable gate arrays (FPGAs) Task management requirements in any multitasking system –Task scheduling Mechanism to begin task execution on a system resource –Task suspension Preempt task and saving execution state --- context save (CS) –Task switching Replacing preempted task with another task –Task restoration Resume preempted task from saved state --- context restore (CR) T7T7 T’ 4 T3T3 T’ 2 T1T1 time ABAB Task T’ 6 T5T5 T’ 4 T3T3 T’ 2 T1T1 time ABCABC Task T6T6 T5T5 T4T4 T3T3 T2T2 T1T1 time ABCABC

3 of 18 Context Switching Analogy Multiple tasks execute concurrently Multiple regions time multiplex tasks –Use context save and context restore for task switching Task relocation –Challenging! Leveraging partial reconfiguration can be advantageous… One task executes at a time –Time multiplex tasks One stack for all tasks –Stack saves context for task restore No task relocation No reconfiguration of uP Hardware task N Hardware task 2 Hardware task 1 uP or custom HW Region 3 Region 2 Region 1 FPGA FPGA Context Switch Software task N Software task 2 Software task 1 Stack OS + MMU uP Processor Context Switch

4 of 18 Reconfiguration on FPGAs Benefits to system designers and functionality –Run-time hardware adaptation via time multiplexing of resources –Reduced area/power requirements –Two types of reconfiguration: Full and partial reconfiguration Full reconfiguration: –Entire FPGA configured with full bitstream with fixed hardware task set –Reconfiguration halts all tasks --- lengthy task switching time –No context save/restore of hardware tasks --- tasks restart execution Execution and state of all tasks is lost during full reconfiguration! Configuration Port Full bitstream 2 Full bitstream 1 HW task C1 HW task B1 HW task A1 HW task C2 HW task B2 HW task A2 Unless state can be saved via checkpoint of data capture

5 of 18 –Increased flexibility –Increased task throughput/performance –Enhances hardware multitasking Requires context save and context restore for effective task switching PR compared to full reconfiguration –Dynamic, on-the-fly PR of individual PRRs No interruption of static region or other PRRs! –Uses partial bitstreams --- smaller than full bitstreams Faster reconfiguration time *May* require a bitstream for each PRM-to-PRR mapping PR divides the FPGA fabric into two regions –Static region: fixed functionality, never reconfigured after initial configuration at startup –Reconfigurable region: multiple PR regions (PRRs) PRRs execute PR modules (PRMs) (hardware tasks) Partial Reconfiguration (PR) Function Power On Time Static region operation Re configuration Overhead Configuration Overhead Module D Module C Module B Module A Embedded processor ICAP Mem Controller Reconfig. region Static region

6 of 18 Previous Related Work Context save and restore (CSR)  Over the same PRR using: Off-chip CSR software –Requires external host (e.g., CPU) –Lengthy communication overhead On-chip CSR hardware –Device overhead, limited portability +No task re-execution after context restore Bitstream relocation (BR) +Eliminates per-PRR bitstream requirement  Off/on-chip BR over homogeneous PRRs Same shape/resources, different FPGA fabric locations -Does not save/restore task’s context Re-execution of relocated task required –No relocation over heterogeneous PRRs PRMs custom relocate + reconfigure PRMs homogeneous PRRs custom HW context restore task’s context context save reconfigure

7 of 18 Motivations Improvements over off-chip context save and restore (CSR) software –Eliminate external host requirement --- autonomous –Eliminate communication overhead between FPGA and external host Improvements over on-chip context save and restore (CSR) hardware –Eliminate custom hardware requirement and area overhead –Eliminate portability limitations Improvements over bitstream relocation (BR) –Enable task relocation over heterogeneous PRRs –Eliminate task re-execution requirement

8 of 18 Contributions On-chip hardware task relocation (HTR) software –Relocation of hardware tasks over heterogeneous PRRs PRRs with different shape, resources, locations on the FPGA fabric –Relocation of hardware task context Task’s execution state is preserved --- no re-execution HTR benefits –Maximizes shared resources utilization Reduces idle PRRs --- PRMs can run in any available PRR* –Maximizes task throughput Improves task scheduling performance Preemption and resumption of tasks in different PRRs –Priority considerations Eliminates lengthy re-execution time of relocated preempted tasks * Some restrictions apply, but are addressed in our current/future work

9 of 18 Hardware Task Relocation (HTR)

10 of 18 HTR Operation Still Busy Context Save Context Save Context Restore M2 PRR4 M3 PRR3 M6 PRR4 M5 PRR3 M4 PRR2 M5 PRR2 M1 PRR1 Task Relocation M2 PRR1 Still Busy Resume Operation Use initial partial bitstream Resume Operation Use initial partial bitstream Task Relocation Context Restore Main HTR software steps –Initialize static region and PRRs Initial configuration of tasks in all PRRs –Context save (CS) Save the execution state of the preempted hardware task –Task relocation (TR) Generation of a “merged bitstream” of relocated task –Context restore (CR) Resume execution of preempted task in another PRR

11 of 18 HTR Flow Chart Example system Execute PRM2 on PRR2 Context restore of PRM2 on PRR2 T CR2 Saved context relocation of PRM2 in PRR2 T relocate Context save of PRM3 T CS2 Hardware task relocation (HTR) of PRM2 on PRR2 Execute PRM2 on PRR1 Context restore of PRM2 on PRR1 Context restore of PRM2 on PRR1 T CR1 Merge saved context of PRM2 with initial bitstream of PRM2 T merge Context save of PRM1 T CS1 Context save and restore (CSR) NY PRR1 free or  priority PRM Execute PRM2 N Y Reconfiguration of PRR1 with PRM1 Enable FPGA protection T protect_FPGA T reconfig_PRR Initialization PRR2 PRR1 PRM 3 Merged PRM 2 PRM 2 PRM 2 PRM 1 PRM2 executed in PRR1 and saved the context Executing PMR3 Initial conditions

Experimental Results

13 of 18 Experimental Setup Virtex-5 LX110T on Xilinx ML509 board Two reconfigurable regions (PRRs) –Contain configurable logic block (CLB) resources (flip-flops) –Heterogeneous sizes and FPGA fabric locations Task relocation cases –Small-to-large PRR task relocation –Large-to-small PRR task relocation –Source PRR = PRR with task’s saved context –Destination PRR = PRR with task’s relocated context –Restrict test cases for tractability Large PRR = 2x number of columns and frames of small PRR Number of flip-flops used by the task are the same for all PRR sizes

14 of 18 Context save time shows linear growth rate Dependent on the number of PRM frames in the source PRR Results – Context Save (T CS ) (per column)

15 of 18 Task relocation time shows non-linear growth rate –Random access pattern to saved context and merged bitstreams –High data cache misses and overheads accessing data from SDRAM Dependent on the number of PRM flip-flops to be relocated Results – Task Relocation (T relocate ) (per column)

16 of 18 Context restore time shows nearly linear growth rate Dependent on the number PRR frames in the merged bitstream (destination PRR) Context restore time is larger than PRR reconfiguration time (T reconfig_PRR ) –Unprotection and protection of destination PRR Results – Context Restore (T CR ) (per column)

17 of 18 Conclusions Introduced on-chip hardware task relocation (HTR) software –Task relocation between heterogeneous PRRs Maximizes shared resource utilization –No off-chip communication overhead –No design/system constraints –Application/system independent –Aids system design Detailed descriptions entended for designer use Analysis enables designers to tradeoff HTR execution times and task/PRR granularity Future work –Extend HTR’s functionalities Include DSP, BRAM, and IOB resources in PRRs

18 of 18 Questions?