Design Flows for PR
1. Module-based PR
   - Implement each Reconfigurable Module as an individual project.
   - Constrain each PR module to be placed in a given partition.
   - The full bitstream is loaded initially; the partial bitstream of a complete PR module is loaded on demand.
   - Supported by PlanAhead. Covered in detail in the rest of this deck.
2. Difference-based PR
   - Implement each Reconfigurable Module as an individual project.
   - Constrain each PR module to be placed in a given partition.
   - Compute the difference between the bitstreams of the Reconfigurable Modules to obtain the differential partial bitstream.
   - The full bitstream is loaded initially; the differential partial bitstream of a PR module is loaded on demand.
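A minimal sketch of the difference step in difference-based PR, assuming both bitstreams have already been reduced to raw configuration frames of equal total length. The frame size of 404 bytes (101 32-bit words, as on 7-series devices) and the names `diff_frames` and `apply_diff` are assumptions for illustration; real bitstreams also carry headers, commands and CRC words that this sketch ignores.

```python
FRAME_BYTES = 404  # assumption: 101 words x 4 bytes per configuration frame


def diff_frames(base: bytes, target: bytes, frame_bytes: int = FRAME_BYTES) -> dict:
    """Return {frame_index: frame_data} for every frame of target that differs from base."""
    if len(base) != len(target):
        raise ValueError("bitstreams must have the same length")
    if len(base) % frame_bytes != 0:
        raise ValueError("length must be a whole number of frames")
    changed = {}
    for offset in range(0, len(base), frame_bytes):
        new_frame = target[offset:offset + frame_bytes]
        if new_frame != base[offset:offset + frame_bytes]:
            changed[offset // frame_bytes] = new_frame
    return changed


def apply_diff(base: bytes, changed: dict, frame_bytes: int = FRAME_BYTES) -> bytes:
    """Rebuild the target configuration by overwriting only the changed frames."""
    out = bytearray(base)
    for index, frame in changed.items():
        out[index * frame_bytes:(index + 1) * frame_bytes] = frame
    return bytes(out)
```

Only the frames returned by `diff_frames` would need to be written on demand, which is why a differential partial bitstream can be smaller than a module-based one when the two modules are similar.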
PR Specific Design Flow Comparison
[Figure: design flow without PR compared with design flow with PR]
Example PR Design
System-level block diagram (implemented on a Zynq-7000 7Z020 SoC).
[Figure: original image, Sobel-processed image and Sepia-processed image, with the static part and the reconfigurable part of the design marked]
The reconfigurable “Filter Engine” is replaced with the Sobel or Sepia filter at run time through partial reconfiguration.
Ref: Application Note: Zynq-7000 All Programmable SoC
Design Flow
Non-PR-specific design flow:
   - Vivado HLS: converts high-level code to RTL code.
   - Xilinx synthesis tool (XST): converts RTL code to a netlist.
PR-specific design flow:
   - PlanAhead tool, used to:
     1. Define the reconfigurable partitions.
     2. Floorplan the design.
     3. Add reconfigurable modules.
     4. Run the implementation tools to generate the full and partial bitstreams.
Ref: Application Note: Zynq-7000 All Programmable SoC
Setting Partition: Partitioning Style
A partition defines the smallest atomic area to which a module can be assigned.
Different partitioning styles are possible: island style, slot based and grid based. Not all of them are supported by commercial vendors.
[Figure: island, slot-based and grid-based partitioning]
Setting Partition: Placement Flexibility
Partitioning style affects placement and flexibility.
   - Island style: suffers from fragmentation. Offered by the current vendors, Xilinx and Altera.
   - Slot style: also suffers from fragmentation, but to a lesser extent. Some academic tools, such as ReCoBus, have explored this style.
   - Grid style: reduced fragmentation, but difficult to support.
To enhance flexibility, the PR module must be placed and routed for every region in which it may need to be configured, which puts additional pressure on bitstream size.
[Figure: island, slot-based and grid-based partitioning]
Setting Partition
In the netlist view of the synthesized design, select FILTER ENGINE to set the partition. Select Reconfigurable Partition as the partition type.
Ref: Application Note: Zynq-7000 All Programmable SoC
Adding Default Netlist
Select the Sobel filter netlist for the Reconfigurable Partition.
Ref: Application Note: Zynq-7000 All Programmable SoC
Adding Reconfigurable Modules
Add the Sepia filter netlist for the Reconfigurable Partition.
Ref: Application Note: Zynq-7000 All Programmable SoC
Floor Plan the PR Region
The PlanAhead tool requires the user to select the PR region manually, taking into account the resources required by the most complex reconfigurable module.
Ref: Application Note: Zynq-7000 All Programmable SoC
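A minimal sketch of the sizing rule above: the PR region has to cover, for each resource type, the largest demand of any module that will be loaded into it. The function names and the utilisation numbers are hypothetical; real figures would come from the synthesis reports of each module.

```python
def region_requirement(modules: dict) -> dict:
    """Per-resource maximum over all reconfigurable modules."""
    required = {}
    for usage in modules.values():
        for resource, count in usage.items():
            required[resource] = max(required.get(resource, 0), count)
    return required


def region_fits(region: dict, modules: dict) -> bool:
    """True if the candidate region offers enough of every resource type."""
    required = region_requirement(modules)
    return all(region.get(res, 0) >= need for res, need in required.items())


# Hypothetical utilisation numbers, for illustration only.
modules = {
    "sobel": {"LUT": 1200, "FF": 900, "BRAM": 4, "DSP": 0},
    "sepia": {"LUT": 800, "FF": 1100, "BRAM": 2, "DSP": 6},
}
candidate_region = {"LUT": 1600, "FF": 3200, "BRAM": 10, "DSP": 8}
print(region_requirement(modules), region_fits(candidate_region, modules))
```

Taking the maximum per resource type rather than picking one "largest" module matters when no single module dominates in every resource, as in the hypothetical numbers here.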
Floor Plan the PR Region: Resource Consideration
The column-wise layout of the different logic primitives must be considered when placing a module.
Depending on the type of logic primitives used by the module (SLICEX, SLICEM, etc.), relocation may or may not be possible.
Floor Plan Design Recommendations: Global Clocks
When possible, add frames to an RP range within the same clock region rather than adding an additional clock region, to avoid clock starvation.
Floor Plan Design Recommendations: Fan-Outs
For a better design, manually optimize the fan-out of the partition before the automatic placement and routing performed in the implementation stage.
Implementation
The final placed and routed designs for the Sobel and Sepia filters.
[Figure: placed and routed Sobel filter design; placed and routed Sepia filter design]
Ref: Application Note: Zynq-7000 All Programmable SoC
Generating Bitstreams
This step generates the full and partial bitstreams for the Sepia and Sobel filters.
The full Sobel bitstream can be used as the initial bitstream.
The partial Sepia and Sobel bitstreams can be loaded into the FPGA via PCAP on demand.
Ref: Application Note: Zynq-7000 All Programmable SoC
Bitstreams - Review: Frame Address Register and Frame Composition
Frame Address Register fields:
   - Block type: logic blocks, BRAMs, routing blocks.
   - Top/Bottom: selects the top or bottom rows of the FPGA; together with the row address, it locates the tile.
   - Row address: 0 to 9.
   - Major address: column, numbered from 0.
   - Minor address: frame number within the tile.
[Figure: frame composition]
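A minimal sketch of how the frame address fields listed above could be packed into and unpacked from a single register value. The bit positions follow the 7-series style layout (block type in bits 25:23, top/bottom in bit 22, row in 21:17, column in 16:7, minor in 6:0); that layout is an assumption here, since the field widths are not given on the slide and differ between device families. The names `FAR_FIELDS`, `pack_far` and `unpack_far` are hypothetical.

```python
# Assumed field layout: (name, least significant bit, width in bits).
FAR_FIELDS = (
    ("block_type", 23, 3),
    ("top_bottom", 22, 1),
    ("row", 17, 5),
    ("column", 7, 10),  # major address
    ("minor", 0, 7),    # frame within the column
)


def pack_far(block_type: int, top_bottom: int, row: int, column: int, minor: int) -> int:
    """Combine the individual fields into one frame address value."""
    values = {
        "block_type": block_type,
        "top_bottom": top_bottom,
        "row": row,
        "column": column,
        "minor": minor,
    }
    far = 0
    for name, lsb, width in FAR_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        far |= value << lsb
    return far


def unpack_far(far: int) -> dict:
    """Split a frame address value back into its fields."""
    return {name: (far >> lsb) & ((1 << width) - 1) for name, lsb, width in FAR_FIELDS}


far = pack_far(block_type=0, top_bottom=1, row=2, column=5, minor=3)
print(hex(far), unpack_far(far))
```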
Device Configuration Flow
1. The boot loader is loaded into on-chip RAM.
2. The full Sobel bitstream is loaded into the FPGA via the Processor Configuration Access Port (PCAP).
3. The user application loads the partial bitstreams into DDR memory at start-up.
4. From this point on, the application can use the partial bitstreams at any time to modify the predefined PL regions while the rest of the FPGA remains fully active and uninterrupted.
Ref: Application Note: Zynq-7000 All Programmable SoC
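The ordering of these steps can be sketched as a toy software model. This is not the application note's code: the class `PrController`, its methods, and the `write_pcap` callable standing in for the PCAP transfer are all hypothetical, and the dictionary merely stands in for partial bitstreams held in DDR memory.

```python
class PrController:
    """Toy model: full bitstream first, partials cached, then swapped on demand."""

    def __init__(self, write_pcap):
        self._write_pcap = write_pcap  # callable(bytes), stand-in for a PCAP transfer
        self._partials = {}            # stand-in for partial bitstreams kept in DDR
        self._configured = False
        self.active = None

    def load_full(self, bitstream: bytes, default_module: str) -> None:
        """Step 2: configure the whole device with the initial design."""
        self._write_pcap(bitstream)
        self._configured = True
        self.active = default_module

    def preload(self, name: str, bitstream: bytes) -> None:
        """Step 3: keep a partial bitstream ready in memory."""
        self._partials[name] = bitstream

    def reconfigure(self, name: str) -> bool:
        """Step 4: swap the reconfigurable module; returns False if already active."""
        if not self._configured:
            raise RuntimeError("load the full bitstream before any partial one")
        if name == self.active:
            return False
        self._write_pcap(self._partials[name])
        self.active = name
        return True


transfers = []
ctrl = PrController(transfers.append)
ctrl.load_full(b"FULL-SOBEL", default_module="sobel")
ctrl.preload("sobel", b"PARTIAL-SOBEL")
ctrl.preload("sepia", b"PARTIAL-SEPIA")
ctrl.reconfigure("sepia")
```

Keeping the partial bitstreams in memory ahead of time means a swap involves only the transfer itself, not a fetch from slower storage.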
Power Consideration
PR itself requires power. Power during PR is spent in:
1. Configuration data access
2. Actual configuration of FPGA resources
Bonamy, R., et al. "Power Consumption Models for the Use of Dynamic and Partial Reconfiguration." Microprocessors and Microsystems (2014).
Case Study: Fault Tolerance (Self-Healing Architecture)
Fault-tolerant processor.
The IF, MAC and ALU are the PRMs. Different configurations are available for each module.
The focus is on the self-healing feature rather than on performance.
Psarakis, Mihalis, and Andreas Apostolakis. "Fault tolerant FPGA processor based on runtime reconfigurable modules."
Case Study: Reconfigurable Crypto Processor
The processor can choose from different crypto algorithms.
Major area savings, and some power savings too.
Hori, Yohei, Toshihiro Katashita, and Kazukuni Kobara. "Energy and area saving effect of Dynamic Partial Reconfiguration on a 28-nm process FPGA."
Case Study: Fast Start-Up
Fast start-up is a two-step configuration.
It is useful in time-critical systems that need a swift system start-up. Example: automotive safety.
Meyer, Joachim, et al. "Fast start-up for Spartan-6 FPGAs using dynamic partial reconfiguration."
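A back-of-the-envelope sketch of why a two-step configuration helps, assuming configuration time is dominated by the transfer of the bitstream over the configuration interface. The function name and all sizes, bus widths and clock rates below are hypothetical, and the model ignores flash latency and device start-up overhead.

```python
def config_time_s(size_bytes: int, bus_width_bits: int, clock_hz: float) -> float:
    """Idealised transfer time: total bits divided by bits moved per second."""
    if bus_width_bits <= 0 or clock_hz <= 0:
        raise ValueError("bus width and clock must be positive")
    return size_bytes * 8 / (bus_width_bits * clock_hz)


# Hypothetical numbers, for illustration only.
FULL_BYTES = 1_000_000      # bitstream for the complete design
CRITICAL_BYTES = 200_000    # first-step bitstream with only the time-critical part
BUS_WIDTH_BITS = 8
CLOCK_HZ = 10_000_000

full_time = config_time_s(FULL_BYTES, BUS_WIDTH_BITS, CLOCK_HZ)
critical_time = config_time_s(CRITICAL_BYTES, BUS_WIDTH_BITS, CLOCK_HZ)
print(f"single step: {full_time * 1e3:.1f} ms, time-critical part ready after: {critical_time * 1e3:.1f} ms")
```

Under this simple model the time-critical logic is available in proportion to the size of the first-step bitstream, while the remainder of the design is configured afterwards.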
Challenges of Partial Reconfiguration
   - Complicated design flow.
   - Tool support: no support for slot or grid style; manual placement steps; manual assistance is needed when targeting different devices.
   - Security issues: although an encryption option is provided, security issues persist.
   - Decreased performance compared to full configuration: Xilinx reports a 10% degradation in clock frequency when using PR.
[Figure: Xilinx PR implementation flow, with the manual steps marked. Steps in order: HDL Design Description, HDL Synthesis, Set Design Constraints, Placement Analysis, Implement Static Design and PR Modules, Merge Final Bitstreams]
Our Project
A run-time reconfigurable motion estimation system.
   - Motion estimation (block matching) techniques are used in video stabilization.
   - Switch between two algorithms (Full Search and Diamond Search) depending on external inputs such as video quality.
   - Achieve a trade-off between speed and accuracy based on these inputs.
   - Evaluate metrics such as area savings, power savings and reconfiguration time.
PR tools are not yet mature, so implementing the motion estimation algorithms with PR will be challenging. As a backup plan, we will implement an “Algorithmic approach to partial bit stream relocation”.
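For reference, a minimal software sketch of Full Search block matching, using the sum of absolute differences (SAD) as the matching cost. The choice of SAD, the function names, and the frame representation (lists of rows of integer pixel values) are assumptions for illustration, not the project's hardware design. Diamond Search would evaluate far fewer candidate displacements than the exhaustive loop shown here, which is the speed versus accuracy trade-off mentioned above.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of cur at (bx, by)
    and the block of ref displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Try every displacement within +/- search_range; return (dx, dy, cost) of the best."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if by + dy < 0 or bx + dx < 0 or by + dy + n > height or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic 8 x 8 test frames: cur is ref shifted by 2 pixels in x and 1 in y.
ref = [[x * x + 3 * y * y + x * y for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0 for x in range(8)] for y in range(8)]
motion_vector = full_search(cur, ref, bx=2, by=2, n=2, search_range=2)
```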