Implementation of a De-blocking Filter and Optimization in PLX
  Ashwin Alapati
  Anandnayan Jayaraman
Outline
  Motivation
  Algorithm
  Transformation
  Proposed Architecture
  PLX Implementation
  Conclusion and Results
Motivation
  What is De-blocking?
  Types
    In-Loop De-blocking
    Post Processing De-blocking
  Computationally intensive!
Algorithm
  1. Input image
  2. Pick a macro-block (16 x 16)
  3. Identify blocking artifacts in the horizontal direction and apply adaptive filtering
  4. Identify blocking artifacts in the vertical direction and apply adaptive filtering
  5. Output image
Block Boundary Detection
  Determine block boundaries
  Strength determination
Adaptive Filtering
  FIR filtering with varying coefficients
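The detection and filtering steps above can be sketched for a single image row as follows. This is only an illustrative sketch, not the authors' implementation: the strength classes, the two thresholds (`flat_thr`, `edge_thr`), the tap weights, and the names `edge_strength_of` and `deblock_row` are all assumptions, since the slides do not give the actual values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical strength classes; the slides do not define the real ones. */
enum edge_strength { EDGE_NONE = 0, EDGE_WEAK = 1, EDGE_STRONG = 2 };

/* Classify the step between the two pixels on either side of a boundary.
 * Assumption: a small step is a blocking artifact on smooth content
 * (filter strongly), a moderate step gets a weak filter, and a large
 * step is treated as a real image edge and left untouched. */
static enum edge_strength edge_strength_of(uint8_t p0, uint8_t q0,
                                           int flat_thr, int edge_thr)
{
    int step = abs((int)p0 - (int)q0);
    if (step == 0 || step >= edge_thr)
        return EDGE_NONE;
    return (step < flat_thr) ? EDGE_STRONG : EDGE_WEAK;
}

/* Filter every block boundary of one row in place.
 * `block` is the boundary spacing in pixels. Each boundary pixel is
 * replaced by a 3-tap FIR output whose coefficients depend on the
 * detected strength: (3,2,3)/8 for strong, (1,6,1)/8 for weak. */
static void deblock_row(uint8_t *row, int width, int block,
                        int flat_thr, int edge_thr)
{
    assert(row != NULL && block >= 2);
    for (int x = block; x + 1 < width; x += block) {
        /* Read all four inputs first so both outputs use original values. */
        int p1 = row[x - 2], p0 = row[x - 1], q0 = row[x], q1 = row[x + 1];
        switch (edge_strength_of((uint8_t)p0, (uint8_t)q0, flat_thr, edge_thr)) {
        case EDGE_STRONG:
            row[x - 1] = (uint8_t)((3 * p1 + 2 * p0 + 3 * q0 + 4) >> 3);
            row[x]     = (uint8_t)((3 * p0 + 2 * q0 + 3 * q1 + 4) >> 3);
            break;
        case EDGE_WEAK:
            row[x - 1] = (uint8_t)((p1 + 6 * p0 + q0 + 4) >> 3);
            row[x]     = (uint8_t)((p0 + 6 * q0 + q1 + 4) >> 3);
            break;
        case EDGE_NONE:
            break; /* no artifact, or a genuine edge: leave as is */
        }
    }
}
```

Calling `deblock_row(row, width, 8, 8, 32)` on each row would cover one filtering direction under these assumed parameters; the other direction would apply the same logic down each column, stepping through memory by the row stride instead of by one pixel.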
Algorithm Transformation
  Concepts Used
    Retiming: reducing the critical path
    Unfolding: reduce the iteration bound
    PLX sub-word parallelism: exploit parallelism
    Parallel execution by loop vectorization
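To make the sub-word parallelism idea concrete, the sketch below emulates it in portable C: four 8-bit pixels are packed into one 32-bit word and processed together. This is not PLX code, and the function names `padd8` and `pavg8` are hypothetical; a sub-word parallel ISA would typically provide operations of this kind as single instructions, whereas plain C has to mask off carries between lanes by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Add four packed 8-bit lanes, each wrapping modulo 256.
 * The low 7 bits of every lane are added without the top bit so no
 * carry can cross into the next lane; the top bit is then fixed up
 * with XOR. */
static uint32_t padd8(uint32_t a, uint32_t b)
{
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    return low7 ^ ((a ^ b) & 0x80808080u);
}

/* Average four packed 8-bit lanes, rounding down.
 * Uses the identity (x + y) / 2 == (x & y) + ((x ^ y) >> 1), with a
 * mask that stops shifted bits leaking into the lane below. */
static uint32_t pavg8(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Small usage example: one call handles four pixels at once. */
static void swar_self_check(void)
{
    /* Lanes 0x01+0x01, 0xFF+0x01 (wraps), 0x7F+0x01, 0x10+0x0F. */
    assert(padd8(0x01FF7F10u, 0x0101010Fu) == 0x0200801Fu);
    /* Four pixels of value 100 averaged with four of value 104. */
    assert(pavg8(0x64646464u, 0x68686868u) == 0x66666666u);
}
```

A scalar loop over pixels would need four iterations for the work each call does here, which is the kind of gain loop vectorization aims for.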
Architecture
  [Block diagram: Address Generation + Memory, IN/OUT Mux, Horizontal Block Boundary Detection, Horizontal Filtering, Vertical Block Boundary Detection, Vertical Filtering, Post Processing]
Input to the Architecture
Results
Profiling Results
  Operation                        % of Total Execution Time
  Vertical Boundary Detection      35.285
  Horizontal Boundary Detection    24.428
  Vertical Filtering               12.324
  Horizontal Filtering              8.532
  Misc (Image I/O)                 19.431
Issues in PLX
  Getting input values
    Used C to dump the BMP values into a file
  Memory access
    Used a sequential way of addressing the data
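The two workarounds above might look roughly like the following in C. This is a sketch under assumptions, not the authors' tool: the helper names `flatten_rows` and `dump_pixels`, the one-decimal-value-per-line file layout, and the 8-bit single-channel pixel format are all hypothetical. The one external fact relied on is that BMP rows are padded to a multiple of 4 bytes, so the row stride can exceed the image width.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy a padded 2-D image into one contiguous row-major buffer so the
 * data can later be walked with a single incrementing address.
 * `stride` is the number of bytes between the starts of two rows. */
static void flatten_rows(const uint8_t *src, int width, int height,
                         size_t stride, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    assert(width >= 0 && height >= 0 && (size_t)width <= stride);
    for (int y = 0; y < height; ++y)
        memcpy(dst + (size_t)y * (size_t)width,
               src + (size_t)y * stride,
               (size_t)width);
}

/* Write each pixel value as a decimal number on its own line.
 * Returns 0 on success and -1 if a write fails. */
static int dump_pixels(const uint8_t *pix, size_t count, FILE *out)
{
    assert(pix != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (fprintf(out, "%u\n", (unsigned)pix[i]) < 0)
            return -1;
    }
    return 0;
}
```

Flattening first means the consumer of the dump never needs to know about the padding: pixel (x, y) is simply entry `y * width + x` in the file.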
Results of PLX Implementation
  PLX implementation: 1548 cycles
  C code profiling: 4843 cycles
  Approximate speedup: 3.1x (4843 / 1548)
  Around 20% faster in terms of time
Work Done
  Selected the algorithm
  Developed the architecture
  Implemented the algorithm in C
  Profiling
  Implemented the algorithm in PLX
  Performance evaluation
Future Work
  Try optimizing the PLX code
  Use PLX for filtering as well
Thank You!
Questions?