Case Study: ARM Platform-based JPEG Codec HW/SW Co-design
Outline
  Introduction to JPEG Codec
  Review ─ Software ( Concept )
  Review ─ Hardware ( Wrapper )
  Lab ─ Case Study
  Lab ─ Step by Step
  Reference
ISO/IEC 10918-1 JPEG
  JPEG: Joint Photographic Experts Group
  JPEG was voted an international standard in 1994
  The JPEG standard defines four compression methods:
    Baseline sequential DCT-based coding
    Progressive DCT-based coding
    Lossless coding (sampling and quantization are not used in the lossless scheme)
    Hierarchical coding
Compression Method: Baseline Sequential vs. Progressive DCT-based Coding
  [Figure: baseline sequential vs. progressive DCT-based coding]
Block Diagram of JPEG Encoder
  [Figure: encoder block diagram; the R, G, B input is converted to Y, Cb, Cr and encoded into a bitstream (01001011101…)]
  DPCM: Differential Pulse Code Modulation
  RLC: Run-Length Code
Color Model in Video ─ YCbCr
  Y: luminance
  Cb, Cr: chrominance
  The YCbCr color model is used in JPEG and MPEG
Color Model in Video ─ YCbCr
  CCIR-601 transform formula
  The color space transform is lossless (reversible) apart from integer rounding
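The transform formula on this slide did not survive extraction. As a minimal sketch, the following uses the CCIR-601 coefficients in the full-range form adopted by JFIF, which is an assumption about what the slide showed. The function names and the clamping helper are illustrative only.

```python
def clamp(v):
    """Limit a value to the 8-bit range 0..255."""
    return max(0, min(255, v))

def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (CCIR-601 weights, JFIF full range)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return tuple(clamp(round(v)) for v in (y, cb, cr))

def ycbcr_to_rgb(y, cb, cr):
    """Inverse transform; exact in real arithmetic, off by rounding in integers."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return tuple(clamp(round(v)) for v in (r, g, b))
```

Converting a pixel and converting it back can differ from the original by a count or so per channel, which is the only loss this stage introduces.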
Chroma Sub-sampling
  4:1:1 and 4:2:0 are the formats most commonly used in JPEG and MPEG
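A minimal sketch of 4:2:0 sub-sampling, assuming each 2x2 group of chroma samples is replaced by its rounded average. Averaging is one common choice rather than something the slides specify, and the function name and the even-dimension requirement are assumptions made for the example.

```python
def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging each 2x2 group.

    `plane` is a list of equal-length rows; width and height must be even.
    """
    out = []
    for y in range(0, len(plane), 2):
        row = []
        for x in range(0, len(plane[0]), 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append((total + 2) // 4)  # +2 rounds to nearest
        out.append(row)
    return out
```

The Y plane is left at full resolution, so a 4:2:0 image carries half as many samples as the original RGB data.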
2-D DCT (Discrete Cosine Transform)
  [Figure: the 2-D DCT maps a block from the space domain to the frequency domain]
Basis Images of 2-D DCT
  [Figure: basis images; horizontal frequency and vertical frequency each run from low to high]
Frequency Distribution of 2-D DCT
  [Figure: coefficients grouped by frequency and by direction]
8-point 1-D DCT Algorithm (1/2)
  Better for VLSI implementation!
8-point 1-D DCT Algorithm (2/2)
Implementation of 2-D DCT
  Separable, row-column decomposition
  X → 1-D DCT unit → transpose memory (Y) → 1-D DCT unit → Z
  Y = AX
  Z = YA^T
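The row-column decomposition can be sketched in software as two matrix products. This is an illustration in plain Python, assuming A is the orthonormal 8-point DCT-II matrix. It is not the fast algorithm from the preceding slides, and all function names are made up for the example.

```python
import math

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix A: the 1-D DCT of a column vector x is A x."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def matmul(p, q):
    """Plain matrix product of two lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct_2d(x):
    """Z = (A X) A^T: a 1-D DCT down the columns, then along the rows."""
    a = dct_matrix(len(x))
    y = matmul(a, x)                 # first 1-D pass, Y = A X
    return matmul(y, transpose(a))   # second 1-D pass, Z = Y A^T

def idct_2d(z):
    """X = A^T Z A, valid because A is orthonormal."""
    a = dct_matrix(len(z))
    return matmul(transpose(a), matmul(z, a))
```

In hardware the intermediate Y is held in the transpose memory, so the same 1-D unit design serves both passes. A constant block of value 10 yields a single non-zero coefficient, Z[0][0] = 80.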
Quantization Table for Luminance
  16  11  10  16  24  40  51  61
  12  12  14  19  26  58  60  55
  14  13  16  24  40  57  69  56
  14  17  22  29  51  87  80  62
  18  22  37  56  68 109 103  77
  24  35  55  64  81 104 113  92
  49  64  78  87 103 121 120 101
  72  92  95  98 112 100 103  99
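Quantization Table for Chrominance
  17  18  24  47  99  99  99  99
  18  21  26  66  99  99  99  99
  24  26  56  99  99  99  99  99
  47  66  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
  99  99  99  99  99  99  99  99
  99  99  99  99  99  99  99  99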
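A minimal sketch of how a quantization table is applied: each DCT coefficient is divided by the matching table entry and rounded to the nearest integer, and the decoder multiplies back. The table below is the example luminance table from Annex K of the JPEG standard. The function names and the rounding helper are assumptions made for illustration.

```python
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def round_half_away(v):
    """Round to the nearest integer, with halves going away from zero."""
    return int(v + 0.5) if v >= 0 else -int(-v + 0.5)

def quantize(coeffs, table=LUMA_Q):
    """Encoder side: divide each coefficient by its table entry and round."""
    return [[round_half_away(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]

def dequantize(levels, table=LUMA_Q):
    """Decoder side: multiply back. The rounding error is not recoverable."""
    return [[l * q for l, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]
```

The large entries toward the bottom right turn most high-frequency coefficients into zeros, which is what makes the later run-length stage effective.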
Predictive Coding of DC Coefficients
  Differential Pulse Code Modulation (DPCM)
  Storing the difference from the previous block's DC value takes fewer bits than storing the exact value.
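A minimal sketch of DC prediction, assuming the predictor starts at 0 for the first block. The function names are illustrative.

```python
def dpcm_encode(dc_values):
    """Replace each DC value by its difference from the previous block's DC."""
    diffs = []
    prev = 0  # predictor for the first block
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def dpcm_decode(diffs):
    """Rebuild the DC values by accumulating the differences."""
    values = []
    prev = 0
    for d in diffs:
        prev += d
        values.append(prev)
    return values
```

Neighbouring blocks usually have similar average brightness, so the differences cluster around zero and fall into small Huffman categories.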
Zig-zag Scan (AC Coefficients)
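The zig-zag order can be generated by walking the anti-diagonals of the block and alternating direction. This is a sketch with illustrative function names; real codecs usually hard-code the resulting 64-entry table.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked upward (row decreasing), odd ones downward.
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append((r, s - r))
    return order

def zigzag_scan(block):
    """Flatten a square block into a list, lowest frequencies first."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

The first entry is the DC coefficient. The remaining 63 entries are the AC coefficients, ordered so that the zeros produced by quantization tend to gather at the end.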
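Run-Length Coding (RLC)
  The AC coefficients that follow the DC term are coded as (R, L) pairs: R is the run of zeros before a non-zero coefficient, L is its level
  (R,L) => (0,-3)(0,-2)(0,-1)(0,-2)(0,-1)(2,-1)(EOB)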
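A minimal sketch of the run-length step over the 63 zig-zag-ordered AC coefficients. The handling of runs longer than 15 through a (15, 0) symbol follows baseline JPEG and is not shown on the slide. The function name and the "EOB" marker string are assumptions for the example.

```python
EOB = "EOB"  # end-of-block marker: every remaining coefficient is zero

def run_length_encode(ac):
    """Turn a list of AC coefficients into (run, level) pairs."""
    pairs = []
    run = 0
    for level in ac:
        if level == 0:
            run += 1
            continue
        while run > 15:            # baseline JPEG limits a run to 15 zeros
            pairs.append((15, 0))  # ZRL symbol: sixteen zeros
            run -= 16
        pairs.append((run, level))
        run = 0
    if run > 0:                    # trailing zeros collapse into one EOB
        pairs.append(EOB)
    return pairs
```

Feeding it the sequence -3, -2, -1, -2, -1, 0, 0, -1 followed by 55 zeros gives the pairs shown on the slide.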
Huffman Coding
  (R,L) => (0,-3)(0,-2)(0,-1)(0,-2)(0,-1)(2,-1)(EOB)
  Category  AC coefficient range
  1         -1, 1
  2         -3, -2, 2, 3
  3         -7,…,-4, 4,…,7
  4         -15,…,-8, 8,…,15
  5         -31,…,-16, 16,…,31
  6         -63,…,-32, 32,…,63
  7         -127,…,-64, 64,…,127
  8         -255,…,-128, 128,…,255
  9         -511,…,-256, 256,…,511
  10        -1023,…,-512, 512,…,1023
  11        -2047,…,-1024, 1024,…,2047
  (Run, SSSS/Category) => (0,2)(-3),(0,2)(-2),(0,1)(-1),(0,2)(-2),…(0,0)
  Each (Run, Category) symbol is then looked up in the Huffman table
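The category (SSSS) of a coefficient is the number of bits needed for its magnitude, so it can be computed rather than looked up. A minimal sketch with illustrative function names:

```python
def category(value):
    """Size category of a coefficient: bit length of its magnitude."""
    return abs(value).bit_length()

def to_symbols(pairs):
    """Rewrite (run, level) pairs as ((run, category), level) pairs."""
    return [((run, category(level)), level) for run, level in pairs]
```

For example, (0, -3) becomes ((0, 2), -3), matching the (0,2)(-3) notation on the slide.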
Huffman Coding for DC and AC Coefficients
  Table for luminance DC coefficient differences
    Category  Code length  Code word
    0         2            00
    1         3            010
    2         3            011
    3         3            100
    4         3            101
    5         3            110
    6         4            1110
    7         5            11110
    8         6            111110
    9         7            1111110
    10        8            11111110
    11        9            111111110
  Table for luminance AC coefficients
    Run/Size   Code length  Code word
    0/0 (EOB)  4            1010
    0/1        2            00
    0/2        2            01
    0/3        3            100
    0/4        4            1011
    0/5        5            11010
    0/6        7            1111000
    0/7        8            11111000
    0/8        10           1111110110
    0/9        16           1111111110000010
    0/A        16           1111111110000011
    1/1        4            1100
    1/2        5            11011
    1/3        7            1111001
    1/4        9            111110110
  (0,2)(3),(0,2)(-2),(0,1)(-1),(0,2)(-2),…(0,0) => (01)(11) (01)(01) …… (1010)
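A minimal sketch of the final coding step. Each symbol is emitted as its Huffman code word followed by the amplitude bits of the value, where a negative value is stored as value + 2^size - 1. The dictionaries hold only a small subset of the luminance tables from Annex K of the JPEG standard, so symbols outside that subset raise KeyError. The function names are illustrative.

```python
DC_LUMA = {0: "00", 1: "010", 2: "011", 3: "100", 4: "101", 5: "110", 6: "1110"}
AC_LUMA = {(0, 0): "1010", (0, 1): "00", (0, 2): "01", (0, 3): "100",
           (0, 4): "1011", (1, 1): "1100", (1, 2): "11011"}

def category(value):
    """Size category: bit length of the magnitude."""
    return abs(value).bit_length()

def amplitude_bits(value):
    """Extra bits that select the exact value inside its category."""
    size = category(value)
    if size == 0:
        return ""
    if value < 0:
        value += (1 << size) - 1  # e.g. -3 in category 2 becomes 0, i.e. "00"
    return format(value, f"0{size}b")

def encode_dc(diff):
    """Code word for the DC difference, then its amplitude bits."""
    return DC_LUMA[category(diff)] + amplitude_bits(diff)

def encode_ac(pairs):
    """Encode (run, level) pairs and terminate the block with EOB."""
    bits = [AC_LUMA[(run, category(level))] + amplitude_bits(level)
            for run, level in pairs]
    bits.append(AC_LUMA[(0, 0)])
    return "".join(bits)
```

With these tables, a DC difference of 61 is coded as 1110 followed by 111101, and the pair (0, -2) as 01 followed by 01.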
Example of Baseline DCT-based Coding
  For Y: 8*8 pixels * 8 bits/pixel = 512 bits
  [Figure: one 8×8 block passing through level shift (-128), FDCT, quantization (Q table), zig-zag scan, run-length coding and Huffman coding]
  After zig-zag scan and run-length coding:
    (6)(61),(0,2)(-3),(0,3)(4),(0,1)(-1),(0,3)(-4),(0,2)(2),(1,2)(2),(0,2)(-2),
    (0,2)(-2),(5,2)(2),(3,1)(1),(6,1)(-1),(2,1)(-1),(4,1)(-1),(7,1)(-1),(0,0)
  After Huffman coding:
    (1110)(111101)(01)(00)(100)(100)(00)(0)(100)(001)(01)(10)(11011)(10)(01)(01)(01)(01)
    (11111110111)(10)(111010)(1)(1111011)(0)(11100)(0)(111011)(0)(11111010)(0)(1010)
  Total: 102 bits
Block Diagram of JPEG Decoder
  [Figure: decoder block diagram; the input is the bitstream 01001011101…]
JPEG Bitstream
Review ─ Software ( Concept )
  Process run
  Linker
  Tailoring the C library
  Load view and execution view
Process Run
Stack
  Function parameters
  Local variables
Heap
  malloc()
  new operator
Linker
ARM Linker Control File
Linker
Tailoring the C Library
  Management of writable memory as static data, heap, and stack
  Functions that can be redefined
  I/O redirection functions
Memory Model
  Single memory region
  The stack grows downward from the top of the region
  The heap grows upward from the bottom of the region
Single Memory Model
Controlling Runtime Memory Model
  Function                     Description
  __user_initial_stackheap()   Returns the location of the initial heap
  __user_heap_extend()         Returns the size and base address of an extra heap block
  __user_stack_slop()          Returns the amount of extra stack
My Own Memory Model
  Function                 Description
  __rt_stackheap_init()    Sets up sp and sl to point to a valid stack
  __rt_stack_overflow()    Called if a stack overflow occurs
  __rt_heap_extend()       Returns a new 8-byte aligned block
Tailoring the I/O Functions
Load View and Execution View
Review ─ Hardware ( Wrapper )
AHB Protocol
AHB Wrapper
Input Pin Block Diagram
Output Pin Block Diagram
Lab ─ Case Study
  Goal
    Implement the JPEG codec system on the ARM platform
  Principles
    Implement the ARM platform-based JPEG codec with HW/SW co-design
  Requirement
    Analyze the profiling results of the pure software simulation
    Explain how to partition the JPEG codec between HW and SW
    Implement the JPEG codec with HW/SW co-design
  Discussion
    Explain where the stack and the heap are located, and what initializes them
File Structure
Read & Write Address
  FDCT Write_head: 0xcc000000 0xcc000004 0xcc00000c 0xcc000010 0xcc000014 0xcc000018 0xcc00001c
  IDCT Write_head: 0xcc000040 0xcc000044 0xcc000048 0xcc00004c 0xcc000050 0xcc000054 0xcc000058 0xcc00005c
  FDCT Read_head:  0xcc000020 0xcc000024 0xcc000028 0xcc00002c 0xcc000030 0xcc000034 0xcc000038 0xcc00003c
  IDCT Read_head:  0xcc000060 0xcc000064 0xcc000068 0xcc00006c 0xcc000070 0xcc000074 0xcc000078 0xcc00007c
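The address map above can be summarised as four windows of word-sized registers. The sketch below assumes each window holds eight contiguous 32-bit registers. That holds for three of the four lists, but the FDCT write list in the source shows only seven addresses (0xcc000008 is absent), so the contiguous layout is an assumption there. The names BASE, WINDOWS and reg_addr are hypothetical and are not taken from the lab code.

```python
BASE = 0xCC000000  # start of the register block on the AHB bus

# Byte offset of each window from BASE (assumed layout, see note above).
WINDOWS = {
    ("FDCT", "write"): 0x00,
    ("FDCT", "read"): 0x20,
    ("IDCT", "write"): 0x40,
    ("IDCT", "read"): 0x60,
}

def reg_addr(unit, direction, index):
    """Address of register `index` (0..7) in the given window."""
    if not 0 <= index < 8:
        raise IndexError("each window holds eight 32-bit registers")
    return BASE + WINDOWS[(unit, direction)] + 4 * index
```

On the target, software would write input data to the write window of a unit and read results from its read window. Here the function only computes the addresses.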
Result for SW Simulation
  [Figure: original image, encoder output, decoder output]
Result for HW Simulation
  [Figure: original image, encoder output, decoder output]
Profiling Result of SW Simulation
Lab ─ Step by Step
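Step 1 (Only Software)
  First, confirm the working directory, for example D:\ARMSoC\Final_project\
  Check that the batch file sw.bat is present in the working directory
Step 2 (Only Software)
  Run the batch file sw.bat. There are two ways to do this: either double-click the sw.bat icon, or open a Command Prompt window, change to the working directory, and type sw.bat
Step 3 (Only Software)
  Open the AXD Debugger window
  Select "File → Load Image"
  Select "Execute → Go"
Step 1 ( SW/HW )
  Confirm the working directory, for example D:\ARMSoC\Final_project\
Step 1 ( SW/HW )
  Use Xilinx ISE to compile the provided Verilog HDL code into a downloadable *.bit file
Step 2 ( SW/HW )
  Download the ahbahbtop.bit file to the LM module of the ARM Integrator
  The download requires two additional files: Download.brd and LM_flash_load.bit
Step 3 ( SW/HW )
  Run the batch file hw.bat
  When the batch file finishes, check that the file hw.axf has been generated in the working directory
Step 4 ( SW/HW )
  Open the AXD Debugger window
  Select "File → Load Image"
  Select "Execute → Go"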
Reference
  Wen-Hsiung Chen, C. Harrison Smith, and S. C. Fralick, "A Fast Computational Algorithm for the Discrete Cosine Transform," IEEE Trans. Commun., vol. COM-25, pp. 1004-1009, Sept. 1977.
  William B. Pennebaker and Joan L. Mitchell, JPEG: Still Image Data Compression Standard, Kluwer Academic Publishers, ISBN: 0442012721.