Lab 10 : JPEG Encoder Team #7 P 李彥勳 P 謝嵩淮


1 Lab 10 : JPEG Encoder Team #7 P91922003 李彥勳 P90922016 謝嵩淮

2 JPEG Encoder Block Diagram

3 JPEG Encoding Flow
1. RGB -> YUV
2. DCT (discrete cosine transform)
3. Quantization
4. Zig-zag scan
5. Entropy coding (run-length + Huffman encoding)
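The zig-zag scan step above can be sketched as follows. This is a minimal illustration of the standard 8x8 JPEG zig-zag order, not the team's code; the names `make_zigzag_order` and `zigzag_scan` are hypothetical.

```cpp
#include <array>

// Build the 8x8 zig-zag order: entry i holds the row-major index of the
// i-th coefficient visited. (Hypothetical helper, not from the slides.)
std::array<int, 64> make_zigzag_order() {
    std::array<int, 64> order{};
    int i = 0;
    for (int s = 0; s < 15; ++s) {        // s = row + col, one anti-diagonal per pass
        const int lo = s < 8 ? 0 : s - 7; // smallest valid row on this diagonal
        const int hi = s < 8 ? s : 7;     // largest valid row on this diagonal
        for (int k = lo; k <= hi; ++k) {
            // Even diagonals are walked bottom-left to top-right,
            // odd diagonals top-right to bottom-left.
            const int row = (s % 2 == 0) ? (lo + hi - k) : k;
            const int col = s - row;
            order[i++] = row * 8 + col;
        }
    }
    return order;
}

// Reorder a row-major 8x8 block of quantized coefficients into zig-zag order.
std::array<int, 64> zigzag_scan(const std::array<int, 64>& block) {
    const std::array<int, 64> order = make_zigzag_order();
    std::array<int, 64> out{};
    for (int i = 0; i < 64; ++i) out[i] = block[order[i]];
    return out;
}
```

The scan groups low-frequency coefficients first, so the trailing zeros left by quantization form long runs for the run-length coder.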

4 JPEG in Pure Software
It works fine on the ARM platform; only the memory-management code needs to be modified so that it can handle large bitmaps (1024x768).
Before: 1024x768.bmp (2,359,350 bytes)
After: 1024x768.jpg (158,697 bytes)

5 Profiling Result (1/2)
Columns: Name, cum%, self%, desc%, calls
process_DU section: process_DU, fdct_and_quantization, writebits
fdct_and_quantization section: fdct_and_quantization, _fmul, _fflt, _fsub, _fadd, _f2d, _dfix, _dadd
(The numeric values are not preserved in this transcript.)

6 Profiling Result (2/2)
process_DU: the whole program spends 88.51% of its time in process_DU, but only 5.32% is spent in the function itself (run-length coding and Huffman coding).
fdct_and_quantization: DCT and quantization take 78.33% of the execution time, and 62.41% of the time is consumed by these floating-point helper functions: _fmul, _fflt, _fsub, _fadd, _f2d, _dfix, _dadd.
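As a rough bound on what accelerating this hot spot can buy, Amdahl's law can be applied to the 78.33% figure. This assumes the figure is a fraction of total execution time and that the remaining 21.67% is unchanged; the symbols p (accelerated fraction) and k (speedup of that fraction) are introduced here for illustration.

```latex
S(k) = \frac{1}{(1 - p) + \dfrac{p}{k}}, \qquad p = 0.7833
\qquad\Longrightarrow\qquad
S_{\max} = \lim_{k \to \infty} S(k) = \frac{1}{1 - 0.7833} = \frac{1}{0.2167} \approx 4.6
```

Under these assumptions, even a zero-cost DCT and quantization would make the encoder at most about 4.6 times faster overall.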

7 Improving Performance
The original software uses floating-point DCT and quantization:
  use the hardware DCT unit
  use fixed-point quantization
The SSRAM on the ARM platform is not large enough to handle bigger pictures:
  use the 128MB DRAM

8 Memory Management

9 Memory Management (cont.)
Two functions, heapalloc() and heapfree(), are needed to replace malloc() and free(). Here we use dummy memory allocation.
heapalloc(size): if malloc(size) fails to allocate memory, use SDRAM instead, and use a pointer to record it.
heapfree(ptr): if 'ptr' is in the range of SDRAM, do nothing; otherwise, call the original free(ptr) to release it.
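The behaviour described for heapalloc() and heapfree() can be sketched as below. This is a minimal illustration, not the team's code: the static array `g_sdram` is a hypothetical stand-in for the board's SDRAM window (whose real base address and size are not given in the slides), and `sdram_alloc` is a hypothetical helper implementing the "dummy" bump allocation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

namespace {
// Hypothetical stand-in for the SDRAM region; on the board this would be
// a fixed physical address range instead of a static array.
alignas(8) unsigned char g_sdram[1 << 16];
unsigned char* g_sdram_next = g_sdram;  // the pointer that records the next free byte

bool in_sdram(const void* ptr) {
    const auto p = reinterpret_cast<std::uintptr_t>(ptr);
    const auto base = reinterpret_cast<std::uintptr_t>(g_sdram);
    return p >= base && p < base + sizeof(g_sdram);
}
}  // namespace

// Bump ("dummy") allocation: hand out the next chunk, never reclaim it.
void* sdram_alloc(std::size_t size) {
    const std::size_t remaining =
        static_cast<std::size_t>(g_sdram + sizeof(g_sdram) - g_sdram_next);
    if (size > remaining) return nullptr;
    const std::size_t rounded = (size + 7) & ~std::size_t{7};  // keep 8-byte alignment
    if (rounded > remaining) return nullptr;
    void* p = g_sdram_next;
    g_sdram_next += rounded;
    return p;
}

// Try the normal heap first; fall back to SDRAM only when malloc fails.
void* heapalloc(std::size_t size) {
    void* p = std::malloc(size);
    return p != nullptr ? p : sdram_alloc(size);
}

// SDRAM blocks are never released; everything else goes back to free().
void heapfree(void* ptr) {
    if (ptr == nullptr || in_sdram(ptr)) return;
    std::free(ptr);
}
```

The trade-off of this design is that SDRAM blocks are never reused, which is acceptable when a few large image buffers live for the whole run.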

10 HW/SW Partitioning
First Step: 1. RGB -> YUV, done in Lab9. 2. DCT module.
Second Step: 3. Quantization. 4. Entropy coding.
Third Step: 5. Pipelining.

11 Result of First Step
The DCT module is only 9 bits wide, so its output saturated quickly.

12 DCT (Myip.v Modified)
//AHB OUTPUT DRIVERS
// second read cycle
`ST_READ :
  if (Valid == 1'b1) begin
    case(HADDR[6:0])
      7'b : i_out<=out0;
      7'b : i_out<=out1;
      7'b : i_out<=out2;
      7'b : i_out<=out3;
      7'b : i_out<=out4;
      7'b : i_out<=out5;
      7'b : i_out<=out6;
      7'b : i_out<=out7;
      7'b : i_out<=rr;
    endcase
//AHB OUTPUT DRIVERS
dct dct1(y0,y1,y2,y3,y4,y5,y6,y7,x0,x1,x2,x3,x4,x5,x6,x7);
HCLK) //why
begin
  if (HWRITE==1'b1)
  ………………

13 DCT (Myip.v Modified)
//AHB OUTPUT DRIVERS
// second write cycle
`ST_WRITE :
  if (Valid == 1'b1) begin
    case(HADDR[6:0])
      7'b : i_in0<=HWDATA[31:0];
      7'b : i_in1<=HWDATA[31:0];
      7'b : i_in2<=HWDATA[31:0];
      7'b : i_in3<=HWDATA[31:0];
      7'b : i_in4<=HWDATA[31:0];
      7'b : i_in5<=HWDATA[31:0];
      7'b : i_in6<=HWDATA[31:0];
      7'b : i_in7<=HWDATA[31:0];
      7'b : enable<=HWDATA[31:0];
    endcase
//AHB OUTPUT DRIVERS
dct dct1(y0,y1,y2,y3,y4,y5,y6,y7,x0,x1,x2,x3,x4,x5,x6,x7);
HCLK) //why
begin
  if (HWRITE==1'b1)
  ………………

14 Read & Write Address
Write_head: 0xcc000000 0xcc000004 0xcc000008 0xcc00000c 0xcc000010 0xcc000014 0xcc000018 0xcc00001c
Read_head: 0xcc000020 0xcc000024 0xcc000028 0xcc00002c 0xcc000030 0xcc000034 0xcc000038 0xcc00003c

15 driver.cpp modified
word_write(ptr,1);
ptr=(int *)0xcc000040;
write_head=(int *)0xcc000000;
read_head=(int *)0xcc000020;
word_write(ptr,0);
//write data
word_write(0xcc000000,indata[0]);
word_write(0xcc000004,indata[1]);
word_write(0xcc000008,indata[2]);
word_write(0xcc00000c,indata[3]);
word_write(0xcc000010,indata[4]);
word_write(0xcc000014,indata[5]);
word_write(0xcc000018,indata[6]);
word_write(0xcc00001c,indata[7]);
word_write(ptr,1);
pooling();
//read data
outdata1[0] = word_read(0xcc000020);
outdata1[1] = word_read(0xcc000024);
outdata1[2] = word_read(0xcc000028);
outdata1[3] = word_read(0xcc00002c);
outdata1[4] = word_read(0xcc000030);
outdata1[5] = word_read(0xcc000034);
outdata1[6] = word_read(0xcc000038);
outdata1[7] = word_read(0xcc00003c);
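The same write, start, wait, read sequence can be expressed with a register-layout struct and loops. This is a sketch under assumptions, not the team's driver: the struct `DctRegs`, the function `dct_1d_hw`, and the `wait_done` callback (standing in for the slide's `pooling()`, whose internals are not shown) are all hypothetical names. The offsets follow the addresses listed on the slides: inputs at 0x00 to 0x1c, outputs at 0x20 to 0x3c, control word at 0x40.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical view of the DCT IP's register block, following the slide's
// address map relative to the base address 0xcc000000.
struct DctRegs {
    volatile std::int32_t in[8];   // offsets 0x00..0x1c (write side)
    volatile std::int32_t out[8];  // offsets 0x20..0x3c (read side)
    volatile std::int32_t ctrl;    // offset 0x40 (enable / start word)
};
static_assert(offsetof(DctRegs, out) == 0x20, "output registers start at 0x20");
static_assert(offsetof(DctRegs, ctrl) == 0x40, "control register sits at 0x40");

// Run one 8-point DCT on the hardware block.
// wait_done is an assumed hook that returns once the result is ready.
void dct_1d_hw(DctRegs* regs,
               const std::int32_t (&in)[8],
               std::int32_t (&out)[8],
               const std::function<void()>& wait_done) {
    regs->ctrl = 0;                                  // clear enable before loading
    for (int i = 0; i < 8; ++i) regs->in[i] = in[i]; // write the 8 input samples
    regs->ctrl = 1;                                  // start the transform
    wait_done();                                     // wait for completion
    for (int i = 0; i < 8; ++i) out[i] = regs->out[i]; // read the 8 coefficients
}
```

On the target board the pointer would presumably be formed as `reinterpret_cast<DctRegs*>(0xcc000000)`; on a host machine an ordinary `DctRegs` object plus a callback that fills `out` lets the calling code be exercised without hardware.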

16 Before.cpp modified
void JPEG::DCT_1D(int* data0,int* data1,int* data2,int* data3,int* data4,int* data5,int* data6,int* data7)
{
  double tmp0, tmp1, tmp2, tmp3, tmp4,tmp5, tmp6, tmp7;
  double tmp10, tmp11, tmp12, tmp13;
  double z1, z2, z3, z4, z5, z11, z13;
  tmp0 = *data0 + *data7;
  tmp7 = *data0 - *data7;
  tmp1 = *data1 + *data6;
  tmp6 = *data1 - *data6;
extern void DCT_1D(int * x0,int * x1,int * x2,int * x3,int * x4,int * x5,int * x6,int * x7);

17 DCT result
Original: 193KB
New: 9KB

18 Fixed Point Quantization (1/2)
12-bit fraction -> satisfactory accuracy
(image comparison: float vs. fixed point)
More than 16-bit fraction -> almost as good as float
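One common way to do quantization with a 12-bit fraction is to replace the per-coefficient division by a multiplication with a precomputed reciprocal. The sketch below assumes that approach; the slides do not show the team's actual fixed-point code, and the names `FRAC_BITS`, `make_reciprocal`, and `quantize_fixed` are hypothetical.

```cpp
#include <cstdint>

constexpr int FRAC_BITS = 12;  // 12 fractional bits, as on the slide

// Precompute round(2^12 / q) for one quantization-table entry q (q >= 1).
std::int32_t make_reciprocal(int q) {
    return ((1 << FRAC_BITS) + q / 2) / q;
}

// Quantize one DCT coefficient: approximately round(coef / q), using only
// an integer multiply, add, and shift.
// Assumes |coef| < 2^18 so the 32-bit product cannot overflow.
std::int32_t quantize_fixed(std::int32_t coef, std::int32_t recip) {
    const std::int32_t mag = coef < 0 ? -coef : coef;  // work on the magnitude
    const std::int32_t half = 1 << (FRAC_BITS - 1);    // rounding offset (0.5)
    const std::int32_t scaled = (mag * recip + half) >> FRAC_BITS;
    return coef < 0 ? -scaled : scaled;                // restore the sign
}
```

Because the reciprocal is rounded to 12 bits, results can differ from the floating-point quotient by one step for some coefficient and table values; widening the fraction shrinks that error, which is consistent with the slide's remark about more than 16 bits.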

19 Fixed Point Quantization (2/2)
Profile columns: Name, cum%, self%, desc%, calls
Entries: process_DU, fdct_and_quantization (the numeric values are not preserved in this transcript)
Cycles spent on process_DU: <-> (improving only the quantization part with fixed point gives an 18.9% improvement)

20 Conclusion
Merge the results of IP1 (RGB->YUV), IP2 (DCT), and fixed-point quantization (software).
Future work: the second and third steps.
Problems: 1. SDRAM is slow. 2. Reading and writing files through the MULTI-ICE JTAG interface is slow.
Ideal: 0.03 sec per frame -> 30 fps, capable of encoding motion JPEG sequences in real time.

