1
Lab 10: JPEG Encoder
Team #7: P91922003 李彥勳, P90922016 謝嵩淮
2
JPEG Encoder Block Diagram
3
JPEG Encoding Flow
1. RGB -> YUV
2. DCT (discrete cosine transform)
3. Quantization
4. Zig-zag scan
5. Entropy coding (run-length + Huffman encoding)
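The zig-zag scan step of the flow above can be sketched as follows. This is a minimal illustration, not the lab's code: the function names make_zigzag_order and zigzag_scan are hypothetical, and the block is assumed to be stored in row-major order.

```cpp
#include <array>
#include <cstddef>

// Build the 8x8 zig-zag order: entry k is the row-major index (row * 8 + col)
// of the k-th coefficient visited. (Hypothetical helper, not from the slides.)
std::array<int, 64> make_zigzag_order() {
    std::array<int, 64> order{};
    int k = 0;
    for (int s = 0; s < 15; ++s) {        // s = row + col: one anti-diagonal at a time
        int lo = s < 8 ? 0 : s - 7;        // smallest valid row on this diagonal
        int hi = s < 8 ? s : 7;            // largest valid row on this diagonal
        if (s % 2 == 0) {                  // even diagonals: bottom-left to top-right
            for (int row = hi; row >= lo; --row) order[k++] = row * 8 + (s - row);
        } else {                           // odd diagonals: top-right to bottom-left
            for (int row = lo; row <= hi; ++row) order[k++] = row * 8 + (s - row);
        }
    }
    return order;
}

// Reorder one block of quantized coefficients into zig-zag sequence, so that
// low-frequency coefficients come first and trailing zeros cluster at the end.
std::array<int, 64> zigzag_scan(const std::array<int, 64>& block) {
    static const std::array<int, 64> order = make_zigzag_order();
    std::array<int, 64> out{};
    for (std::size_t k = 0; k < 64; ++k) out[k] = block[order[k]];
    return out;
}
```

Grouping the zeros at the end is what makes the run-length stage that follows effective.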
4
JPEG in Pure Software
It works fine on the ARM platform. Only the memory management code needs to be modified so that we can handle large bitmaps (1024x768).
Before: 1024x768.bmp (2,359,350 bytes)
After: 1024x768.jpg (158,697 bytes), a compression ratio of roughly 15:1
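The file sizes above can be checked with a little arithmetic. The sketch below assumes the input is an uncompressed 24-bit BMP with the usual 54-byte header (14-byte file header plus 40-byte info header); the slides do not state the bit depth, and the function names are hypothetical.

```cpp
#include <cstdint>

// Size in bytes of an uncompressed 24-bit BMP (assumption: 54-byte header,
// rows padded to a multiple of 4 bytes).
inline std::uint32_t bmp24_size(std::uint32_t width, std::uint32_t height) {
    const std::uint32_t header_bytes = 14 + 40;
    const std::uint32_t row_bytes = (width * 3 + 3) / 4 * 4;
    return header_bytes + row_bytes * height;
}

// Ratio of original size to compressed size.
inline double compression_ratio(std::uint32_t original_bytes, std::uint32_t compressed_bytes) {
    return static_cast<double>(original_bytes) / static_cast<double>(compressed_bytes);
}
```

Under that assumption, bmp24_size(1024, 768) reproduces the 2,359,350 bytes quoted for the input file, and dividing by 158,697 gives a ratio a little under 15.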
5
Profiling Result (1/2)
Name cum% self% desc% calls
process_DU % %
process_DU % % %
fdct_and_quantization % %
writebits % %
fdct_and_quantization % % %
_fmul % %
_fflt % %
_fsub % %
_fadd % %
_f2d % %
_dfix % %
_dadd % %
6
Profiling Result (2/2)
process_DU % % %
The whole program spends 88.51% of its time in process_DU, but only 5.32% in process_DU itself (run-length coding and Huffman coding).
fdct_and_quantization % %
DCT and quantization account for 78.33% of execution time, and 62.41% of the time is consumed by these software floating-point routines: _fmul, _fflt, _fsub, _fadd, _f2d, _dfix, _dadd.
7
Improving Performance
The original software uses floating-point DCT and quantization:
  use the hardware DCT unit
  use fixed-point quantization
The SSRAM on the ARM platform is too small to handle bigger pictures:
  use the 128 MB DRAM
8
Memory Management
9
Memory Management (cont.)
Two functions, heapalloc() and heapfree(), replace malloc() and free(). Here we use dummy memory allocation.
heapalloc(size): if malloc(size) fails to allocate memory, use SDRAM instead, and use a pointer to record it.
heapfree(ptr): if 'ptr' is in the SDRAM range, do nothing; otherwise call the original free(ptr) to release it.
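The behaviour described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the lab's source: a static array g_region stands in for the SDRAM window (the real base address and size are not given in the slides), and region_alloc and in_region are hypothetical helper names. "Dummy allocation" is read here as a bump pointer that never reclaims memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Stand-in for the SDRAM window (assumption: on the board this would be a
// fixed address range rather than a static array).
alignas(8) static unsigned char g_region[1 << 16];
static std::size_t g_used = 0;  // the "pointer" that records how much is taken

// True if p points inside the stand-in SDRAM range.
bool in_region(const void* p) {
    const auto a = reinterpret_cast<std::uintptr_t>(p);
    const auto base = reinterpret_cast<std::uintptr_t>(g_region);
    return a >= base && a < base + sizeof(g_region);
}

// Bump allocation: hand out the next free bytes and never reclaim them.
void* region_alloc(std::size_t size) {
    if (size > sizeof(g_region)) return nullptr;
    const std::size_t rounded = (size + 7) & ~static_cast<std::size_t>(7);
    if (rounded > sizeof(g_region) - g_used) return nullptr;
    void* p = g_region + g_used;
    g_used += rounded;
    return p;
}

// Try the normal heap first; fall back to the SDRAM range if it fails.
void* heapalloc(std::size_t size) {
    void* p = std::malloc(size);
    return p ? p : region_alloc(size);
}

// Blocks from the SDRAM range are never released; everything else goes to free().
void heapfree(void* ptr) {
    if (ptr == nullptr || in_region(ptr)) return;
    std::free(ptr);
}
```

The trade-off is that memory taken from the fallback range is only recovered when the program exits, which is acceptable for an encoder that processes one image per run.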
10
HW/SW Partitioning
First step:
1. RGB -> YUV (done in Lab 9).
2. DCT module.
Second step:
3. Quantization.
4. Entropy coding.
Third step:
5. Pipelining.
11
Result of First Step
The DCT module is only 9 bits wide, so it saturates quickly.
12
DCT (Myip.v Modified)
//AHB OUTPUT DRIVERS
// second read cycle
`ST_READ :
  if (Valid == 1'b1) begin
    case(HADDR[6:0])
      7'b : i_out<=out0;
      7'b : i_out<=out1;
      7'b : i_out<=out2;
      7'b : i_out<=out3;
      7'b : i_out<=out4;
      7'b : i_out<=out5;
      7'b : i_out<=out6;
      7'b : i_out<=out7;
      7'b : i_out<=rr;
    endcase
//AHB OUTPUT DRIVERS
dct dct1(y0,y1,y2,y3,y4,y5,y6,y7,x0,x1,x2,x3,x4,x5,x6,x7);
HCLK) //why
begin
  if (HWRITE==1'b1)
………………
13
DCT (Myip.v Modified)
//AHB OUTPUT DRIVERS
// second write cycle
`ST_WRITE :
  if (Valid == 1'b1) begin
    case(HADDR[6:0])
      7'b : i_in0<=HWDATA[31:0];
      7'b : i_in1<=HWDATA[31:0];
      7'b : i_in2<=HWDATA[31:0];
      7'b : i_in3<=HWDATA[31:0];
      7'b : i_in4<=HWDATA[31:0];
      7'b : i_in5<=HWDATA[31:0];
      7'b : i_in6<=HWDATA[31:0];
      7'b : i_in7<=HWDATA[31:0];
      7'b : enable<=HWDATA[31:0];
    endcase
//AHB OUTPUT DRIVERS
dct dct1(y0,y1,y2,y3,y4,y5,y6,y7,x0,x1,x2,x3,x4,x5,x6,x7);
HCLK) //why
begin
  if (HWRITE==1'b1)
………………
14
Read & Write Address
Write_head    Read_head
0xcc000000    0xcc000020
0xcc000004    0xcc000024
0xcc000008    0xcc000028
0xcc00000c    0xcc00002c
0xcc000010    0xcc000030
0xcc000014    0xcc000034
0xcc000018    0xcc000038
0xcc00001c    0xcc00003c
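The address map above follows a simple pattern: eight consecutive 32-bit words for input starting at the write head, then eight for output starting at the read head. A small sketch of that arithmetic follows; the constant and function names are hypothetical, and only the addresses themselves come from the slide.

```cpp
#include <cstdint>

constexpr std::uint32_t kDctBase = 0xcc000000u;   // Write_head on the slide
constexpr std::uint32_t kReadHeadOffset = 0x20u;  // Read_head = base + 0x20

// Address of the i-th input register (i in 0..7), one 32-bit word apart.
constexpr std::uint32_t input_reg_addr(unsigned i) {
    return kDctBase + 4u * i;
}

// Address of the i-th output register (i in 0..7).
constexpr std::uint32_t output_reg_addr(unsigned i) {
    return kDctBase + kReadHeadOffset + 4u * i;
}

// The low 7 address bits, i.e. the HADDR[6:0] field a slave would decode.
constexpr std::uint32_t low_address_bits(std::uint32_t addr) {
    return addr & 0x7Fu;
}
```

For example, input_reg_addr(7) evaluates to 0xcc00001c and output_reg_addr(0) to 0xcc000020, matching the last write address and the first read address in the table.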
15
driver.cpp modified
ptr=(int *)0xcc000040;
word_write(ptr,1);
write_head=(int *)0xcc000000;
read_head=(int *)0xcc000020;
word_write(ptr,0);
//write data
word_write(0xcc000000,indata[0]);
word_write(0xcc000004,indata[1]);
word_write(0xcc000008,indata[2]);
word_write(0xcc00000c,indata[3]);
word_write(0xcc000010,indata[4]);
word_write(0xcc000014,indata[5]);
word_write(0xcc000018,indata[6]);
word_write(0xcc00001c,indata[7]);
word_write(ptr,1);
pooling();
//read data
outdata1[0] = word_read(0xcc000020);
outdata1[1] = word_read(0xcc000024);
outdata1[2] = word_read(0xcc000028);
outdata1[3] = word_read(0xcc00002c);
outdata1[4] = word_read(0xcc000030);
outdata1[5] = word_read(0xcc000034);
outdata1[6] = word_read(0xcc000038);
outdata1[7] = word_read(0xcc00003c);
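The driver sequence above (clear the control word, write eight inputs, set the control word, poll, read eight outputs) can be exercised on a host with a software stand-in for the register block. Everything in this sketch is an assumption made for illustration: FakeDctDevice, its ready() flag, and dct_1d_via_registers are hypothetical, and the fake "transform" simply doubles each input instead of computing a DCT. How the real pooling() detects completion is not shown in the slides.

```cpp
#include <array>
#include <cstdint>

// Word offsets (byte address / 4) within the register window on the slides.
constexpr unsigned kInputWord = 0;     // 0xcc000000 .. 0xcc00001c
constexpr unsigned kOutputWord = 8;    // 0xcc000020 .. 0xcc00003c
constexpr unsigned kControlWord = 16;  // 0xcc000040
constexpr unsigned kNumWords = 17;

// Host-side stand-in for the memory-mapped block (hypothetical).
// Writing 1 to the control word "runs" the placeholder transform.
struct FakeDctDevice {
    std::array<std::int32_t, kNumWords> regs{};
    bool done = false;

    void write(unsigned word, std::int32_t value) {
        regs[word] = value;
        if (word != kControlWord) return;
        done = false;
        if (value == 1) {
            for (unsigned i = 0; i < 8; ++i)
                regs[kOutputWord + i] = 2 * regs[kInputWord + i];  // placeholder, not a DCT
            done = true;
        }
    }
    std::int32_t read(unsigned word) const { return regs[word]; }
    bool ready() const { return done; }
};

// Same order of operations as the driver: disable, load, enable, poll, read back.
template <typename Device>
void dct_1d_via_registers(Device& dev, const std::int32_t (&in)[8], std::int32_t (&out)[8]) {
    dev.write(kControlWord, 0);
    for (unsigned i = 0; i < 8; ++i) dev.write(kInputWord + i, in[i]);
    dev.write(kControlWord, 1);
    while (!dev.ready()) {
        // Busy-wait; the fake device is ready immediately after the enable write.
    }
    for (unsigned i = 0; i < 8; ++i) out[i] = dev.read(kOutputWord + i);
}
```

Templating on the device type keeps the sequencing logic testable without hardware; on the board the same function could be given a type whose write and read touch the real addresses.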
16
Before.cpp modified
void JPEG::DCT_1D(int* data0,int* data1,int* data2,int* data3,int* data4,int* data5,int* data6,int* data7)
{
    double tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7;
    double tmp10, tmp11, tmp12, tmp13;
    double z1, z2, z3, z4, z5, z11, z13;
    tmp0 = *data0 + *data7;
    tmp7 = *data0 - *data7;
    tmp1 = *data1 + *data6;
    tmp6 = *data1 - *data6;
extern void DCT_1D(int * x0,int * x1,int * x2,int * x3,int * x4,int * x5,int * x6,int * x7);
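For checking a hardware replacement against software, a plain reference transform is handy. The sketch below is a direct-form 8-point DCT-II with orthonormal scaling. It is not the factored butterfly routine excerpted above, and the name dct_1d_reference is hypothetical; it trades speed for being easy to verify.

```cpp
#include <cmath>

// Direct-form 8-point DCT-II, O(N^2), orthonormal scaling:
//   out[k] = c(k) * sum_n in[n] * cos((2n + 1) * k * pi / 16)
// with c(0) = sqrt(1/8) and c(k) = sqrt(2/8) for k > 0.
void dct_1d_reference(const double in[8], double out[8]) {
    const double pi = std::acos(-1.0);
    for (int k = 0; k < 8; ++k) {
        double sum = 0.0;
        for (int n = 0; n < 8; ++n) {
            sum += in[n] * std::cos((2 * n + 1) * k * pi / 16.0);
        }
        const double scale = (k == 0) ? std::sqrt(1.0 / 8.0) : std::sqrt(2.0 / 8.0);
        out[k] = scale * sum;
    }
}
```

A constant input is a quick sanity check: all the energy should land in out[0] and every other coefficient should be zero up to rounding error.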
17
DCT result
Original: 193 KB
New: 9 KB
18
Fixed Point Quantization (1/2)
12 fraction bits -> satisfactory accuracy
(Images: float vs. fixed point)
More than 16 fraction bits -> almost as good as float
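One common way to do quantization in fixed point is to replace the division by the quantization step with a multiplication by a precomputed reciprocal that carries 12 fraction bits. The sketch below shows that idea only; the slides do not show the team's actual implementation, and the names reciprocal_q12 and quantize_fixed are hypothetical.

```cpp
#include <cstdint>

constexpr int kFractionBits = 12;  // fraction width quoted on the slide

// round(2^12 / q): the quantization step's reciprocal in Q12 fixed point.
// Computed once per quantization-table entry, so the division is off the hot path.
inline std::int32_t reciprocal_q12(std::int32_t q) {
    return ((1 << kFractionBits) + q / 2) / q;
}

// Approximates round(coeff / q) using one multiply, one add, and one shift.
// The sign is handled separately so rounding is symmetric around zero.
inline std::int32_t quantize_fixed(std::int32_t coeff, std::int32_t recip) {
    const std::int32_t half = 1 << (kFractionBits - 1);
    const std::int32_t magnitude = coeff < 0 ? -coeff : coeff;
    const std::int32_t quantized = (magnitude * recip + half) >> kFractionBits;
    return coeff < 0 ? -quantized : quantized;
}
```

With a step of 16 the reciprocal is exactly 256, so quantize_fixed(100, reciprocal_q12(16)) gives 6, the same as rounding 100 / 16 = 6.25. Steps that do not divide 4096 evenly introduce a small reciprocal error, which is why more fraction bits track the floating-point result more closely.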
19
Fixed Point Quantization (2/2)
Profile:
Name cum% self% desc% calls
process_DU % %
process_DU % % %
fdct_and_quantization % %
Cycles spent on process_DU: <->
(Switching only the quantization step to fixed point gives an 18.9% improvement.)
20
Conclusion
Merge the results of IP1 (RGB->YUV), IP2 (DCT), and fixed-point quantization (software).
Future work: the second and third steps.
Problems:
1. SDRAM is slow.
2. Reading and writing files through the Multi-ICE JTAG interface is slow.
Ideal: 0.03 sec per frame -> 30 fps, capable of encoding motion JPEG sequences in real time.