Nayana Parashar Multimedia Processing Lab


IMPLEMENTATION OF AN OUT-OF-THE-LOOP POST-PROCESSING TECHNIQUE FOR HEVC DECODED DEPTH-MAPS
Nayana Parashar, Multimedia Processing Lab, University of Texas at Arlington
Supervising Professor: Dr. K. R. Rao
November 25th, 2013
Multimedia Processing Lab, UTA 11/25/2013

CONTENTS
BASIC CONCEPTS
VIDEO COMPRESSION
3D VIDEO COMPRESSION
THESIS WORK
RESULTS
CONCLUSIONS
FUTURE WORK
REFERENCES

THESIS IN A NUT-SHELL
Normal procedure: 3D video encoding (color sequence and corresponding depth map) → 3D video decoding → view rendering for display (stereoscopic or multi-view).
Thesis: 3D video encoding → 3D video decoding → post-processing of the decoded depth map → view rendering for display.
Motivation: compression artifact removal and better perceptual quality of rendered frames.

BASIC CONCEPTS

Image and video
Images and video make up the visual media. An image is characterized by pixels (pels), the smallest addressable elements in a display device. Properties of an image: number of pixels (height and width), and the color and brightness of each pixel. Video is composed of a sequence of pictures (frames) taken at regular time (temporal) intervals.
Figure 1: 2D image with spatial samples (L) and video with N frames (R) [1]

3D video – multi-view video plus depth format
The multi-view video plus depth (MVD) format [2] [3] is the most promising format for enhanced 3D visual experiences. This representation provides, for each viewpoint, a texture (image sequence) and an associated depth-map sequence (Fig. 2).
Figure 2: Color video frame (L) and associated depth map frame (R) [4]

Depth-maps
Depth maps represent the per-pixel depth of a corresponding color image and carry the disparity information needed by the virtual (novel) view rendering system. For storage and transmission they are represented as a gray-scale image sequence, in which each pixel conveys the relative distance from the camera to the object in 3D space. Their efficient compression and transmission to the decoder is important for view generation. Depth maps are never actually displayed; they are used for view generation only.

Depth Image Based Rendering (DIBR) [5]
DIBR is the process of synthesizing "virtual" views of a scene from still or moving images and associated per-pixel depth information. It is a two-step process:
1) The original image points are reprojected into the 3D world, using the respective depth data.
2) The 3D space points are projected into the image plane of a "virtual" camera located at the required viewing position.
Stereoscopic view generation: two (left and right) views are generated. Multiple view generation: more than two views are generated, each corresponding to the scene viewed from a different angle.

Stereoscopic view rendering
A color image and a per-pixel depth map can be used to generate virtual stereoscopic views, as shown in Fig. 3. In this process, the original image points at locations (x, y) are transferred to new locations (xL, y) and (xR, y) for the left and right views respectively.
Figure 3: Virtual view generation in the Depth Image Based Rendering (DIBR) process [6]

VIDEO COMPRESSION

Introduction
Data compression: the science of representing information in a compact format. Common image/video compression techniques reduce the number of bits required to represent an image or video sequence (lossy or lossless). Video compression strategies: spatial, temporal and bit-stream redundancies are exploited, and high-frequency components are removed. Many organizations have produced video compression codecs over the years [1]. High Efficiency Video Coding (HEVC) is the most recent video compression standard.

HEVC overview [13] [14]
Successor of the H.264/AVC video compression standard. Multiple goals:
- improved coding efficiency
- ease of transport-system integration
- data-loss resilience
- ability to implement on parallel processing architectures
The complexity of some key modules such as transforms, intra prediction and motion compensation is higher in HEVC than in H.264/AVC, while the complexity of modules such as entropy coding and deblocking is lower [15].

HEVC encoder: block diagram
LEGEND:
- High-frequency content removal
- Spatial redundancy exploitation
- Temporal redundancy exploitation
- Bit-stream redundancy exploitation
- Sharp-edge smoothing
Figure 4: HEVC encoder block diagram [13]

3D VIDEO COMPRESSION

The depth-map dilemma
Compression of depth maps is a challenge. The quantization process eliminates high spatial frequencies in individual frames, and the resulting compression artifacts have adverse consequences for the quality of the rendered views. It is highly important to preserve the sharp depth discontinuities present in depth maps for high-quality virtual view generation. Two solutions exist to this dilemma.

The two approaches to 3D compression
Approach one: use novel video compression techniques designed for 3D video, with special features added to overcome the depth-map dilemma, e.g. 3D video coding in H.264/AVC [16] and the 3D video extension of HEVC [17] [18] [19].
Advantages: features specific to 3D video are exploited (inter-view prediction), and the codec has dedicated blocks for depth-map compression.
Disadvantages: considerably more complex, both in codec structure and in encoding time.
Approach two: use the already existing codecs to encode and decode the sequences, then apply image denoising techniques [20] to the decoded depth maps to remove compression artifacts.
Advantages: much less complicated than approach one; existing video codecs are used without modification.
Disadvantages: there is never one right denoising solution.

THESIS WORK

Scope and premises
This thesis falls under the second approach to 3D video compression. Not much research has been done on applying image denoising techniques to HEVC-decoded depth maps. A post-processing framework based on an analysis of how compression artifacts affect the generation of virtual views is used. The framework applies a spatial filtering technique, namely depth discontinuity analysis followed by an edge-adaptive joint trilateral filter (EA-JTF) [6], to reduce compression artifacts. It effectively reduces the compression artifacts in HEVC-decoded depth maps and improves the perceptual quality of rendered views without using a depth-map-specific video codec.

Algorithm: Block diagram
(a) Depth discontinuity analysis: uses the original depth map and the corresponding color image to produce a binary mask.
(b) Edge-adaptive joint trilateral filter: uses the compressed (decoded) depth map and the binary mask to produce the reconstructed depth map.
Figure 5: Block diagram of the algorithm used for depth-map enhancement

Step (a): Depth discontinuity analysis [6]
The purpose is twofold:
1) Identify the areas that have aligned edges in the color image and the corresponding depth map; the filter kernels of the EA-JTF are adaptively selected based on this information.
2) Identify all depth discontinuities that are significant in terms of rendering.
Sub-steps: The depth map is convolved with a vertical Sobel filter to obtain Gx. An edge mask Ed, which marks the pixel locations of significant depth discontinuities, is derived as:

Ed(p,q) = 1 if Gx(p,q) ≥ Δm_max, 0 otherwise    (1.1)

where Δm_max is a theoretical threshold obtained from a study of the effect of compression artifacts on view rendering:

Δm_max = (2 · D · 255) / (xB · Npix · (knear + kfar))

xB – distance between the left and right virtual cameras, i.e. the eye separation (assumed to be 6 cm)
D – viewing distance (assumed to be 250 cm)
knear and kfar – range of the depth information behind and in front of the picture respectively, relative to the screen width
Npix – screen width measured in pixels
8-bit images are considered (hence the value 255)
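As a rough illustration of Eq. (1.1), the sketch below convolves a toy depth map with a vertical Sobel kernel and thresholds the gradient magnitude with Δm_max computed from the thesis parameters. This is plain Python over lists, with illustrative helper names; a real implementation would operate on full-size depth maps.

```python
def delta_m_max(d, x_b, n_pix, k_near, k_far):
    """Threshold on depth-value error, Eq. (1.1): 2*D*255 / (xB*Npix*(knear+kfar))."""
    return (2.0 * d * 255.0) / (x_b * n_pix * (k_near + k_far))

def sobel_vertical(depth):
    """Convolve a 2-D list of depth values with the vertical Sobel kernel
    [[-1,0,1],[-2,0,2],[-1,0,1]] (zero-padded borders)."""
    h, w = len(depth), len(depth[0])
    kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    gx = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-1, 2):
                for dx in range(-1, 2):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + 1][dx + 1] * depth[yy][xx]
            gx[y][x] = acc
    return gx

def edge_mask(depth, threshold):
    """Binary mask E_d: 1 where the gradient magnitude reaches the threshold."""
    gx = sobel_vertical(depth)
    return [[1 if abs(v) >= threshold else 0 for v in row] for row in gx]

# Toy 4x4 depth map with one sharp vertical discontinuity; Break-dancer
# parameters from the thesis (D=250 cm, xB=6 cm, Npix=1366, knear=44, kfar=120).
depth = [[50, 50, 200, 200]] * 4
t = delta_m_max(250.0, 6.0, 1366.0, 44.0, 120.0)
mask = edge_mask(depth, t)
```

With these parameters the threshold is well below one gray level, so almost any gradient near the step survives; a larger threshold would suppress weak discontinuities.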

Step (a): (contd.)
To identify the regions in which the color edges and depth discontinuities are aligned, an edge mask Ec of the color image is generated by the Canny edge detection algorithm. Using Ed and Ec, the binary mask Es marking the aligned edge areas is obtained as:

Es = (Ed ⊕ S1) ∩ (Ec ⊕ S2)    (1.2)

where ⊕ represents morphological dilation and S1 and S2 represent flat square structuring elements of size 2 and 7 respectively. The stages of step (a) are shown in Figure 6.
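Eq. (1.2) can be sketched in the same spirit: dilate both masks with flat square windows and intersect the results. The dilation radii below stand in for S1 and S2 (radii 1 and 3 roughly correspond to the size-2 and size-7 square elements); the function names and toy masks are illustrative.

```python
def dilate(mask, radius):
    """Morphological dilation with a flat square structuring element:
    a pixel becomes 1 if any pixel in its (2*radius+1)^2 window is 1."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w and mask[yy][xx]:
                        out[y][x] = 1
    return out

def aligned_edge_mask(e_d, e_c, r1=1, r2=3):
    """Eq. (1.2): E_s = (E_d dilated by S1) AND (E_c dilated by S2)."""
    d_d, d_c = dilate(e_d, r1), dilate(e_c, r2)
    return [[a & b for a, b in zip(row_d, row_c)]
            for row_d, row_c in zip(d_d, d_c)]

# Toy 5x5 example: one depth discontinuity at (2,2), one color edge at (2,4).
e_d = [[0] * 5 for _ in range(5)]
e_c = [[0] * 5 for _ in range(5)]
e_d[2][2], e_c[2][4] = 1, 1
e_s = aligned_edge_mask(e_d, e_c)
```

Dilating before intersecting tolerates small misalignments between color edges and depth discontinuities, which is why the two masks are grown before being ANDed.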

Figure 6: Illustration of depth discontinuity analysis

Step (b): Edge-adaptive joint trilateral filter
The edge-adaptive joint trilateral filter [6] is based on the bilateral filter and the joint trilateral filter [7] [8] [9] [10] [11] [12]. For a pixel position p, the filtered result F is given by:

F = ( Σ_{q∈Ω} Wpq · Iq ) / ( Σ_{q∈Ω} Wpq )    (2.1)

where Iq is the value at pixel position q in the kernel neighborhood Ω. The filter weight Wpq at pixel position q is calculated as:

Wpq = c(p,q) · st(p,q)    (2.2)

Both c and s are commonly implemented as Gaussians centered at p and Ip (the value at pixel position p), with standard deviations σc and σs respectively:

c(p,q) = exp( −(1/2) ‖p − q‖² / σc² )    (2.3)
s(p,q) = exp( −(1/2) (Ip − Iq)² / σs² )    (2.4)

The similarity kernel st of the joint trilateral filter is adaptively selected as in Eq. (2.5). In areas where the edges in the color image and the corresponding depth map are aligned (i.e. Es = 1 from Eq. (1.2)), two similarity kernels are used, one derived from the compressed depth map (s) and one from the color image (sj). In the remaining areas, only the similarity kernel derived from the compressed depth map is used:

st(p,q) = s(p,q) · sj(p,q) if Es = 1; s(p,q) if Es = 0    (2.5)
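A minimal single-pixel sketch of Eqs. (2.1)-(2.5) is given below, written over plain Python lists with the σ values from the thesis set-up. The function name and the toy 3x3 images are illustrative, and the depth and color values are assumed to be normalized to [0, 1] to match the normalized σ values.

```python
import math

def eajtf_pixel(p, depth, color, e_s, radius=1,
                sigma_c=45.0, sigma_s=0.025, sigma_j=0.036):
    """Filtered value F at pixel p = (row, col), Eq. (2.1)."""
    py, px = p
    h, w = len(depth), len(depth[0])
    num = den = 0.0
    for qy in range(max(0, py - radius), min(h, py + radius + 1)):
        for qx in range(max(0, px - radius), min(w, px + radius + 1)):
            # Closeness kernel c(p,q), Eq. (2.3): Gaussian on spatial distance.
            c = math.exp(-0.5 * ((py - qy) ** 2 + (px - qx) ** 2) / sigma_c ** 2)
            # Similarity kernel s(p,q) from the compressed depth map, Eq. (2.4).
            s = math.exp(-0.5 * (depth[py][px] - depth[qy][qx]) ** 2 / sigma_s ** 2)
            # Edge-adaptive kernel s_t, Eq. (2.5): multiply in the colour-image
            # kernel s_j only where colour and depth edges are aligned.
            if e_s[py][px] == 1:
                s *= math.exp(-0.5 * (color[py][px] - color[qy][qx]) ** 2 / sigma_j ** 2)
            w_pq = c * s
            num += w_pq * depth[qy][qx]
            den += w_pq
    return num / den

# Toy example: a sharp 0.2 -> 0.8 depth step, no aligned colour edges.
depth_img = [[0.2, 0.2, 0.8], [0.2, 0.2, 0.8], [0.2, 0.2, 0.8]]
e_s_img = [[0] * 3 for _ in range(3)]
f = eajtf_pixel((1, 1), depth_img, depth_img, e_s_img)
# Neighbours across the step get near-zero similarity weight, so the
# discontinuity survives the smoothing instead of being blurred.
```

This edge-preserving behaviour is exactly why a bilateral-style filter suits depth maps: flat regions are smoothed while the depth discontinuities that matter for rendering are kept sharp.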

Step (c): Stereoscopic view rendering
The reconstructed depth map from step (b) is used to generate left-side and right-side views using the stereoscopic view rendering process [21] [22] [27]. Finally, the frames obtained using the uncompressed depth map, the HEVC-decoded depth map, and the HEVC-decoded depth map after post-processing are compared using the metrics PSNR, SSIM [24] and an approximation of the Mean Opinion Score (MOS) [25] for image quality.

RESULTS

Results: Experimental set-up
To evaluate the performance of the EA-JTF [6] on HEVC-decoded depth maps, color sequences along with the corresponding depth maps are compressed using the HEVC reference software HM 9.2 [26]. MATLAB R2013a (student version) was used for filtering and rendering. For all sequences other than Ballet, a single-frame result is obtained at QP = 32; for Ballet, a 15-frame sequence at a frame rate of 3 frames/sec is used. Three different rendered images are obtained:
1) Original image and the corresponding depth map (original).
2) HEVC-decoded image and the corresponding decoded depth map (compressed).
3) HEVC-decoded image and the depth map after post-processing (post-processed).
PSNR, SSIM [24] and an approximate Mean Opinion Score (MOS) [25] were used to evaluate the perceptual quality of the rendered views.
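Of the three metrics, PSNR is simple enough to sketch; the snippet below is a generic reference implementation for 8-bit images, not the thesis code, and SSIM and MOS involve more machinery than fits here.

```python
import math

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size 2-D images."""
    h, w = len(ref), len(ref[0])
    mse = sum((ref[y][x] - test[y][x]) ** 2
              for y in range(h) for x in range(w)) / (h * w)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# Toy 2x2 example: a single pixel off by 2 gray levels gives MSE = 1,
# so PSNR = 10 * log10(255^2) ≈ 48.13 dB.
a = [[100, 100], [100, 100]]
b = [[100, 100], [100, 102]]
value = psnr(a, b)
```

The differences reported in the result slides (hundredths of a dB) are small on this scale, which is why the perceptual MOS comparison carries most of the weight.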

Results: Input parameters
Viewing distance (D): 250 cm (assumed)
Eye separation (xB): 6 cm (assumed)
Screen width in pixels (Npix): 1366 (for the laptop used for experimentation)
knear and kfar:
  Break-dancer: knear = 44.00, kfar = 120.00
  Ballet: knear = 42.00, kfar = 130.00
  Balloons: knear = 448.25, kfar = 11206.28
  Kendo: knear = 448.25, kfar = 11206.28
Resolution of the video sequences used: 1024 x 768
EA-JTF:
  Kernel size: 15 x 15 pixels
  Standard deviation of the color similarity filter (σs) = 0.025 (normalized range 0-1)
  Standard deviation of the depth similarity filter (σj) = 0.036 (normalized range 0-1)
  Standard deviation of the closeness filter (σc) = 45

Results: Break-dancer sequence
Original sequence obtained from Microsoft Research [23]. An increase in both PSNR and SSIM is seen. High-quality rendering, as the original depth maps are generated using computer vision algorithms. A grayscale version of the sequence was used for the approximate MOS calculation; here too the post-processed method had better ratings than the compressed one.

Metric     Decoded image (left-side view)   Processed image (left-side view)
PSNR (dB)  41.9401                          41.9804
SSIM       0.9133                           0.9139

Image      MOS rating (max = 3)
Original   2.6
Decoded    1.5
Processed  1.9

Results: Ballet sequence
Original sequence obtained from Microsoft Research [23]. An increase in both PSNR and SSIM is seen. High-quality rendering, as the original depth maps are generated using computer vision algorithms. This sequence was not used for the MOS calculation.

Metric     Decoded image (left-side view)   Processed image (left-side view)
PSNR (dB)  42.7317                          42.787
SSIM       0.9413                           0.9444

Results: Kendo sequence
Original sequence obtained from [4]. A very interesting sequence: there is not much edge information, so the original, post-processed and compressed results are all extremely similar perceptually. There is a slight decrease in PSNR, and the SSIM values are nearly equal. In the MOS calculation, on the other hand, the post-processed frame performed better than the compressed frame.

Metric     Decoded image (left-side view)   Processed image (left-side view)
PSNR (dB)  45.7213                          45.0551
SSIM       0.9887                           0.9877

Image      MOS rating (max = 3)
Original   2.2
Decoded    1.7
Processed  2.1

Results: Balloons sequence
Original sequence obtained from [4]. The compressed result has better PSNR and SSIM than the processed one. This can be attributed to the fact that the views rendered from the original sequence are themselves not optimal, due to noise in the original depth. The proposed solution nevertheless improves the perceptual quality to a great extent: in the MOS calculation, the post-processed frame performed better than the compressed frame.

Metric     Decoded image (left-side view)   Processed image (left-side view)
PSNR (dB)  44.2039                          43.209
SSIM       0.981                            0.9798

Image      MOS rating (max = 3)
Original   2.4
Decoded    1.0
Processed  2.5

CONCLUSIONS

Conclusions
The quality of rendered views (stereoscopic rendering) generated using HEVC-decoded depth maps was improved. Four multi-view plus depth sequences were used to carry out the experiments. There was an improvement in both PSNR and SSIM for two sequences, Break-dancer and Ballet: Break-dancer saw an improvement of 0.04 dB in PSNR together with a small gain in SSIM, and Ballet likewise showed improvements in both PSNR and SSIM. There was no improvement in PSNR for the Kendo sequence, while its SSIM remained essentially constant (not much edge information); for the Balloons sequence there was no improvement in either PSNR or SSIM. However, the main improvement brought about by this method is in the perceptual quality of the rendered views. An approximate MOS survey suggested that the views rendered after post-processing were always perceptually better than the ones rendered without post-processing; in this regard, all four test sequences showed improvement.

FUTURE WORK

Future work
Improve the filter design to provide more significant results. Move beyond stereoscopic rendering into multi-view rendering. The method can be made in-loop and merged with the HEVC compression codec. To assess perceptual quality, the current work used SSIM and an approximation of the Mean Opinion Score; more research into perceptual quality assessment for depth maps and rendered views would be useful.

IMAGE DATABASE

Break-Dancer sequence

Break-dancer sequence – grayscale (used for MOS)

Ballet sequence

Balloons sequence

Balloons – grayscale (used for MOS)

Kendo sequence

Kendo sequence – grayscale (used for MOS)

REFERENCES

References
[1] K.R. Rao, D.N. Kim and J.J. Hwang, "Video coding standards: AVS China, H.264/MPEG4-Part 10, HEVC, VP6, DIRAC and VC-1", Springer, 2014.
[2] D.K. Shah, et al., "Evaluating multi-view plus depth coding solutions for 3D video scenarios", 3DTV-Conference: The True Vision – Capture, Transmission and Display of 3D Video (3DTV-CON), pp. 1-4, 15-17 Oct. 2012.
[3] Fraunhofer HHI, 3D video coding information: http://www.hhi.fraunhofer.de/fields-of-competence/image-processing/research-groups/image-video-coding/3d-hevc-extension.html
[4] Balloons and Kendo test sequences: http://www.tanimoto.nuee.nagoya-u.ac.jp/~fukushima/mpegftv/
[5] C. Fehn, "A 3D-TV system based on video plus depth information", Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 1529-1533, 9-12 Nov. 2003.
[6] D.V.S. De Silva, et al., "A depth map post-processing framework for 3D-TV systems based on compression artifact analysis", IEEE Journal of Selected Topics in Signal Processing, 2011.
[7] C. Tomasi and R. Manduchi, "Bilateral filtering for gray and color images", IEEE International Conference on Computer Vision, Washington DC, USA, pp. 839-846, 1998.
[8] E. Eisemann and F. Durand, "Flash photography enhancement via intrinsic relighting", ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 673-678, 2004.
[9] G. Petschnigg, et al., "Digital photography with flash and no-flash image pairs", ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 664-672, 2004.
[10] B. Zhang and J. Allebach, "Adaptive bilateral filter for sharpness enhancement and noise removal", IEEE Transactions on Image Processing, vol. 17, no. 5, pp. 664-678, 2008.
[11] P. Choudhury and J. Tumblin, "The trilateral filter for high contrast images and meshes", ACM SIGGRAPH 2005 Courses, 2005.
[12] S. Liu, P. Lai, D. Tian, C. Gomila and C.W. Chen, "Joint trilateral filtering for depth map compression", Huangshan, China, 2010.
[13] G.J. Sullivan, J.-R. Ohm, W.-J. Han and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, Dec. 2012.
[14] HEVC text specification draft 10: http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7243
[15] F. Bossen, et al., "HEVC complexity and implementation analysis", IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1685-1696, Dec. 2012.
[16] 3DV for H.264: http://mpeg.chiariglione.org/technologies/general/mp-3dv/index.htm
[17] Fraunhofer HHI, 3D-HEVC extension information: http://www.hhi.fraunhofer.de/fields-of-competence/image-processing/research-groups/image-video-coding/3d-hevc-extension.html
[18] P. Merkle, A. Smolic, K. Müller and T. Wiegand, "Multi-view video plus depth data representation and coding", Picture Coding Symposium, 2007.
[19] "Test model under consideration for HEVC based 3D video coding", ISO/IEC JTC1/SC29/WG11 MPEG2011/N12559, San Jose, CA, USA, Feb. 2012.
[20] M.C. Motwani, et al., "A survey of image denoising techniques", Proceedings of GSPx 2004, Santa Clara, CA: http://www.cse.unr.edu/~fredh/papers/conf/034-asoidt/paper.pdf
[21] ISO/IEC JTC1/SC29/WG11, "Proposed experimental conditions for EE4 in MPEG 3DAV", WG 11 doc. m9016, Shanghai, Oct. 2002.
[22] C. Fehn, "Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV", Proceedings of the SPIE, vol. 5291, 2004.
[23] Break-Dancers and Ballet sequences: http://research.microsoft.com/en-us/um/people/sbkang/3dvideodownload/
[24] Z. Wang, A.C. Bovik, H.R. Sheikh and E.P. Simoncelli, "Image quality assessment: from error visibility to structural similarity", IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
[25] L. Ma, et al., "Image retargeting quality assessment: a study of subjective scores and objective metrics", IEEE Journal of Selected Topics in Signal Processing, vol. 6, no. 6, pp. 626-639, Oct. 2012.
[26] HEVC reference software (HM 9.2): https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/branches/HM-9.2-dev/
[27] MATLAB code for stereoscopic view rendering: http://www.mathworks.com/matlabcentral/fileexchange/27538-depth-image-based-stereoscopic-view-rendering

THANK YOU! QUESTIONS?

In a depth map, the lighter gray regions represent near objects and the darker gray regions represent far objects.

EQUATIONS FOR STEREOSCOPIC VIEW GENERATION
The original image points at locations (x, y) are transferred to new locations (xL, y) and (xR, y) for the left and right views respectively. This process is defined by:

xR = x + ppix / 2    (1)
xL = x − ppix / 2    (2)
ppix = −(xB · Npix / D) · ( (m / 255) · (knear + kfar) − kfar )    (3)

where:
ppix – pixel parallax
xB – distance between the left and right virtual cameras, i.e. the eye separation (assumed to be 6 cm)
D – viewing distance (assumed to be 250 cm)
m – depth value of each pixel in the reference view
knear and kfar – range of the depth information behind and in front of the picture respectively, relative to the screen width
Npix – screen width measured in pixels
8-bit images are considered (hence the value 255)
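Equations (1)-(3) translate directly into code. The sketch below uses the experimental parameter values as defaults (xB = 6 cm, D = 250 cm, Npix = 1366, and the Break-dancer knear/kfar); the function names are illustrative.

```python
def pixel_parallax(m, x_b=6.0, d=250.0, n_pix=1366.0,
                   k_near=44.0, k_far=120.0):
    """Eq. (3): pixel parallax for an 8-bit depth value m (0..255)."""
    return -(x_b * n_pix / d) * ((m / 255.0) * (k_near + k_far) - k_far)

def warp_column(x, m):
    """Eqs. (1) and (2): shift image column x into the right and left views."""
    p = pixel_parallax(m)
    return x + p / 2.0, x - p / 2.0  # (x_R, x_L)

x_r, x_l = warp_column(100.0, 128)
```

The two warped positions are symmetric about the original column, and their separation equals the pixel parallax, so any error in m (e.g. from compression artifacts) shifts both views and distorts the rendered image.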

Stereo triangulation (contd.)
The virtual cameras are selected such that the epipolar lines are horizontal, and thus the y component is constant. Equation (3) is in accordance with the MPEG informative recommendation. The disoccluded regions (visual holes) are filled by a background pixel extrapolation technique. Noise corrupting the depth map modifies the luminance values of its pixels, i.e. m in Eq. (3), which results in warping errors and thus causes distortions in the image rendered with the noisy depth map.

Epipolar line
The line OL–X is seen by the left camera as a point because it is directly in line with that camera's center of projection. The right camera sees this line as a line in its image plane; that line (eR–xR) is called an epipolar line. Symmetrically, the line OR–X, seen by the right camera as a point, is seen as the epipolar line eL–xL by the left camera. Any line which intersects the epipolar point is an epipolar line, since it can be derived from some 3D point X.

Video compression strategies

The chronology of different video compression standards

3D video coding in H.264/AVC Multiview Video Coding (MVC) is an amendment to the H.264/MPEG-4 AVC video compression standard [3]. It enables efficient encoding of sequences captured simultaneously from multiple cameras in a single video stream. MVC is intended for encoding stereoscopic (two-view) video, as well as free-viewpoint television and multi-view 3D television. An MVC stream is backward compatible with H.264/AVC [3], which allows older devices and software to decode stereoscopic video streams by ignoring the additional information for the second view [6]. Combined temporal and inter-view prediction is the key to efficient MVC encoding: a frame from a certain camera can be predicted not only from temporally related frames of the same camera, but also from frames of neighboring cameras. These interdependencies can be exploited for efficient prediction [6]. Figure: Multi-view coding structure with hierarchical B pictures for both temporal (black arrows) and inter-view (red arrows) prediction
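The combined temporal and inter-view prediction idea can be illustrated with a toy block matcher: the encoder predicts a block from whichever candidate reference (the previous frame of the same camera, or the same-instant frame of a neighboring camera) gives the smallest distortion. The block values and candidate names below are made up for the example; real MVC uses rate-distortion optimized mode decisions, not plain SAD.

```python
# Toy "reference picture selection" between a temporal and an inter-view
# candidate, using sum of absolute differences (SAD) as the cost.

def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def best_reference(block, candidates):
    """candidates: dict name -> co-located reference block.
    Returns (name, cost) of the cheapest prediction source."""
    return min(((name, sad(block, ref)) for name, ref in candidates.items()),
               key=lambda t: t[1])

current = [10, 12, 11, 13]
candidates = {
    "temporal (same camera, t-1)": [10, 12, 10, 13],  # SAD = 1
    "inter-view (left camera, t)": [11, 12, 11, 14],  # SAD = 2
}
print(best_reference(current, candidates))
# ('temporal (same camera, t-1)', 1)
```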

3D extension of HEVC

Basic 3D video codec structure Figure: Block diagram of a 3D video codec [4]

MVD codec – working The basic structure of the 3D video codec is shown in the block diagram of Figure 5. In principle, each component signal is coded using an HEVC-based codec. The resulting bit-stream packets, or more accurately the resulting Network Abstraction Layer (NAL) units, are multiplexed to form the 3D video bit stream. The base or independent view is coded using an unmodified HEVC codec, so the base-view sub-stream can be decoded directly by a conventional HEVC decoder. For coding the dependent views and the depth data, modified HEVC codecs are used, extended with additional coding tools and inter-component prediction techniques that employ already-coded data inside the same access unit, as indicated by the red arrows in Figure 5. To enable optional discarding of depth data from the bit stream, e.g., to support decoding of a stereo video suitable for conventional stereo displays, the inter-component prediction can be configured so that the video pictures can be decoded independently of the depth data.
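The packetized structure described above is what makes depth discardable: every view component travels in its own NAL units, so a stereo-only extractor can drop the depth NAL units while the base view stays decodable by plain HEVC. A simplified sketch (the `(view_id, is_depth)` tagging is an illustration, not the actual NAL unit header syntax):

```python
# Sketch of extracting a stereo-only sub-stream from a 3D video bit stream
# by dropping all depth NAL units.
from collections import namedtuple

NalUnit = namedtuple("NalUnit", "view_id is_depth payload")

bitstream = [
    NalUnit(0, False, "base-view video"),   # decodable by unmodified HEVC
    NalUnit(0, True,  "base-view depth"),
    NalUnit(1, False, "dependent video"),
    NalUnit(1, True,  "dependent depth"),
]

# Stereo-only extraction: keep only the texture (non-depth) NAL units.
stereo_stream = [n for n in bitstream if not n.is_depth]
print([n.payload for n in stereo_stream])
# ['base-view video', 'dependent video']
```

This only works because inter-component prediction is configured so that video pictures never reference depth data, as stated above.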

MVD – CODING ALGORITHM The video pictures and, when present, the depth maps are coded access unit by access unit, as illustrated in Figure 6. An access unit includes all video pictures and depth maps that correspond to the same time instant. NAL units containing camera parameters may additionally be associated with an access unit. The video pictures and depth maps corresponding to a particular camera position are indicated by a view identifier (viewId); all video pictures and depth maps that belong to the same camera position are associated with the same value of viewId. Inside an access unit, the video picture and, when present, the associated depth map with viewId equal to 0 are coded first, followed by the video picture and depth map with viewId equal to 1, and so on. For ordering the reconstructed video pictures and depth maps after decoding, each value of viewId is associated with another identifier called the view order index (VOI). The view order index is a signed integer value that specifies the ordering of the coded views from left to right. Figure: Access-unit structure and coding order of view components [12]
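The ordering rules above can be sketched as a small sorting routine: within one access unit, components are coded by increasing viewId, with each video picture preceding its depth map, while a separate viewId-to-VOI mapping gives the left-to-right display ordering. The particular VOI values below are an example, not from the source.

```python
# Coding order inside one access unit: ascending viewId, video before depth.
def coding_order(components):
    """components: list of (view_id, kind) with kind 'video' or 'depth'."""
    return sorted(components, key=lambda c: (c[0], c[1] != "video"))

access_unit = [(1, "depth"), (0, "video"), (1, "video"), (0, "depth")]
print(coding_order(access_unit))
# [(0, 'video'), (0, 'depth'), (1, 'video'), (1, 'depth')]

# VOI is a signed integer that orders views left to right; e.g. if view 1
# sits to the left of the base view:
voi = {0: 0, 1: -1}
left_to_right = sorted(voi, key=voi.get)
print(left_to_right)  # [1, 0]
```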

COMPARISON – MVD AND HEVC CODEC CODING OF DEPENDENT VIEWS – Additional tools have been integrated into the HEVC codec that employ already-coded data in other views to represent a dependent view efficiently. These tools include: disparity-compensated prediction, view-synthesis-based inter-view prediction, post-processing in-loop filtering, inter-view motion prediction, depth-based motion parameter prediction, inter-view residual prediction, and adjustment of the texture QP based on depth data. CODING OF DEPTH MAPS – For coding depth maps, certain additional tools are used and some tools are removed. Some of the differences are: depth maps are coded in 4:0:0 format; a non-linear depth representation is used; Z-near/Z-far compensated weighted prediction; modified motion compensation and motion vector coding (no interpolation is used, i.e., for depth maps the inter-picture prediction is always performed with full-sample accuracy); disabling of in-loop filtering (deblocking filter and SAO); depth modeling modes (four new intra-prediction modes); and motion parameter inheritance.
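One of the depth-map differences listed above, full-sample motion compensation, can be illustrated with a 1-D toy example: the motion vector is snapped to an integer sample position instead of interpolating between samples, which avoids smearing the sharp edges that depth maps contain. The rounding convention and border clamping below are assumptions of this sketch, not the normative behavior.

```python
# Toy full-sample motion compensation for a 1-D depth-map row: a quarter-pel
# motion vector is rounded to the nearest whole sample, so no sub-pel
# interpolation (which would blur depth edges) ever happens.

def mc_full_sample(ref, mv_quarter_pel):
    """Predict a row from `ref` using a MV rounded to full-sample accuracy."""
    mv = round(mv_quarter_pel / 4.0)  # snap to integer sample position
    last = len(ref) - 1
    return [ref[max(0, min(last, i + mv))] for i in range(len(ref))]

ref_depth = [50, 50, 200, 200]                        # a sharp depth edge
pred = mc_full_sample(ref_depth, mv_quarter_pel=6)    # 1.5 samples -> 2
print(pred)
# [200, 200, 200, 200]  (the edge stays sharp; no intermediate values appear)
```

With sub-pel interpolation, the prediction would contain values like 125 along the edge, i.e., depth levels that exist in neither object, which is exactly what full-sample accuracy avoids.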