
Perception-motivated High Dynamic Range Video Encoding





2 Perception-motivated High Dynamic Range Video Encoding
Rafal Mantiuk, Grzegorz Krawczyk, Karol Myszkowski, Hans-Peter Seidel, MPI Informatik

3 High Dynamic Range The human eye can see a range of luminance from pitch dark to sunlight, spanning about 12 orders of magnitude. This is far more than a standard monitor or projector can display, which is why we refer to such devices as "low dynamic range". By "high dynamic range" we usually mean technology that can cover almost the complete range of luminance the human eye can see.

4 High vs Low Dynamic Range Video
LDR video: intended for existing displays; relative pixel brightness. HDR video: intended for the human eye; photometric or radiometric units [cd/m², W/(m²·sr)]. How does HDR video differ from ordinary LDR video, for instance MPEG-4? The major difference comes from different design goals: MPEG-4 video was meant to be displayed on existing display devices and is therefore limited by their precision. HDR video, on the other hand, tries to capture the physical world with an accuracy limited only by the capabilities of the human eye. HDR video therefore stores absolute photometric or radiometric values, which have physical meaning, whereas MPEG-4 encodes pixel values that represent only relative brightness.

5 High Dynamic Range Video
Goal: efficient encoding of the full dynamic range of luminance perceived by the human observer. (First demo.)

6 Overview
HDR Pipeline; HDR Video Encoding (luminance quantization, edge coding); Results (vs. MPEG-4, vs. OpenEXR); Demo & Applications. We give an overview of the existing HDR technology in the context of the HDR pipeline, present some details of our HDR video compression, then show the results of benchmarking our compression against MPEG-4 and OpenEXR, and finally demonstrate our compression and show a few possible applications.

7 Acquisition → Storage → Display
Related Work: HDR Pipeline. We review related work in the context of the complete HDR pipeline, from acquisition to display.

8 Acquisition → Storage → Display
Related Work: HDR Pipeline, acquisition. Global Illumination; HDR cameras: HDRC (IMS Chips), Lars III (Silicon Vision), Autobrite (SMaL Camera Technologies), LM9628 (National), Digital Pixel System (Pixim); technology overview in [Nayar2003]. Video is acquired or generated in the first stage of the HDR pipeline. The natural source of HDR sequences is Global Illumination rendering, where accurate physical lighting information is computed. To acquire video of a real scene, we can use an HDR camera; several models of such cameras are already available on the market.

9 Acquisition → Storage → Display
Related Work: HDR Pipeline, storage. Still images: Radiance RGBE [Ward91], OpenEXR [Bogart2003], LogLuv TIFF [Ward98], HDR JPEG [Ward2004]. Video: no HDR video format so far. The naïve way, storing three floating-point numbers per pixel, produces too much data. To cope with that, several formats for still images have been proposed; the most recent, an HDR extension of the ordinary JPEG format, was presented at the APGV conference. So far, however, no HDR video compression has been proposed, and this is the subject of our paper. Why not choose one of the still-image formats and compress separate frames? Because video compression is much more efficient.

10 Acquisition → Storage → Display
Related Work: HDR Pipeline, display. LDR displays can be used, but tone mapping is necessary. HDR displays are starting to appear: University of British Columbia [Seetzen2004]. If we want to show HDR images on a low dynamic range display, such as an LCD or CRT, we need to apply tone mapping.

11 HDR Encoding Framework
Detail level 1: Input & Output. (Diagram: the input feeds a video encoder that produces a bitstream; white blocks are MPEG, orange blocks are the HDR encoder extensions.) This is an overview of the complete encoding framework, and in particular of how it differs from MPEG-4. The first basic difference is the input: MPEG-4 takes 8-bit RGB low dynamic range data, while our HDR video encoder takes floating-point values in the absolute XYZ color space.

12 HDR Encoding Framework
Detail level 2: Color Transform. (Diagram: MPEG-4 transforms RGB to YCrCb before the encoder; our encoder transforms HDR XYZ to L_p u'v'. White blocks are MPEG, orange blocks are the HDR encoder extensions.) Going one level down, to the color transform: the result is a different color space, more suitable for encoding. The major difference is L_p u'v' instead of YCrCb; L_p u'v' can represent HDR data and is effective for encoding.

13 HDR Encoding Framework
Detail level 3: Edge Coding. (Diagram: white blocks are MPEG, orange blocks are the HDR encoder extensions.) Going further down into the details, we see a few more MPEG processing blocks: motion compensation, DCT coding, and variable-length coding. Our second major extension is another pipeline for encoding sharp edges, consisting of edge coding and run-length coding.

14 Encoding of Color
(Pipeline diagram, color-transform block highlighted: HDR XYZ or LDR RGB input, color transform, motion compensation, DCT with the edge-coding branch, run-length and variable-length coding, bitstream output.) We now discuss our approach to the color transform.

15 Encoding of Color How to represent color data?
Floating points: ineffective compression. Integers: fine, but they require quantization. How to quantize color data? Keep quantization errors below the threshold of perception: use a uniform color space (L*u*v*, L*a*b*) [Ward98] and find the minimum number of bits. For color (u'v'), 8 bits are enough. The naïve approach, floating points, compresses poorly; the better approach is to use integers instead. A continuous variable, like luminance, must be quantized before it can be represented as an integer. The question is how to quantize color data in the best way, that is, so that the quantization errors are not visible. To guarantee this, we make sure the quantization error is below the threshold of human perception: we use a uniform color space, such as Luv or Lab, and quantize uniformly with the proper number of bits. Eight bits turn out to be sufficient for the color information. The L* component, however, was designed for LDR devices and cannot be used; see the quantization sketch below.
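As an illustration, here is a minimal Python sketch of the 8-bit chromaticity quantization. It assumes the CIE 1976 u'v' coordinates and borrows the scale factor 410 from LogLuv TIFF [Ward98]; the exact constants used in our encoder may differ.

```python
import numpy as np

def xyz_to_uv(X, Y, Z):
    """CIE 1976 u'v' chromaticities from (absolute) XYZ values."""
    denom = X + 15.0 * Y + 3.0 * Z
    return 4.0 * X / denom, 9.0 * Y / denom

def quantize_uv(u, v, scale=410.0):
    """Uniform 8-bit quantization of u'v'.

    u' and v' lie roughly in [0, 0.62], so a scale of about 410 fills
    the 8-bit range; because u'v' is approximately perceptually
    uniform, a uniform quantizer with enough bits keeps the error
    below the visibility threshold everywhere.
    """
    u8 = int(np.clip(round(scale * u), 0, 255))
    v8 = int(np.clip(round(scale * v), 0, 255))
    return u8, v8

# Example: the D65 white point, XYZ = (95.047, 100.0, 108.883)
u, v = xyz_to_uv(95.047, 100.0, 108.883)
print(quantize_uv(u, v))   # (81, 192)
```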

16 Encoding of Luminance How to quantize luminance? Gamma correction?
Logarithm? (Plot: candidate mappings, such as log(Y), from log luminance Y to an integer representation.) While the color can be represented using existing color spaces, we cannot do the same with luminance: those color spaces were not meant for HDR data. How do we find the proper mapping for luminance? For LDR, gamma correction works well, but human brightness perception cannot be approximated with a single power function over the full range of luminance. A logarithm? The eye's response is not logarithmic either.

17 Threshold Versus Intensity
Psychophysical measurements give the smallest perceivable difference ΔY for a certain adaptation level Y_A: the tvi function [Ferwerda96, CIE 12/2.1]. (Plot: log threshold ΔY versus log adaptation luminance Y_A.) Instead of approximating human perception with a single algebraic function, we would like to derive an optimal quantization from actual psychophysical measurements. One characteristic of human visual perception is the threshold-versus-intensity function, drawn on the right: it tells us the smallest difference of luminance that is visible at a certain adaptation level Y_A.

18 Luminance Quantization
Just below the threshold of perception. Assumption: the maximum quantization error stays below the threshold of human perception, e_max <= tvi(Y) / f, where f >= 1. (Plot: mapping of log luminance Y to integers L_p, with step sizes bounded by tvi(Y)/f.)

19 Luminance Quantization
Just below the threshold of perception. From this assumption we can derive a differential equation for the mapping y(l) from integer codes l (the L_p space) to luminance, with the threshold decreased by the factor f:

dy/dl = (2/f) · tvi(y(l))

A numerical solution of this equation gives the optimal compander function that can be used for quantization; we give a detailed derivation of this formula in the paper. 10 to 11 bits are enough. Related work: the capacity function [Ashikhmin02] and the Grayscale Standard Display Function [DICOM03]. A numerical sketch follows below.
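To make the derivation concrete, here is a minimal Python sketch that integrates dy/dl = (2/f) · tvi(y(l)) with Euler steps, one code value per step, to build the code-to-luminance table. The tvi() below is a crude stand-in (a flat 1% Weber fraction, an assumption for illustration); the paper uses measured tvi data [Ferwerda96, CIE 12/2.1], with which 10 to 11 bits suffice.

```python
import numpy as np

def tvi(y):
    """Hypothetical threshold-versus-intensity: a flat 1% Weber fraction.
    Real tvi data rises steeply at low (scotopic) luminance."""
    return 0.01 * y

def build_compander(y_min=1e-4, y_max=1e8, f=2.0, max_codes=1 << 16):
    """Integrate dy/dl = (2/f) * tvi(y(l)), one Euler step per code l."""
    ys = [y_min]
    while ys[-1] < y_max and len(ys) < max_codes:
        y = ys[-1]
        ys.append(y + (2.0 / f) * tvi(y))
    return np.asarray(ys)              # ys[l] = luminance assigned to code l

codes = build_compander()
print(f"{len(codes)} codes -> {int(np.ceil(np.log2(len(codes))))} bits")
# Encoding a luminance Y is then a lookup: l = np.searchsorted(codes, Y)
```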

20 Luminance Quantizations Comparison
(Chart: log contrast threshold versus log adapting luminance, comparing the cvi perception threshold, our 11-bit perceptual quantization, 32-bit LogLuv, and RGBE.) The x-axis spans the full dynamic range of luminance; the y-axis denotes the quantization error. The green line is the threshold of human perception: any distortion below that line should not be visible. RGBE is the common-exponent encoding used in the Radiance format; LogLuv is the logarithmic encoding used in the TIFF format; the orange line shows our perceptual quantization. None of the three encodings causes visible distortions, since all are well below the green line. The difference is that the perceptual mapping is also aligned to the threshold of human perception, whereas the other encodings allocate too many bits to encoding low luminance levels.

21 Edge Coding
(Pipeline diagram, edge-coding block highlighted.) Our second major modification adds an additional pipeline for encoding sharp contrast edges.

22 Edge Coding: Motivation
HDR video can contain sharp contrast edges (light sources, shadows). DCT coding of sharp contrast may cause high-frequency artifacts. (Images: DCT coding versus edge coding.) Why is such an extension needed? HDR video can contain sharp edges of much larger contrast than LDR video. For such sharp edges, the quantization of the DCT coefficients may result in high-frequency artifacts, like those seen in the middle image. When we employ the proposed edge coding, we practically eliminate those artifacts, at the cost of a slightly higher bit rate.

23 Edge Coding: Solution
Solution: encode sharp edges in the spatial domain, the rest in the frequency domain. The general idea of edge coding is shown on this slide. The Discrete Cosine Transform is in general not effective for encoding a single sharp edge: it produces high values in all frequency coefficients. To avoid such ineffective coding, we split the original signal into two signals: one that contains only the sharp edges and one that contains the smooth values. The smooth signal contains no high frequencies and can be efficiently encoded with the DCT; the sharp-edge signal contains only sparse values, so it can be run-length encoded. A 1-D sketch of the split follows below.
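Here is a minimal 1-D sketch of that split, under an assumed thresholding rule: sample-to-sample differences larger than a threshold are treated as sharp edges, moved into a sparse edge signal, and removed from the original so that the remainder is smooth. The threshold value and the exact decomposition rule are illustrative assumptions; the paper defines them precisely.

```python
import numpy as np

def split_edges(signal, thresh):
    """Split a 1-D signal into a smooth part (DCT-friendly) and a
    sparse edge part (run-length friendly)."""
    diffs = np.diff(signal)
    edges = np.where(np.abs(diffs) > thresh, diffs, 0.0)   # mostly zeros
    # Subtract the accumulated edge jumps so the remainder is smooth.
    smooth = signal - np.concatenate(([0.0], np.cumsum(edges)))
    return smooth, edges

x = np.array([1.0, 1.2, 1.1, 9.0, 9.2, 9.1, 9.3])   # one sharp jump
smooth, edges = split_edges(x, thresh=2.0)
print(smooth)   # ~[1.0 1.2 1.1 1.1 1.3 1.2 1.4]  -- jump removed
print(edges)    # [0. 0. 7.9 0. 0. 0.]            -- sparse
```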

24 Edge Coding: Algorithm
Steps: I. horizontal decomposition (horizontal edges go to the edge block); II. horizontal DCT; III. vertical decomposition (vertical edges go to the edge block); IV. vertical DCT. The decomposition of signals shown on the previous slide is quite easy in the 1-D case, but not so trivial for 2-D data such as video frames. However, the extension to 2-D turns out to be quite neat if we combine the signal decomposition with the 2-D Discrete Cosine Transform. We perform the 2-D decomposition in four steps. First, for each 8-by-8 pixel block we remove the sharp edges from the rows and place them in the edge block. Note that the edge signal requires one column less than the original block, because it contains only contrast differences.

25 Edge Coding: Algorithm
(Steps I-IV as on the previous slide; step I illustrated.) See how the signal of a single row changes after the sharp edges are removed.

26 Edge Coding: Algorithm
(Steps I-IV as before; step II illustrated.) Second, we perform a 1-D Discrete Cosine Transform on the smoothed rows. An important observation: only the DC coefficients, the first column, can still contain sharp edges. This is because the horizontal signal has been smoothed out in the previous step, so its high-frequency coefficients are usually low.

27 Edge Coding: Algorithm
(Steps I-IV as before; step III illustrated.) This is why, in the third step, we can decompose only the first column and place that column's edge information in the blank space of the edge block.

28 Edge Coding: Algorithm
(Steps I-IV as before; step IV illustrated.) Finally, we perform the vertical Discrete Cosine Transform to complete our hybrid coding. A sketch of all four steps follows below.
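Putting the four steps together, here is a schematic Python sketch for a single 8x8 block, reusing split_edges() from the 1-D example above. The threshold and the handling of the vertical edges (returned separately here rather than packed into the edge block's spare column) are simplifying assumptions for illustration, not the paper's exact layout.

```python
import numpy as np
from scipy.fft import dct

def encode_block(block, thresh=2.0):
    """Hybrid DCT / edge coding of one 8x8 block (schematic).
    Requires split_edges() from the 1-D sketch above."""
    block = np.asarray(block, dtype=float)
    smooth = np.empty_like(block)
    edge_block = np.zeros((8, 7))              # 7 differences per row of 8
    # I. horizontal decomposition: strip sharp edges from each row
    for r in range(8):
        smooth[r], edge_block[r] = split_edges(block[r], thresh)
    # II. horizontal DCT on the smoothed rows
    coef = dct(smooth, axis=1, norm='ortho')
    # III. vertical decomposition of the DC column only: after step I,
    #      sharp vertical edges can survive only in the first column
    coef[:, 0], col_edges = split_edges(coef[:, 0], thresh)
    # IV. vertical DCT completes the hybrid transform
    coef = dct(coef, axis=0, norm='ortho')
    return coef, edge_block, col_edges   # coef -> quantize + RLE, edges -> RLE

blk = np.ones((8, 8)); blk[:, 4:] = 9.0   # a sharp vertical edge
coef, h_edges, v_edges = encode_block(blk)  # h_edges holds the 8.0 jumps
```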

29 Results: 2x the size of tone-mapped MPEG-4 video
20-30x savings compared to intra-frame compression (OpenEXR). We compared our HDR video encoding with low dynamic range MPEG-4 video compression and with an intra-frame HDR format, OpenEXR. Since MPEG-4 is an LDR compression, we had to compare tone-mapped video: tone mapping was applied before compression for MPEG-4 and after compression for HDR video. We used the Universal Image Quality Index metric (Wang and Bovik) to make sure that the quality of the two tone-mapped sequences was the same. As the chart shows, the MPEG-4 stream was approximately half the size of the HDR stream; note, however, that the MPEG-4 sequence obviously contains only partial information compared to the HDR sequence. When we compared our encoding to frame-by-frame compression, we achieved savings of 20 to 30 times. Of course this comparison is not entirely fair, since OpenEXR is a pure intra-frame compression with no mechanism for motion compensation, but it still gives the order of magnitude of the possible savings. (Chart: bit-stream size.)

30 Demo & Applications
Display-dependent rendering; choice of tone mapping; extended post-processing. We demonstrate the capabilities of our encoding with an HDR video player.

31 Conclusions
HDR video compression: modest changes to MPEG-4; the L_p u'v' color space; luminance quantization (10-11 bits); edge coding. Applications: on-the-fly tone mapping; blooming, motion blur, night vision; video tuned for the display (LDR / HDR). In this talk we presented our HDR video compression: we showed that only moderate modifications to MPEG-4 are required to encode HDR data, we derived a new color space for an efficient representation of HDR data, and we improved the compression of high-contrast edges. In the demo we showed several potential applications of HDR video, such as on-the-fly tone mapping (adapted to the display) and real-time post-processing: physically accurate blooming, motion blur, and night vision. We also proposed video that is tuned for a particular monitor or TV set and the viewing light conditions.

32 Acknowledgments
HDR images and sequences: Paul Debevec, SpheronVR, Jozef Zajac, Christian Fuchs, Patrick Reuter. HDR camera: HDRC(R) VGAx courtesy of IMS CHIPS. Comments and help: Volker Blanz, Scott Daly, Michael Goesele, Jeffrey Schoner.

33 Thank you

