UNSW – EE&T Scalable Media Coding Applications and New Directions David Taubman School of Electrical Engineering & Telecommunications UNSW Australia.


1 UNSW – EE&T Scalable Media Coding Applications and New Directions David Taubman School of Electrical Engineering & Telecommunications UNSW Australia

2 UNSW – EE&T Overview of Talk
Backdrop and Perspective
–interactive remote browsing of media
–JPEG2000, JPIP and emerging applications
Approaches to scalable video
–fully embedded approaches based on wavelet lifting
–partially embedded approaches based on inter-layer prediction
–important things to appreciate
Key challenges for scalable video and other media
–focus on motion, depth and natural models
Early work on motion for scalable coding
–focus: reduce artificial boundaries
Current focus: everything is imagery
–depth, motion, boundary geometry
–scalable compression of breakpoints (sparse innovations)
–estimation algorithms for joint discovery of depth, motion and geometry
–temporal flows based on breakpoints
Summary
Taubman; PCS’13, San Jose

3 UNSW – EE&T Emerging Trends
Video formats
–QCIF (25 Kpel), CIF (100 Kpel), 4CIF/SDTV (½ Mpel), HDTV (2 Mpel), UHDTV 4K (10 Mpel) and 8K (32 Mpel)
–Cinema: 24/48/60 fps; UHDTV: potentially up to 120 fps
–HFR: helmet cams & professional cameras now do 240 fps
Displays
–“retina” resolutions (200 to 400 pixels/inch)
–what resolution video do I need for an iPad? (2048x1536?)
Internet and mobile devices
–vast majority of internet traffic is video
–100’s of separate non-scalable encodings of a video
New media: multi-view video, depth, …

4 UNSW – EE&T Embedded media
One code-stream, many subsets of interest
[Figure: one compressed bit-stream with embedded subsets. Labels: Low res, Medium res, High res; Low quality, Medium quality, High quality; Low frame rate, Medium frame rate, High frame rate]
Heterogeneous clients (classic application)
–match client bandwidth, display, computation, …
Graceful degradation (another classic)
–more important subsets protected more heavily, …
Robust offloading of sensitive media
–backup most important elements first

5 UNSW – EE&T Our focus: Interactive Browsing
Image quality progressively improves over time
Video quality improves each time we go back
Region (window) of interest accessibility
–in space, in time, across views, …
–can yield enormous effective compression gains
–prioritized streaming based on “degree of relevance”: some elements contribute only partially to the window of interest
Related application: Early retrieval of surveillance
–can’t send everything over the air
–can’t wait until the plane lands

6 UNSW – EE&T Scalable images – things that work well
Multi-resolution transforms
–2D wavelet transforms work well
Accessibility through partitioned coding of subbands
–Region of interest access without any blocking artefacts
Embedded coding
–Successive refinement through bit-plane coding
–Multiple coding passes/bit-plane improve embedding
[Figure labels: ECDZQ R-D curve (step size modulation); Bit-plane coding (truncation); 2 coding passes per bit-plane]
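
The successive refinement idea on this slide can be sketched in a few lines. This is a minimal illustration, assuming integer coefficients in sign-magnitude form and a simple mid-point reconstruction rule; the function name `truncate_bitplanes` and the toy data are hypothetical and are not taken from the talk or from the JPEG2000 standard.

```python
import numpy as np

def truncate_bitplanes(coeffs, total_planes, kept_planes):
    """Decode integer coefficients from only their `kept_planes` most
    significant magnitude bit-planes (sign-magnitude representation)."""
    dropped = total_planes - kept_planes
    sign = np.sign(coeffs)
    magnitude = np.abs(coeffs)
    # Discard the least significant bit-planes (embedded truncation).
    kept = (magnitude >> dropped) << dropped
    if dropped > 0:
        # Mid-point reconstruction for coefficients that are already non-zero.
        kept = np.where(kept > 0, kept + (1 << (dropped - 1)), 0)
    return sign * kept

# Toy "subband": every integer value representable with 8 magnitude bit-planes.
coeffs = np.arange(-255, 256)

# Worst-case reconstruction error after decoding 0, 1, ..., 8 bit-planes.
max_errors = [int(np.max(np.abs(coeffs - truncate_bitplanes(coeffs, 8, k))))
              for k in range(9)]
print(max_errors)
```

Each extra bit-plane roughly halves the worst-case error, which is what lets a single embedded bit-stream serve many quality levels by truncation alone.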

7 UNSW – EE&T JPEG2000 – more than compression
Decoupling and embedding
[Figure labels: subbands LL2, LH2, HH2, HL2, HL1, HH1, LH1; embedded code-block bit-streams]

8 UNSW – EE&T JPEG2000 – more than compression
Spatial random access

9 UNSW – EE&T JPEG2000 – more than compression
Quality and resolution scalability
[Figure labels: subbands LL2, LH2, HH2, HL2, HL1, HH1, LH1; quality layers: layer 1, layer 2, layer 3]

10 UNSW – EE&T JPEG2000 – JPIP interactivity (IS )
Client sends “window requests”
–spatial region, resolution, components, …
Server sends “JPIP stream” messages
–self-describing, arbitrarily ordered
–pre-emptable, server optimized data stream
Server typically models client cache
–avoids redundant transmission
JPIP also does metadata
–scalability & accessibility for text, regions, XML, …
[Figure: JPIP client/server architecture. Labels: Application; window; imagery; status; JPIP Client; Client Cache; Decompress/render; window request; JPIP stream + response headers; JPIP Server; Cache Model; Target (file or code-stream)]
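
The cache-model idea can be illustrated with a toy server. This is only a sketch under stated assumptions: the class `CacheModelServer`, its `respond` method and the data-bin dictionary are hypothetical names, and the message tuples are not the JPIP wire format. It shows how remembering what a client already holds lets each response carry only new bytes, in self-describing messages that can be cut off at any byte budget.

```python
class CacheModelServer:
    """Toy server that tracks how much of each data-bin a client already has."""

    def __init__(self, databins):
        self.databins = databins   # bin id -> full byte string (hypothetical store)
        self.sent = {}             # bin id -> number of bytes already delivered

    def respond(self, wanted_ids, byte_budget):
        """Return (bin_id, offset, payload) messages for data the client lacks."""
        messages = []
        for bin_id in wanted_ids:
            if byte_budget <= 0:
                break              # pre-empted: the rest can follow in a later response
            have = self.sent.get(bin_id, 0)
            chunk = self.databins[bin_id][have:have + byte_budget]
            if not chunk:
                continue           # client cache already holds this bin in full
            messages.append((bin_id, have, chunk))
            self.sent[bin_id] = have + len(chunk)
            byte_budget -= len(chunk)
        return messages

server = CacheModelServer({"a": b"0123456789", "b": b"abcdef"})
first = server.respond(["a", "b"], byte_budget=8)
second = server.respond(["a", "b"], byte_budget=8)
third = server.respond(["a", "b"], byte_budget=8)
```

Because every message names its bin and offset, the client can store messages in any order, and a repeated request for the same window costs nothing once the cache is complete.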

11 UNSW – EE&T What can you do with JPIP?
Highly efficient interactive navigation within
–large images (giga-pixel, even tera-pixel)
–medical volumes
–virtual microscopy
–window of interest access, progressive to lossless
–interactive metadata
Interactive video
–frame of interest
–region of interest
–frame rate and resolution of interest
–quality improves each time we go back over content
Demos: Panoramic Video, Aerial, Album, Catscan, Campus

13 UNSW – EE&T Wavelet-like approaches
Lifting factorization exists for any FIR temporal wavelet transform
Warp frames prior to lifting filters, λ_j
–motion compensation aligns frame features
–invertibility unaffected (any motion model)
[Figure: Motion Compensated Temporal Lifting. Labels: Even frames; Odd frames; warp frames; lifting steps λ_1, λ_2, …, λ_{L-1}, λ_L; Low-pass frames; High-pass frames]

14 UNSW – EE&T Motion Compensated Temporal Lifting
Equiv. to filtering along motion trajectories
–subject to good MC interpolation filters
–subject to existence of 2D motion trajectories
Handles expansive/contractive motion flows
–motion trajectory filtering interpretation still valid for all but highest spatial freqs
–subject to disciplined warping to minimize aliasing effects
Need good motion!
–smooth, bijective mappings wherever possible
–avoid unnecessary motion discontinuities (e.g., blocks)
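
A small sketch may help show why warping inside the lifting steps leaves invertibility untouched. It assumes a 5/3-style predict/update pair, an even number of frames, and a toy motion model (a global circular shift implemented with `np.roll`); the function names `warp`, `analysis` and `synthesis` are hypothetical stand-ins for a real motion-compensated codec.

```python
import numpy as np

def warp(frame, shift):
    """Toy motion model: global circular horizontal shift by `shift` pixels."""
    return np.roll(frame, shift, axis=1)

def analysis(frames, shift):
    """Motion-compensated 5/3-style temporal lifting (even number of frames)."""
    even, odd = frames[0::2], frames[1::2]
    n = len(odd)
    # Predict: subtract the warped average of the neighbouring even frames.
    high = [odd[k] - 0.5 * (warp(even[k], shift) + warp(even[min(k + 1, n - 1)], -shift))
            for k in range(n)]
    # Update: add a warped fraction of the neighbouring high-pass frames.
    low = [even[k] + 0.25 * (warp(high[max(k - 1, 0)], shift) + warp(high[k], -shift))
           for k in range(n)]
    return low, high

def synthesis(low, high, shift):
    """Undo the lifting steps in reverse order, reusing the same warps."""
    n = len(high)
    even = [low[k] - 0.25 * (warp(high[max(k - 1, 0)], shift) + warp(high[k], -shift))
            for k in range(n)]
    odd = [high[k] + 0.5 * (warp(even[k], shift) + warp(even[min(k + 1, n - 1)], -shift))
           for k in range(n)]
    frames = [None] * (2 * n)
    frames[0::2], frames[1::2] = even, odd
    return frames

rng = np.random.default_rng(0)
base = rng.random((4, 8))
frames = [np.roll(base, 2 * t, axis=1) for t in range(4)]  # scene moving 2 pixels/frame

low, high = analysis(frames, shift=2)        # correct motion
recon = synthesis(low, high, shift=2)
low_bad, high_bad = analysis(frames, shift=5)  # wrong motion
recon_bad = synthesis(low_bad, high_bad, shift=5)
```

With the correct shift the first high-pass frame is zero, since the prediction follows the motion trajectory exactly. With the wrong shift the high-pass frames carry more energy, but reconstruction is still perfect because synthesis simply subtracts what analysis added.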

15 UNSW – EE&T t+2D Structure with MC Lifting
[Figure: t+2D Analysis (MC Lifting, Spatial DWT) turns input frames into Spatio-Temporal Subbands; t+2D Synthesis (MC Lifting) reverses it. Temporal subband labels: H1, H2, L2]
Dimensions of Scalability: Spatial resolution, Quality, Temporal resolution
NB: other interesting structures exist

16 UNSW – EE&T Inter-layer prediction (SVC/SHEVC)
[Figure labels: High Resolution Layer; Low Resolution Layer]
Multiple predictors: pick one? blend?
Implications for embedding …

17 UNSW – EE&T Important Principles
Scalability is a multi-dimensional phenomenon
–not just a sequential list of layers
Quality scalability critical for embedded schemes
–low res subset needs much higher quality at high res
Physical motion scales naturally
–not generally true for block-based approximations
–but, 2D motion discontinuous at boundaries
Prediction alone is sub-optimal
–full spatio-temporal transforms preferred: improve the quality of embedded resolutions and frame rates; noise orthogonalisation
–ideally orthogonal transforms: note body of work by Flierl et al. (MMSP’06 thru PCS’13)

18 UNSW – EE&T Temporal transforms: Why prediction alone is sub-optimal
[Figure: Bi-directional prediction. Labels: even frames; odd frames; residual; forward transform; quantization; reverse transform; weights 1, -½, ½]
Redundant spanning of low-pass content by both channels ⇒ High-pass quantization noise has unnecessarily high energy gain.

19 UNSW – EE&T Reduced noise power through lifting
Inject –ve fraction of high band into low band synthesis path
–removes low freq. noise power from synthesized high band
Add compensating step in the forward transform
–does not affect energy compacting properties of prediction
[Figure labels: even frames; odd frames]
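
The noise argument can be checked numerically by pushing a unit impulse of high-band "quantization noise" through the synthesis transform and summing the squared output. The sketch below assumes ½ prediction weights and, for the update step, the ¼ weights of the 5/3 transform (an assumption; the slide does not state the fraction), with periodic extension and no motion for simplicity. The function names are hypothetical.

```python
import numpy as np

def synthesize(low, high, use_update):
    """Invert a temporal lifting transform (periodic extension, no motion)."""
    even = low.copy()
    if use_update:
        # Undo the update step: inject a negative fraction of the high band.
        even -= 0.25 * (np.roll(high, 1) + high)
    # Undo the prediction step: odd[k] = high[k] + (even[k] + even[k+1]) / 2.
    odd = high + 0.5 * (even + np.roll(even, -1))
    out = np.empty(2 * len(low))
    out[0::2], out[1::2] = even, odd
    return out

def high_band_noise_gain(use_update, n=16):
    """Energy of the reconstruction error caused by a unit error in one high-band sample."""
    low = np.zeros(n)
    high = np.zeros(n)
    high[n // 2] = 1.0
    return float(np.sum(synthesize(low, high, use_update) ** 2))

gain_prediction_only = high_band_noise_gain(use_update=False)
gain_with_update = high_band_noise_gain(use_update=True)
print(gain_prediction_only, gain_with_update)
```

Under these assumptions the gain drops from 1 to 46/64, because the update step spreads the error over neighbouring frames with taps (-1/8, -1/4, 3/4, -1/4, -1/8) instead of leaving it concentrated in one odd frame.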

21 UNSW – EE&T Motion for Scalable Video
Fully scalable video requires scalable motion
–reduce motion bit-rate as video quality reduces
–reduce motion resolution as video resolution reduces
First demonstration (Taubman & Secker, 2003)
–16x16 triangular mesh motion model
–Wavelet transform of mesh node vectors
–EBCOT coding of mesh subbands
–Model-based allocation of motion bits to quality layers
–Pure t+2D motion-compensated temporal lifting

22 UNSW – EE&T Scalable motion – very early results
[Figure: Luminance PSNR (dB) vs Bit-Rate (kbit/s) for CIF Bus and for CIF Bus at QCIF resolution. Curves: Non-scalable; Brute-force; Model-based; Lossless motion]
[Figure: Luminance PSNR (dB) vs Bit-Rate (Mbit/s) for CIF mobile. Curves: 5/3 temporal lifting; 1/3 (hierarchical B-frames); H.264 high complexity]
H.264 results (courtesy of Marcus Flierl): CABAC; 5 prev, 3 future ref frames; multi-hypothesis testing

23 UNSW – EE&T Motion challenges
Issues:
–smooth motion fields scale well: mesh is guaranteed to be smooth and invertible everywhere
–but, real motion fields have discontinuities
Hierarchical block-based schemes
–produce a massive number of artificial discontinuities
–not invertible – i.e., there are no motion trajectories
–non-physical – hence, not easy to scale
–but, easy to optimize for energy compaction: particularly effective at lower bit-rates
Depth/disparity has all the same issues as motion
–both tend to be piecewise smooth media
–NB: bandlimited sampling considerations may not apply: boundary discontinuity modeling more important for these media

24 UNSW – EE&T Aliasing challenges
Analysis filter responses of the popular 9/7 wavelet transform
Fundamental constraint (for perfect reconstruction): half-band filter
[Figure labels: Extract LL subband; Spatial aliasing]
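
The half-band constraint and its aliasing consequence can be checked on a short filter pair. The slide shows the 9/7 transform; the sketch below swaps in the simpler LeGall 5/3 pair because its taps are exact rationals, so it illustrates the same constraint rather than reproducing the slide's plot. The helper name `magnitude_response` is hypothetical.

```python
import numpy as np

h0 = np.array([-1, 2, 6, 2, -1]) / 8.0   # 5/3 analysis low-pass taps
g0 = np.array([1, 2, 1]) / 2.0           # 5/3 synthesis low-pass taps

# Perfect reconstruction requires the product filter to be half-band:
# centre tap 1, and zeros at every other even offset from the centre.
product = np.convolve(h0, g0)
centre = len(product) // 2

def magnitude_response(taps, omega):
    """|H(e^{j omega})| for a filter whose taps are centred on n = 0."""
    n = np.arange(len(taps)) - len(taps) // 2
    return abs(np.sum(taps * np.exp(-1j * omega * n)))

gain_at_half_band = magnitude_response(h0, np.pi / 2)
gain_above_half_band = magnitude_response(h0, 3 * np.pi / 4)
print(product, gain_at_half_band, gain_above_half_band)
```

The low-pass filter still has unit gain at π/2 and substantial gain beyond it, so content above the new Nyquist limit survives into the LL subband and folds back as aliasing when that subband is viewed on its own at reduced resolution.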

25 UNSW – EE&T Spatial scalability – t+2D
Temporal transform uses full spatial resolution
[Figure labels: temporal predict; temporal update]
At reduced spatial resolution
–Temporal synthesis steps missing high-resolution info
–If motion trajectories wrong/non-physical ⇒ ghosting
–If trajectories valid ⇒ temporal synthesis reduces aliasing: less aliasing than regular spatial DWT at reduced resolution

26 UNSW – EE&T Overview of Talk
Backdrop and Perspective
–interactive remote browsing of media
–JPEG2000, JPIP and emerging applications
Approaches to scalable video
–fully embedded approaches based on wavelet lifting
–partially embedded approaches based on inter-layer prediction
–important things to appreciate
Key challenges for scalable video and other media
–focus on motion, depth and natural models
Early work on motion for scalable coding
–focus: reduce artificial boundaries
Current focus: everything is imagery
–depth, motion, boundary geometry
–scalable compression of breakpoints (sparse innovations)
–estimation algorithms for joint discovery of depth, motion and geometry
–high level view of the coding framework we are pursuing
Summary

27 UNSW – EE&T Block-based schemes with merging
Linear & affine models
–encourages larger blocks
Merging of quad-tree nodes
–encourages larger regions and improves efficiency
–merging approach later picked up by the HEVC standard
Hierarchical coding
–works very well with merging; provides resolution scalability
[Figure: leaf merging (Mathew and Taubman, 2006)]

28 UNSW – EE&T Boundary geometry and merging
Model motion & boundary
No merging
–Hung et al. (2006)
–Escoda et al. (2007)
With merging
–Mathew & Taubman (2007)
–separate quad-trees (2008)
[Figure labels: M1; M2; Motion Comp (M1); Motion Comp (M2); Combined result. Panels: Motion only; Motion + boundary; Separate quad-trees]

29 UNSW – EE&T Indicative Performance
Things that reduce artificial discontinuities:
–modeling geometry as well as motion
–separately pruned trees for geometry and motion
–merging nodes from the pruned quad-trees
These schemes are practical and resolution scalable
–readily optimized across the hierarchy
[Figure curves: Two quad-trees: motion, geometry, merging; Single quad-tree: motion, geometry, merging; Single quad-tree: motion only; Single quad-tree: motion + merging]

30 UNSW – EE&T Overview of Talk
Backdrop and Perspective
–interactive remote browsing of media
–JPEG2000, JPIP and emerging applications
Approaches to scalable video
–fully embedded approaches based on wavelet lifting
–partially embedded approaches based on inter-layer prediction
–important things to appreciate
Key challenges for scalable video and other media
–focus on motion, depth and natural models
Early work on motion for scalable coding
–focus: reduce artificial boundaries
Current focus: everything is imagery
–depth, motion, boundary geometry
–scalable compression of breakpoints (sparse innovations)
–estimation algorithms for joint discovery of depth, motion and geometry
–temporal flows induced by breakpoints
Summary

31 UNSW – EE&T Geometry from arc breakpoints
Breaks can represent segmentation contours
–but don’t need a segmentation
–breaks are all we need for adaptive transforms
–direct RD optimisation is possible
Natural resolution scalability
–finer resolution arcs embedded in coarser arcs
[Figure labels: Coarse grid; Finer grid; Arcs; Gridpoints]

32 UNSW – EE&T Breakpoint induction and scalability
Explicitly identify subset of breakpoints – “vertices”
–remaining breakpoints get induced
–position of break on its arc impacts induction
Representation is fully scalable
–resolution improves as we add vertices at finer scales
–quality improves as we add precision to breakpoint positions
Breakpoint representation is image-like
–vertex density closely related to image resolution
–vertex accuracy like sample accuracy in an image hierarchy
[Figure labels: Coarse grid; Finer grid; direct induction onto sub-arcs; spatial induction onto root-arcs]

33 UNSW – EE&T Breakpoints drive an adaptive DWT
Basis functions do not cross discontinuity along an arc
Max of one breakpoint per arc
Adaptive transform well defined
Breakpoint Adaptive Transforms (BPA-DWT) – sequence of non-separable 2D lifting steps
[Figure labels: Arc Breakpoint Pyramid; Field Sample Pyramid; Original field samples; P1-step; U1-step; P2-step; U2-step]
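
A one-dimensional analogue gives a feel for how breaks steer the transform. This is only a sketch: the talk's BPA-DWT uses non-separable 2D lifting steps, whereas the hypothetical `bpa_predict_1d` below applies a single 1D prediction step and simply drops any neighbour that lies across a break. The break flags (one per arc between adjacent samples) and the toy depth row are assumptions for illustration.

```python
import numpy as np

def bpa_predict_1d(samples, breaks):
    """Prediction step that never averages across a break.

    breaks[i] is True when a break lies on the arc between
    samples[i] and samples[i + 1] (at most one break per arc).
    """
    samples = np.asarray(samples, dtype=float)
    even = samples[0::2].copy()
    high = []
    for i in range(1, len(samples), 2):
        neighbours = []
        if not breaks[i - 1]:
            neighbours.append(samples[i - 1])
        if i + 1 < len(samples) and not breaks[i]:
            neighbours.append(samples[i + 1])
        # Use only neighbours on the same side of every break.
        prediction = sum(neighbours) / len(neighbours) if neighbours else 0.0
        high.append(samples[i] - prediction)
    return even, np.array(high)

depth = [10, 10, 10, 10, 50, 50, 50, 50]          # piecewise-constant depth row
no_breaks = [False] * 7
one_break = [False, False, False, True, False, False, False]

_, high_plain = bpa_predict_1d(depth, no_breaks)
_, high_adaptive = bpa_predict_1d(depth, one_break)
print(high_plain, high_adaptive)
```

Without the break, the sample next to the depth edge produces a large detail coefficient; with it, every detail coefficient is zero. The step stays invertible because the decoder knows the even samples and the breaks, so it can form the same predictions.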

34 UNSW – EE&T Arc-bands and sub-bands
[Figure labels: Sub-bands (BPA-transformed data); Arc-bands (vertices); Level 2; Level 1; Level 0; codeblocks; root arcs; non-root arcs]

35 UNSW – EE&T Embedded Block Coding – for scalability and ROI accessibility
Sub-band stream (field samples)
–Sub-bands divided into code blocks
–Coded using EBCOT (JPEG2000)
–Bitplanes assigned to quality layers
Vertex Stream
–Arc-bands divided into code blocks
–Coding scheme similar to EBCOT
–Bitplanes refine vertex locations
–Bitplanes assigned to quality layers
[Figure labels: Vertex stream; Sub-band stream; bit-planes down to LSB; λ1, λ2, λ3]
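
One common way to assign code-block bit-planes to quality layers, used in EBCOT-style rate allocation, is to pick a distortion-rate slope threshold per layer and include every coding step whose slope meets it. The sketch below assumes the λ1, λ2, λ3 in the figure are such thresholds, which the transcript does not confirm; the function name and the (rate, distortion) numbers are hypothetical.

```python
def truncation_for_lambda(points, lam):
    """Pick the truncation point for one code block and one slope threshold.

    points: (rate, distortion) pairs at successive truncation points, with
    rate increasing, distortion decreasing and slopes decreasing (convex hull).
    """
    best = 0
    for n in range(1, len(points)):
        delta_rate = points[n][0] - points[n - 1][0]
        delta_dist = points[n - 1][1] - points[n][1]
        if delta_dist / delta_rate >= lam:
            best = n      # this step still buys enough distortion per bit
        else:
            break         # slopes only decrease from here on
    return best

# Hypothetical code block: slopes of its three coding steps are 6, 2 and 0.5.
block = [(0, 100), (10, 40), (20, 20), (40, 10)]

# Decreasing thresholds give nested truncation points, one per quality layer.
layers = [truncation_for_lambda(block, lam) for lam in (4.0, 1.0, 0.25)]
print(layers)
```

Applying the same thresholds to every code block, in both the sub-band stream and the vertex stream, would yield layers in which each block contributes in proportion to how much it reduces distortion per bit.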

36 UNSW – EE&T Fully Scalable Depth Coding – field samples are depth; breaks are depth discontinuities
JPEG 2000, 50 k bits
–Resolution scalable; Quality scalable; No blocks
–Poorly suited to discontinuities in depth/motion fields
Proposed, 50 k bits
–Resolution scalable; Quality scalable; No blocks
–Well suited to discontinuous depth/motion fields

37 UNSW – EE&T R-D Results for Depth Coding
JPEG2000: 5 levels of decomposition with 5/3 DWT
Breakpoint-adaptive: vertex stream is truncated at a quality level; sub-band stream decoded progressively

38 UNSW – EE&T Model-based quality layering
Scaled by discarding sub-band and arc-band quality layers
–fully automatic model-based quality layer formation
–model-based interleaving of all quality layers for optimal embedding

39 UNSW – EE&T Model-based quality layering
Compared with segmentation based approach (Zanuttigh & Cortelazzo, 2009)
–not scalable; sensitive to initial choice of segmentation complexity

40 UNSW – EE&T Fully scalable motion coding – preliminary
Field samples are motion vectors
–motion coded using EBCOT after BPA-DWT
Breakpoints and motion jointly estimated
–compression-regularised optical flow:
–Coded length L provides the prior (regularisation): reflects cost of scalably coding arc breakpoints plus cost of coding motion wavelet coeffs after BPA-DWT
–Distortion D provides the observation model
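
The formula that followed "compression-regularised optical flow" did not survive in the transcript. One plausible reading, offered only as an assumption, is a Lagrangian objective in which the distortion D plays the role of the data term and the coded length L plays the role of the prior. The symbols B (breakpoint field), M (motion field) and the multiplier λ are introduced here for illustration and are not taken from the slide.

```latex
\[
  (\hat{B}, \hat{M}) \;=\; \arg\min_{B,\,M} \; D(M) \;+\; \lambda \, L(B, M)
\]
```

Here L(B, M) would count the bits needed to code the arc breakpoints scalably plus the bits for the motion wavelet coefficients after the BPA-DWT, so motion fields that are cheap to code are favoured over equally accurate but costlier ones.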

41 UNSW – EE&T Optimisation framework
Approach based on loopy belief propagation
Breakpoints/vertices connected via inducing rules
–turns out to be very efficient: complexity grows only linearly with image size
Breakpoints/motion connected via BPA-DWT
–initial simplification: motion at each pixel location drawn from a small finite set (cardinality 5)
–initial motion candidates generated by block search with varying block sizes
System always converges rapidly
Use modes of the marginal beliefs at each node

42 UNSW – EE&T Initial scalable motion results
2 frames, inter-predict only; occlusions removed
–newer work eliminates restriction to finite motion set
[Figure curves: JPEG2000; H.264; trunc motion bits; trunc vertex and motion bits]

43 UNSW – EE&T Visualising the breaks & motion

44 UNSW – EE&T Temporal induction of motion – preliminary
Given:
–motion from f1 to f2
–plus breakpoint fields in f1 and f2
Induce:
–motion from f2 to f1
–resolve ambiguities, find occlusions

45 UNSW – EE&T Synthetic example:
[Figure panels: Inferred horizontal motion; Inferred vertical motion; Inferred occlusion; Resolved ambiguities]

46 UNSW – EE&T Temporal induction of breaks – preliminary
Given:
–motion from frame f1 to frame f3
–plus breakpoint fields in f1 and f3
–plus motion from f1 to f2 (real or estimated)
Induce:
–breakpoint field in frame f2
–hence estimate all other motion fields
–note: induced occlusions are temporal breaks

47 UNSW – EE&T Synthetic example:
[Figure panels: Horizontal breaks, Frame 1; Horizontal breaks, Frame 2 (induced); Horizontal breaks, Frame 3]

48 UNSW – EE&T Things we are working on
Message approximation strategies for joint inference of breakpoint fields with motion
–compression regularized optical flow
–continuous motion, notionally at infinite precision
Advanced breakpoint coding techniques
–including differential coding schemes, i.e., geometry transforms
–including inter-block conditional coding schemes that do not break the EBCOT paradigm
Breakpoint adaptive spatio-temporal transforms
–breakpoints potentially address most of the open issues surrounding motion compensated lifting schemes
Integration within our JPEG2000 SDK (Kakadu)
–interactive access, JPIP for remote browsing, etc.

49 UNSW – EE&T Demo placeholder
Interactive browsing of breakpoint media over JPIP

50 UNSW – EE&T Arc breakpoints vs graph edges
arcs = possible edges on spatial graph
–breaks = missing edges
–main point of divergence: arc breakpoints have locations (key to induction & scalability); graph edges may be weighted (equivalent to “soft” breaks)
Related approaches:
“Graph based transforms for depth video coding”, Kim, Narang and Ortega, ICASSP’12
“Video coder based on lifting transforms on graphs”, Martinez-Enriquez, Diaz-de-Maria & Ortega, ICIP’11
–Edges (binary breaks) from segmentation (JBIG encoded)
–Motion (block based) produces temporal edges
–Spatio-temporal video transform adapts to graph
–hierarchical, but not scalable in the usual sense

51 UNSW – EE&T Summary
Interactive image browsing benefits from
–embedded transforms and embedded block coding
–intelligent servers send only a fraction of total content
–getting only what you want ~ huge effective coding gain
Challenges for interactive video
–fully embedded motion fields
–bijective motion fields wherever this is physical
Key: fully scalable model of motion discontinuities
–allows motion to be invertible almost everywhere
–ideal for motion compensated 3D wavelet xforms
–allows motion to scale naturally in space and time
Numerous research challenges …

52 UNSW – EE&T Important Acknowledgements
Reji Mathew
–scalable motion, scalable depth & breakpoint estimation
Sean Young
–scalable motion coding results
Dominic Rüfenacht
–temporal induction of motion and breakpoints
Pietro Zanuttigh
–breakpoint estimation for depth coding
Australian Research Council
–much of the presented research was partially funded by the ARC Discovery Projects scheme

