High-Quality Video View Interpolation


1 High-Quality Video View Interpolation
Larry Zitnick Interactive Visual Media Group Microsoft Research

2 3D video: image centric to geometry centric
Figure from Kang, Szeliski, and Anandan's ICIP paper. The spectrum of representations runs from image centric (light field, Lumigraph, layered depth image, sprites with depth) through view-dependent geometry and view-dependent texture to geometry centric (fixed geometry), with rendering ranging correspondingly from interpolation to warping to polygon rendering with texture mapping. We will be developing an imaging model that captures this spectrum and permits easy use of all these techniques. Finding a common platform that accommodates the entire spectrum gives us the flexibility to use each technique and the efficiency to mix representations without a performance penalty.

3 Current practice for free-viewpoint video: many cameras vs. motion jitter

4 Current practice for free-viewpoint video: many cameras vs. motion jitter

5 Video view interpolation
Fewer cameras and smooth motion. Automatic. Real-time rendering.

6 System overview (Video Capture stage highlighted)
OFFLINE: Video Capture → Stereo → Representation → Compression → File. ONLINE: Selective Decompression → Render. Our rendering system consists of offline and online components. The video capture and processing are done offline; video processing consists of stereo computation and data compression. The dynamic scene can then be interactively viewed by selectively decompressing the data file and rendering it.

7 Video capture system (diagram: cameras, concentrators, hard disks, controlling laptop)
Our video capture system consists of 8 cameras, each with a resolution of 1024 by 768 capturing at 15 frames per second. Each group of 4 cameras is synchronized using a device called a concentrator, which pipes all the uncompressed video data to a bank of hard disks via a fiber optic cable. The two concentrators are themselves synchronized, and are controlled by a single laptop.

8 Calibration (Zhengyou Zhang, 2000)
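
For readers, a minimal sketch of plane-based calibration in the spirit of Zhang (2000), using OpenCV's implementation of the method; the checkerboard dimensions, square size, and file paths are assumptions for illustration, not details from the talk.

```python
# Hypothetical sketch: calibrate one camera from checkerboard views with
# OpenCV's implementation of Zhang's method (pattern size, square size,
# and file names are assumed, not taken from the talk).
import glob
import cv2
import numpy as np

pattern_size = (9, 6)      # inner corners of the checkerboard (assumed)
square_size = 0.025        # square edge length in meters (assumed)

# 3D corner positions in the board's own coordinate frame (z = 0 plane).
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
objp *= square_size

obj_points, img_points = [], []
for path in glob.glob("calib/cam0_*.png"):     # hypothetical file names
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Zhang's method: recover intrinsics, distortion, and per-view extrinsics.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
```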

9 Input videos

10 System overview (Stereo stage highlighted)
OFFLINE: Video Capture → Stereo → Representation → Compression → File. ONLINE: Selective Decompression → Render.

11 Key to view interpolation: Geometry
Diagram: stereo between Image 1 (Camera 1) and Image 2 (Camera 2) recovers the scene geometry needed to render a Virtual Camera in between.

12 Image correspondence
Diagram: for a patch on the leg, the correct match in Image 2 gives a good match score and incorrect matches give bad scores; for a patch on the textureless wall, correct and incorrect matches give similar match scores.

13 Why segments? Better delineation of boundaries.

14 Why segments? Larger support for matching.
Handle gain and offset differences without a global model (Kim, Kolmogorov, and Zabih, 2003).

15 Why segments? More efficient. 786,432 pixels vs. 1000 segments
Compute disparities per segment rather than per pixel.

16 Segmentation Many methods will work:
Graph-based (Felzenszwalb and Huttenlocher, 2004), mean shift (Comaniciu et al., 2001), min-cut (Boykov et al., 2001), and others.

17 Segmentation: Important properties
Not too large, not too small… As large as possible while not spanning multiple objects.

18 Segmentation: Important properties
Stable Regions

19 Segmentation: Our Approach
First apply anisotropic smoothing to average colors within regions, then segment the smoothed image.
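
A minimal sketch of the "first average, then segment" idea, assuming a Perona-Malik style anisotropic diffusion as the smoother; the talk's exact filter and segmenter are not reproduced here.

```python
# Illustrative anisotropic smoothing: colors are averaged within regions
# while strong color edges are preserved, so a simple segmenter afterwards
# produces stable, well-delineated segments.
import numpy as np

def anisotropic_smooth(img, iterations=20, kappa=15.0, step=0.2):
    """Edge-preserving smoothing of an H x W x 3 image."""
    img = img.astype(np.float64)
    for _ in range(iterations):
        # Color differences to the four neighbors (borders wrap; fine for a sketch).
        dn = np.roll(img, 1, axis=0) - img
        ds = np.roll(img, -1, axis=0) - img
        de = np.roll(img, -1, axis=1) - img
        dw = np.roll(img, 1, axis=1) - img
        # Conduction is near 1 in flat regions and near 0 across strong color
        # edges, so averaging happens within regions but not across boundaries.
        def g(d):
            return np.exp(-(np.linalg.norm(d, axis=2, keepdims=True) / kappa) ** 2)
        img = img + step * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return img
```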

20 Segmentation: Result Close-up

21 Matching segments Many measures will work:
SSD, normalized correlation, mutual information; the best choice depends on color balancing and image quality.
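
For illustration, two of the listed measures applied to a segment's pixels; SSD is sensitive to gain and offset differences between cameras, while normalized correlation is invariant to them. The function names are hypothetical.

```python
# Two simple segment match measures over corresponding pixel sets.
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equal-size pixel sets."""
    return float(np.sum((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def normalized_correlation(a, b):
    """Zero-mean normalized cross-correlation in [-1, 1]."""
    a = a.astype(np.float64).ravel() - a.mean()
    b = b.astype(np.float64).ravel() - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```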

22 Matching segments: Important properties
Never remove correct matches; remove as many false matches as possible; use global methods to remove the remaining false positives.

23 Matching segments: Our approach
Create a gain histogram of per-pixel intensity ratios (plotted over 0.8 to 1.25): a good match concentrates the ratios near a single gain value, while a bad match spreads them out.
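
A hedged sketch of the gain-histogram idea: histogram the per-pixel intensity ratios between a candidate pair of segments and score how concentrated the histogram is. The bin range matches the 0.8 to 1.25 axis on the slide; the peak-mass score is an assumption for illustration, not the talk's exact measure.

```python
# Score a candidate segment match by how peaked its gain-ratio histogram is.
import numpy as np

def gain_histogram_score(pixels_ref, pixels_match, bins=20, lo=0.8, hi=1.25):
    eps = 1e-6
    ratios = (pixels_match.astype(np.float64) + eps) / (pixels_ref.astype(np.float64) + eps)
    hist, _ = np.histogram(np.clip(ratios, lo, hi), bins=bins, range=(lo, hi))
    hist = hist / max(hist.sum(), 1)
    return float(hist.max())   # a correct match concentrates mass in one bin
```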

24 Local matching (diagram: results for Image 1 and Image 2; low-texture regions remain ambiguous)

25 Global regularization
Create an MRF (Markov random field) over the segments: each segment is a node (segments A-F in Image 1, P-U in Image 2), and the number of states per node equals the number of depth levels.

26 Global regularization
The posterior over the disparity images factors into a likelihood (data term) and a prior (regularization term).
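
A minimal sketch of the resulting objective: the negative log posterior over per-segment disparities is a sum of per-segment data costs (likelihood) plus pairwise costs on neighboring segments (prior). All names and cost forms are illustrative, not the talk's exact formulation.

```python
def mrf_energy(labels, data_cost, edges, pair_cost):
    """labels:    dict segment -> disparity label
       data_cost: dict segment -> sequence of costs, one per disparity level
       edges:     list of (segment_a, segment_b, weight) for neighboring segments
       pair_cost: function(d_a, d_b) -> regularization penalty"""
    energy = sum(data_cost[s][d] for s, d in labels.items())                 # -log likelihood
    energy += sum(w * pair_cost(labels[a], labels[b]) for a, b, w in edges)  # -log prior
    return energy
```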

27 Global regularization
Prior within an image (segments A-F in Image 1, P-U in Image 2): color(A) ≈ color(B) → z(A) ≈ z(B).

28 Global regularization
The prior on neighboring segments' disparities is a normal distribution whose variance depends on the percentage of shared border and the similarity of color.

29 Multiple disparity maps
Compute a disparity map for each image. We want the disparity maps to be consistent across images…

30 Consistent disparities
Segment A in Image 1 projects onto segments P, Q, and S in Image 2, so we want z(A) ≈ z(P), z(Q), z(S).

31 Consistent disparities
Each segment's disparity depends on the disparities estimated in neighboring views, so the likelihood term includes the neighboring views' disparities.

32 Consistent disparities
Use the original data term if the segment is not occluded in the other view; if it is occluded, bias its disparity to lie behind the known occluding surface.
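
A hedged sketch of this occlusion-dependent data term: keep the original matching cost when the segment is visible in the other view, and otherwise favor disparities that place it behind the known occluding surface. The penalty form is an assumption for illustration.

```python
def occlusion_aware_cost(match_cost, occluded, d_candidate, d_occluder, penalty=10.0):
    """match_cost: per-disparity matching costs for the segment
       occluded:   whether the segment is hidden in the other view
       d_occluder: disparity of the occluding surface (larger = nearer)"""
    if not occluded:
        return match_cost[d_candidate]        # original data term
    # Occluded: cheap if the candidate disparity places the segment behind
    # the occluder (smaller disparity = farther away), expensive otherwise.
    return 0.0 if d_candidate < d_occluder else penalty
```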

33 Is the segment occluded?
Diagram: projecting the segment into image I_i to decide whether it is occluded or not occluded there.

34 If occluded… (diagram: the occluded segment's disparity in image I_i is constrained to lie behind the occluding surface)

35 Iteratively solve MRF
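
The talk does not spell out the solver on this slide; as one illustrative possibility, here is a simple coordinate-descent (ICM-style) loop over segments, where each segment repeatedly takes the disparity label minimizing its data cost plus its pairwise costs with neighboring segments.

```python
# Illustrative iterative MRF solve over segments (not necessarily the
# optimization used in the actual system).
import numpy as np

def solve_mrf(data_cost, adjacency, pair_cost, iterations=10):
    """data_cost: dict segment -> 1D array of costs per disparity level
       adjacency: dict segment -> list of (neighbor_segment, weight)
       pair_cost: function(d_a, d_b) -> regularization penalty"""
    # Initialize each segment at its best purely local disparity.
    labels = {s: int(np.argmin(c)) for s, c in data_cost.items()}
    for _ in range(iterations):
        for s, costs in data_cost.items():
            totals = np.asarray(costs, dtype=np.float64).copy()
            for d in range(len(totals)):
                totals[d] += sum(w * pair_cost(d, labels[n])
                                 for n, w in adjacency.get(s, []))
            labels[s] = int(np.argmin(totals))
    return labels
```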

36 Depth through time

37 Matting
Diagram: near a depth boundary between a foreground surface and a background surface, a strip of a given width is re-estimated with Bayesian matting (Chuang et al., 2001) to recover foreground color, background color, and alpha; the interpolated view without matting shows artifacts at these boundaries.

38 Rendering with matting
No Matting Matting

39 System overview (Representation stage highlighted)
OFFLINE: Video Capture → Stereo → Representation → Compression → File. ONLINE: Selective Decompression → Render.

40 Representation: two layers
A strip of a given width around each foreground/background boundary forms the boundary layer; the rest of the image forms the main layer. Boundary layer: color, depth, alpha. Main layer: color, depth.
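
A sketch of this two-layer representation as a per-camera, per-frame data structure, with hypothetical field names: the main layer stores color and depth everywhere, while the boundary layer stores color, depth, and alpha only inside the strip around depth discontinuities.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MainLayer:
    color: np.ndarray    # H x W x 3
    depth: np.ndarray    # H x W

@dataclass
class BoundaryLayer:
    color: np.ndarray    # H x W x 3, valid only inside the boundary strip
    depth: np.ndarray    # H x W
    alpha: np.ndarray    # H x W, zero outside the strip

@dataclass
class Frame:
    main: MainLayer
    boundary: BoundaryLayer
```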

41 System overview (Compression stage highlighted)
OFFLINE: Video Capture → Stereo → Representation → Compression → File. ONLINE: Selective Decompression → Render.

42 Compression
Diagram: cameras 1 through 4 at time 0 and time 1. Compression reduces the large data set to a manageable size and allows fast playback from disk. We developed our own codec to exploit both temporal and between-camera redundancy. Temporal prediction compresses the reference camera's data in terms of previously decoded results from an earlier frame time. Spatial prediction uses the reference camera's disparity map to transform its texture and disparity data into the viewpoint of a spatially adjacent camera. The differences between the predicted and actual images are coded with a transform-based compression scheme that simultaneously handles texture, disparity, and alpha-map data. For real-time interactivity, the decoder is highly optimized for speed and uses the GPU where possible.

43 Compression: temporal prediction
Diagram: each reference camera is predicted from its own previously decoded frame (time 0 → time 1).

44 Compression: spatial prediction
Diagram: the remaining cameras are predicted from a spatially adjacent reference camera at the same time instant.

45 Spatial prediction: the reference camera's depth and texture, and the adjacent camera to be predicted.

46 Spatial prediction: the reference camera's depth and texture are warped into the predicted camera's viewpoint.

47 Spatial prediction
The error signal is the difference between the predicted camera's actual image and the warped reference depth and texture.

48 Spatial prediction
The predicted camera is reconstructed by adding the decoded error signal to the warped reference depth and texture.
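
A hedged sketch of spatial prediction for a rectified pair: forward-warp the reference camera's texture by its disparity into the neighboring viewpoint and code only the residual. The real codec warps with full calibration, handles depth and alpha as well, and fills occlusions; the shift direction and helper names below are assumptions for illustration.

```python
# Warp a reference view by its disparity map, then form the residual that
# would be transform-coded. Integer disparities and a simple z-test keep the
# sketch short.
import numpy as np

def warp_to_neighbor(ref_color, ref_disp):
    """Forward-warp an HxWx3 image by an HxW disparity map (nearest surface wins)."""
    h, w = ref_disp.shape
    warped = np.zeros_like(ref_color)
    z = np.full((h, w), -np.inf)                  # keep the largest disparity (nearest)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = xs - ref_disp.astype(int)                # shift along the baseline (assumed direction)
    valid = (xt >= 0) & (xt < w)
    for y, x, x2 in zip(ys[valid], xs[valid], xt[valid]):
        if ref_disp[y, x] > z[y, x2]:
            z[y, x2] = ref_disp[y, x]
            warped[y, x2] = ref_color[y, x]
    return warped

def spatial_residual(ref_color, ref_disp, target_color):
    prediction = warp_to_neighbor(ref_color, ref_disp)
    return target_color.astype(np.int16) - prediction.astype(np.int16)
```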

49

50 Boundary layer coding
The boundary layer's depth, color texture, and alpha matte are coded with our own shape coding method, similar to MPEG-4.

51 System overview
OFFLINE: Video Capture → Stereo → Representation → Compression → File. ONLINE: Selective Decompression → Render.

52 Rendering
Diagram: source cameras and the virtual camera. Next we describe how the GPU renders a novel viewpoint from the compressed data. Given a novel viewpoint, the rendering program determines the two nearest source cameras; the data from these two cameras is blended to create the new view. A block diagram of the rendering process follows.

53 Rendering
Block diagram: for each of the two nearest cameras, project the main layer and the boundary layer into the virtual view, then composite the results.

54 Rendering the main layer (Step 1)
At every frame time, the main-layer depth map is used to create a dense mesh that is texture mapped with the main-layer color. A vertex shader projects the mesh (positions and texture coordinates) into the virtual camera's view, and a pixel shader rejects any pixels at large depth gradients, writing the result into the projected color buffer and Z-buffer on the GPU.
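
A minimal CPU-side sketch of Step 1: turning a depth map into the dense, texture-mapped mesh that the vertex shader projects. The vertex layout and helper names are assumptions; the depth-gradient rejection itself happens in the pixel shader and is not shown.

```python
# Build a dense triangle mesh (two triangles per pixel quad) from an HxW
# depth map, with per-vertex texture coordinates for the color video.
import numpy as np

def depth_to_mesh(depth):
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # One vertex per pixel: (x, y, depth) plus texture coordinates in [0, 1].
    vertices = np.stack([xs, ys, depth], axis=-1).reshape(-1, 3).astype(np.float32)
    texcoords = np.stack([xs / (w - 1), ys / (h - 1)], axis=-1).reshape(-1, 2)
    # Two triangles per quad of neighboring pixels.
    idx = (ys * w + xs)[:-1, :-1]
    tris = np.concatenate([
        np.stack([idx, idx + 1, idx + w], axis=-1).reshape(-1, 3),
        np.stack([idx + 1, idx + w + 1, idx + w], axis=-1).reshape(-1, 3),
    ])
    return vertices, texcoords.astype(np.float32), tris.astype(np.int32)
```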

55

56

57 Rendering the main layer (Step 2)
We need to remove the main-layer triangles (one pixel wide) that span depth discontinuities, connecting background to foreground, while avoiding modifying the mesh on a frame-by-frame basis. The CPU locates the depth discontinuities and generates an erase mesh; a pixel shader then sets the Z-buffer to far away and the colors to transparent along the boundary in the projected color buffer.

58 Rendering the boundary layer
The CPU generates a boundary mesh from the boundary-layer depth, with the boundary RGBA stored as vertex colors. The mesh is projected into the virtual view and composited with the projected main layer in the color buffer, using the Z-buffer on the GPU.

59 Graphics for Vision Use the GPU for vision.
Real-time stereo (Yang and Pollefeys, CVPR 2003)

60 Rendering (revisited)
Block diagram: project the main and boundary layers from each of the two nearest cameras into the virtual view, then composite.

61 Compositing views
A pixel shader blends the projections from Camera 1 and Camera 2 using weights based on each camera's proximity to the virtual viewpoint; a normalization pass on the GPU then produces the final composite.
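
A hedged sketch of the blend performed by the pixel shader: weight each camera's projection by its proximity to the virtual viewpoint, accumulate, then normalize by the total weight. The linear weight in t is an assumption for illustration.

```python
# Proximity-weighted blending of two projected views, with a normalization
# pass so missing or partially transparent pixels still blend correctly.
import numpy as np

def composite(proj1, alpha1, proj2, alpha2, t):
    """proj*: HxWx3 projected colors; alpha*: HxW validity/alpha maps;
       t in [0, 1]: position of the virtual camera between camera 1 and 2."""
    w1 = (1.0 - t) * alpha1
    w2 = t * alpha2
    accum = w1[..., None] * proj1 + w2[..., None] * proj2
    weight = w1 + w2
    out = np.zeros_like(proj1, dtype=np.float64)
    nonzero = weight > 1e-6
    out[nonzero] = accum[nonzero] / weight[nonzero][:, None]
    return out
```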

62 DEMO Running in real time on a xxx machine. Pause and interpolate, with and without playback. Decompressed and rendered in real time. 640x480 x N frames = 300 MB.

63 “Massive Arabesque” videoclip

64 Future work
Mesh simplification; more complicated scenes; temporal interpolation (using optical flow); wider range of virtual motion; 2D grid of cameras.

65 Summary
Sparse camera configuration; high-quality depth recovery; automatic matting; new two-layer representation; inter-camera compression; real-time rendering.

