On Building an Accurate Stereo Matching System on Graphics Hardware


1 On Building an Accurate Stereo Matching System on Graphics Hardware
Xing Mei; Xun Sun; Mingcai Zhou; Shaohui Jiao; Haitao Wang; Xiaopeng Zhang
Samsung Advanced Institute of Technology, China Lab
Computer Vision Workshops, 2011 IEEE

2 Outline
Introduction
Related Works
Algorithm
CUDA Implementation
Experimental Results
Conclusion

3 Introduction

4 Introduction Dense two-frame stereo matching
Compute a disparity map from stereo images. Broad applications: 3D reconstruction, view interpolation

5 Related Works

6 Related Works
Local methods
Compute each pixel’s disparity independently over a local support region. Fast but inaccurate.
Global methods
Solve the stereo problem in an energy minimization process. Accurate but slow due to the time-consuming global optimizer (GC, BP).

7 Related Works
Propagation-based methods
Produce quasi-dense or dense disparity results from a set of seed pixels.
Relatively fast, but sensitive to early wrong matches.
Using segmented regions as the guided propagation unit comes at an expensive cost.

8 Related Works
Introduce a simple guided unit for propagation: pixel-wise 1D line segments.
No image segmentation is required here.
Simple, fast and accurate.

9 Algorithm

10 Algorithm
Framework
Input: stereo images
Step 1: AD-Census Cost Initialization
Step 2: Cross-based Cost Aggregation
Step 3: Scanline Optimization
Step 4: Multi-step Disparity Refinement
Output: disparity map

11 Algorithm
Step 1 of the framework: AD-Census Cost Initialization

12 Disparity Cost Computing
Cost measures: AD, BT, gradient-based measures, non-parametric transforms (rank/census [3]), ...
Combinations: SAD + gradient [6], AD + Census
AD (Absolute Difference): constant color assumption; repetitive structures
Census: encodes local image structures; textureless regions
[3] H. Hirschmuller and D. Scharstein. “Evaluation of stereo matching costs on images with radiometric differences.” IEEE TPAMI, 31(9), 2009.
[6] A. Klaus, M. Sormann, and K. Karner. “Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure.” ICPR, 2006.

13 AD-Census Cost Initialization
$C(p, d) = \rho(C_{census}(p, d), \lambda_{census}) + \rho(C_{AD}(p, d), \lambda_{AD})$
p: pixel
d: disparity level
$\rho(c, \lambda)$: a robust function on variable $c$
pd = (x − d, y): the pixel corresponding to p in the right image
$C_{census}$: Hamming distance [22]
[Figure: pixel p in the left image and its counterpart, shifted by d, in the right image]
[22] R. Zabih and J. Woodfill. “Non-parametric local transforms for computing visual correspondence.” In Proc. ECCV, 1994.
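The combination above can be sketched in a few lines of Python. The transcript does not show the definition of the robust function, so the exponential form ρ(c, λ) = 1 − exp(−c/λ) used here is an assumption, and the function and parameter names are illustrative only.

```python
import math

def rho(cost, lam):
    """Assumed robust function: maps a non-negative cost into [0, 1)."""
    return 1.0 - math.exp(-cost / lam)

def ad_census_cost(c_census, c_ad, lam_census, lam_ad):
    """Combine the census and AD costs of one pixel at one disparity level."""
    return rho(c_census, lam_census) + rho(c_ad, lam_ad)

# Hypothetical inputs: Hamming distance 5, absolute difference 20.
print(ad_census_cost(5.0, 20.0, lam_census=30.0, lam_ad=10.0))
```

Because each term is squashed into [0, 1), neither measure can dominate the sum, and the two λ values control how quickly each term saturates.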

14 Census Transform
[Figure: a census transform window of pixel intensities and the bit string it produces]
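As a rough illustration of the transform, the sketch below compares every neighbour in a square window with the centre pixel. The comparison convention (bit 1 when the neighbour is darker than the centre) and the 3 × 3 sample values are assumptions, not values taken from the slide.

```python
def census_bits(window):
    """Return the census bit string of a square window given as a list of rows.

    Assumed convention: a neighbour contributes 1 if it is darker than the
    centre pixel and 0 otherwise; the centre itself is skipped.
    """
    size = len(window)
    mid = size // 2
    centre = window[mid][mid]
    bits = []
    for r, row in enumerate(window):
        for c, value in enumerate(row):
            if r == mid and c == mid:
                continue
            bits.append(1 if value < centre else 0)
    return bits

# Hypothetical 3 x 3 window with centre value 25.
print(census_bits([[10, 20, 30], [40, 25, 5], [25, 60, 1]]))
# -> [1, 1, 0, 0, 1, 0, 0, 1]
```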

15 Census Hamming Distance
[Figure: the census bit strings of the left-image pixel and the right-image pixel are combined with XOR; Hamming distance = 3 in the example]
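The Hamming distance is the number of positions where the two census bit strings differ, which is what XOR followed by a bit count computes. A minimal sketch, with made-up bit strings chosen so that the distance is 3:

```python
def hamming_distance(bits_left, bits_right):
    """Count the positions where two equal-length bit strings differ."""
    if len(bits_left) != len(bits_right):
        raise ValueError("bit strings must have the same length")
    return sum(a ^ b for a, b in zip(bits_left, bits_right))

# Hypothetical census strings for a left pixel and its right candidate.
left = [1, 0, 1, 1, 0, 0, 1, 0]
right = [1, 1, 1, 0, 0, 0, 0, 0]
print(hamming_distance(left, right))  # -> 3
```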

16 AD-Census Cost Initialization
$C(p, d) = \rho(C_{census}(p, d), \lambda_{census}) + \rho(C_{AD}(p, d), \lambda_{AD})$
$\rho(c, \lambda)$: a robust function on variable $c$

17 AD-Census Cost Initialization
AD-Census measure produces proper disparity results for both repetitive structures and textureless regions.

18 Algorithm
Step 2 of the framework: Cross-based Cost Aggregation

19 Cross-based Cost Aggregation [23]
Cross construction
The line ending points P1, P2 for P are located where rule 1 or rule 2 is violated:
R1: Color self-similarity in the line region (smooth depth assumption)
R2: Arm length limitation (avoids over-smoothing)
[23] K. Zhang, J. Lu, and G. Lafruit. “Cross-based local stereo matching using orthogonal integral images.” IEEE TCSVT, 2009.
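The two rules can be sketched for one arm on a single scanline. This is a simplified illustration on grey values: the threshold name `tau`, the limit `max_len`, and the sample row are assumptions, and a colour image would need a per-channel comparison.

```python
def left_arm_length(row, x, tau, max_len):
    """Grow the left arm of pixel x until rule R1 or R2 is violated.

    R1: the candidate must differ from row[x] by less than tau.
    R2: the arm must not grow beyond max_len pixels.
    """
    length = 0
    while length < max_len and x - length - 1 >= 0:
        candidate = row[x - length - 1]
        if abs(candidate - row[x]) >= tau:
            break  # R1 violated: colour changes too much
        length += 1
    return length

row = [200, 52, 50, 49, 51, 50]  # hypothetical grey values
print(left_arm_length(row, x=5, tau=20, max_len=3))   # -> 3 (stopped by R2)
print(left_arm_length(row, x=5, tau=20, max_len=10))  # -> 4 (stopped by R1)
```

The right, up and down arms follow the same pattern with the step direction changed.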

20 Cross-based Cost Aggregation

21 Cross-based Cost Aggregation
Enhanced cross construction (using pixel p’s left arm and its endpoint pixel pl as an example)

22 Cross-based Cost Aggregation
Run this step for 4 iterations to get stable cost values. In iterations 1 and 3, costs are aggregated horizontally and then vertically. In iterations 2 and 4, costs are aggregated vertically and then horizontally. This reduces the errors at depth discontinuities.
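The alternating order can be sketched as below for one disparity level. This is a simplified illustration, not the authors' kernel: arm lengths are assumed to be precomputed and to stay inside the image, and the sums are left unnormalized.

```python
def aggregate_1d(cost, arms, horizontal):
    """Sum costs over each pixel's arm span in one direction.

    cost: 2D list indexed [y][x]; arms: 2D list of (backward, forward)
    arm lengths for that direction.
    """
    h, w = len(cost), len(cost[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            back, fwd = arms[y][x]
            if horizontal:
                out[y][x] = sum(cost[y][x - back:x + fwd + 1])
            else:
                out[y][x] = sum(cost[yy][x] for yy in range(y - back, y + fwd + 1))
    return out

def aggregate(cost, h_arms, v_arms, iterations=4):
    """Alternate the pass order: H then V on odd iterations, V then H on even."""
    for it in range(1, iterations + 1):
        if it % 2 == 1:
            cost = aggregate_1d(cost, h_arms, True)
            cost = aggregate_1d(cost, v_arms, False)
        else:
            cost = aggregate_1d(cost, v_arms, False)
            cost = aggregate_1d(cost, h_arms, True)
    return cost

# Hypothetical 1 x 3 cost slice with horizontal arms only.
h_arms = [[(0, 1), (1, 1), (1, 0)]]
v_arms = [[(0, 0), (0, 0), (0, 0)]]
print(aggregate([[1, 1, 1]], h_arms, v_arms, iterations=1))  # -> [[2, 3, 2]]
```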

23 Cross-based Cost Aggregation
Our aggregation method can better handle large textureless regions and depth discontinuities.

24 Cross-based Cost Aggregation
[21] K.-J. Yoon and I.-S. Kweon. “Adaptive support-weight approach for correspondence search.” IEEE TPAMI, 2006.
[23] K. Zhang, J. Lu, and G. Lafruit. “Cross-based local stereo matching using orthogonal integral images.” IEEE TCSVT, 2009.

25 Algorithm
Step 3 of the framework: Scanline Optimization

26 Scanline Optimization [2]
4 scanline optimization processes are performed independently:
2 horizontal directions
2 vertical directions
[2] H. Hirschmuller. “Stereo processing by semiglobal matching and mutual information.” IEEE TPAMI, 2008.

27 Scanline Optimization
r: the scanline direction
p − r: the previous pixel along the same direction
$P_1$, $P_2$: penalties on disparity changes between neighboring pixels ($P_1 \le P_2$) [8]
[8] S. Mattoccia, F. Tombari, and L. D. Stefano. “Stereo vision enabling precise border localization within a scanline optimization framework.” In Proc. ACCV, pages 517–527, 2007.
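The path-cost formula did not survive in the transcript. The sketch below assumes the standard semiglobal matching recurrence from [2]: the cost at p adds the cheapest transition from p − r (same disparity for free, a change of one level for P1, any larger change for P2) and subtracts the previous minimum to keep values bounded. Constant penalties are used here for simplicity, which may differ from the presented system.

```python
def scanline_optimize(costs, p1, p2):
    """Accumulate path costs along one scanline direction.

    costs[i][d] is the aggregated cost of the i-th pixel on the path at
    disparity d. Returns a list of the same shape with the path costs.
    """
    n_disp = len(costs[0])
    path = [list(costs[0])]
    for i in range(1, len(costs)):
        prev = path[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            best = prev[d]                         # same disparity, no penalty
            if d > 0:
                best = min(best, prev[d - 1] + p1)  # change by one level
            if d < n_disp - 1:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)        # larger change
            row.append(costs[i][d] + best - prev_min)
        path.append(row)
    return path

# Hypothetical two-pixel path with three disparity levels.
print(scanline_optimize([[0, 5, 5], [5, 0, 5]], p1=1, p2=3))
# -> [[0, 5, 5], [5, 1, 8]]
```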

28 Scanline Optimization
The final cost: $C_2(p, d) = \frac{1}{4} \sum_{r} C_r(p, d)$
The disparity with the minimum $C_2$ value is selected as pixel p’s intermediate result.
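A small sketch of the selection step, assuming the final cost C2 is the average of the path costs Cr from the four directions; the sample values are made up.

```python
def select_disparity(path_costs):
    """Average the directional path costs and pick the cheapest disparity.

    path_costs: one list per direction, each holding Cr(p, d) for all d.
    Returns (best_disparity, c2) where c2[d] is the averaged cost.
    """
    n_disp = len(path_costs[0])
    c2 = [sum(pc[d] for pc in path_costs) / len(path_costs) for d in range(n_disp)]
    best = min(range(n_disp), key=lambda d: c2[d])
    return best, c2

# Hypothetical path costs of one pixel for four directions, three disparities.
paths = [[5, 1, 8], [4, 2, 8], [6, 1, 7], [5, 4, 9]]
print(select_disparity(paths))  # -> (1, [5.0, 2.0, 8.0])
```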

29 Algorithm
Step 4 of the framework: Multi-step Disparity Refinement

30 Multi-step Disparity Refinement
Outlier Handling
  Outlier Detection
  Iterative Region Voting
  Proper Interpolation
Depth Discontinuity Adjustment
Sub-pixel Enhancement

31 Outlier Handling--Detection
Outliers: pixels p with $D_L(p) \ne D_R(p - (D_L(p), 0))$
Outliers are further classified into occlusion and mismatch points:
The intersection of p’s epipolar line with $D_R$ is checked.
If there is no intersection, p is labelled “occlusion”; otherwise “mismatch”.
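The check and the classification can be sketched on one image row. Two points are assumptions made for this illustration: a pixel whose match falls outside the right image is treated as an outlier, and "intersection" is read as the existence of some disparity d for which the right map at x − d also holds d.

```python
def find_outliers(disp_left_row, disp_right_row):
    """Return the x positions that fail the left-right consistency check."""
    outliers = []
    for x, d in enumerate(disp_left_row):
        xr = x - d
        if xr < 0 or disp_right_row[xr] != d:
            outliers.append(x)
    return outliers

def classify_outlier(x, disp_right_row, max_disp):
    """Label an outlier as a mismatch if any disparity is consistent with D_R."""
    for d in range(max_disp + 1):
        xr = x - d
        if xr >= 0 and disp_right_row[xr] == d:
            return "mismatch"
    return "occlusion"

left = [0, 1, 1, 2]   # hypothetical left disparity row
right = [0, 1, 3, 0]  # hypothetical right disparity row
print(find_outliers(left, right))                 # -> [1, 3]
print(classify_outlier(1, right, max_disp=3))     # -> occlusion
print(classify_outlier(3, right, max_disp=3))     # -> mismatch
```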

32 Outlier Handling--Iterative Region Voting
Construct cross-based regions and apply a robust voting scheme.
Symbols: $S_p$; $\tau_S$, $\tau_H$ (threshold values)
5 iterations
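The voting rule itself is missing from the transcript. The sketch below assumes one plausible reading: the reliable disparities inside the outlier's cross-based region are counted (Sp), and the most frequent one replaces the outlier only if Sp exceeds τS and its share of the votes exceeds τH. The threshold values in the example are placeholders.

```python
from collections import Counter

def vote_disparity(region_disparities, tau_s, tau_h):
    """Return the winning disparity of a region, or None if the vote is weak.

    region_disparities: disparities of the reliable pixels in the region.
    """
    s_p = len(region_disparities)
    if s_p == 0:
        return None
    d_star, votes = Counter(region_disparities).most_common(1)[0]
    if s_p > tau_s and votes / s_p > tau_h:
        return d_star
    return None

# Hypothetical region: 15 reliable pixels at disparity 7, 10 at disparity 3.
print(vote_disparity([7] * 15 + [3] * 10, tau_s=20, tau_h=0.4))  # -> 7
print(vote_disparity([7, 7, 3], tau_s=20, tau_h=0.4))            # -> None
```

Running the vote several times lets pixels fixed in one round serve as reliable voters in the next, which matches the 5 iterations noted on the slide.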

33 Outlier Handling--Proper Interpolation
Occlusion points: the pixel with the lowest disparity value is selected for interpolation, since it most likely comes from the background.
Mismatch points: the pixel with the most similar color is selected for interpolation.

34 Depth Discontinuity Adjustment
For each pixel p on a disparity edge, two pixels p1, p2 from both sides of the edge are collected.
$D_L(p)$ is replaced by $D_L(p_1)$ or $D_L(p_2)$ if one of the two pixels has a smaller matching cost than $C_2(p, D_L(p))$.

35 Sub-pixel Enhancement [20]
Quadratic polynomial interpolation, with a 3 × 3 median filter
[20] Q. Yang, L. Wang, R. Yang, H. Stewenius, and D. Nister. “Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling.” IEEE TPAMI, 2009.
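Quadratic interpolation fits a parabola through the costs at d − 1, d and d + 1 and takes its minimum as the refined disparity. A minimal sketch of that step (the median filter is omitted, and the fallback for a non-convex fit is an assumption):

```python
def subpixel_disparity(d, c_minus, c_zero, c_plus):
    """Refine integer disparity d using the costs at d - 1, d and d + 1."""
    denom = 2.0 * (c_plus + c_minus - 2.0 * c_zero)
    if denom <= 0:
        return float(d)  # flat or non-convex neighbourhood: keep d
    return d - (c_plus - c_minus) / denom

# Hypothetical costs: the cheaper neighbour is d + 1, so the estimate moves up.
print(subpixel_disparity(10, c_minus=4.0, c_zero=1.0, c_plus=2.0))  # -> 10.25
```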

36 Multi-step Disparity Refinement
The average error percentages after performing each refinement step.

37 CUDA Implementation

38 CUDA Implementation
Compute Unified Device Architecture (CUDA) is a programming interface for parallel computation tasks on NVIDIA graphics hardware.
The computation task is coded into a kernel function.
The allocation of the threads is controlled with two hierarchical concepts: grid and block. A kernel creates a grid with multiple blocks, and each block consists of multiple threads.
[Figure: hierarchy of Kernel, Grid, Block and Thread]

39 CUDA Implementation
Cost Initialization:
Parallelize with W × H threads, organized into a 2D grid with the block size set to 32 × 32.
Each thread computes a cost value for a pixel at a given disparity.
The census transform requires a square window for each pixel, so more data must be loaded into shared memory for fast access.
[Figure: Kernel, Grid, Block (32 × 32) and Thread]
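To cover W × H pixels with 32 × 32 blocks, the grid needs a rounded-up number of blocks along each axis. The arithmetic is sketched below in Python rather than CUDA; the image size is a hypothetical example, and threads that fall outside the image would have to be masked inside the kernel.

```python
def grid_dims(width, height, block=32):
    """Number of blocks per axis so that block x block threads cover the image."""
    return ((width + block - 1) // block, (height + block - 1) // block)

# Hypothetical 450 x 375 image.
print(grid_dims(450, 375))  # -> (15, 12)
```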

40 CUDA Implementation
Cross-based Cost Aggregation:
A grid with W × H threads.
Cross construction: the block size is W or H, to efficiently handle a scanline.
Cost aggregation: the block size is 32 × 32.
Data reuse with shared memory is considered in both steps.

41 CUDA Implementation
Scanline Optimization:
This step is different, because the process is sequential in the scanline direction and parallel in the orthogonal direction.
W × D or H × D threads
Disparity Refinement:
W × H threads

42 Experimental Results

43 Experimental Results
Device: a PC with a Core 2 Duo 2.20 GHz CPU and an NVIDIA GeForce GTX 480 graphics card
[Table: parameter settings]
Source: Middlebury; HHI database (book arrival); Microsoft i2i database (Ilkay)

44 Experimental Results
Running time by data set (Tsukuba, Venus, Teddy, Cones):
CPU: 2.5, 4.5, 15
GPU: 0.015, 0.032, 0.095, 0.094
The GPU-friendly system brings an impressive 140× speedup.
The average proportions of the GPU running time for the four computation steps are 1%, 70%, 28% and 1% respectively.
The iterative cost aggregation step and the scanline optimization process dominate the running time.

45 Experimental Results First row: disparity maps generated with our system. Second row: disparity error maps with threshold 1. Errors in unoccluded and occluded regions are marked in black and gray respectively.

46 Experimental Results

47 Experimental Results video

48 Experimental Results
Snapshots on ’book arrival’ stereo video

49 Experimental Results Snapshots on ’Ilkay’ stereo video

50 Conclusion

51 Conclusion
Contributions
Present a near real-time stereo system with accurate disparity results.
Combine some known techniques without sacrificing performance and parallelism to obtain a high-quality disparity map.
Future works
Improve the system so that it can be applied in real-world applications.
Develop robust parameter setting methods.
