Summary of “Efficient Deep Learning for Stereo Matching”


Summary of “Efficient Deep Learning for Stereo Matching” (Xi Mo, 4/3/2017)

Traditional way of stereo matching. For a rectified stereo pair, a scene point projects to column xl in the left image and xr in the right image; its disparity is d = xl - xr, and matching is searched over this disparity space. [Figure: stereo geometry and disparity space.]
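Disparity relates directly to depth through the camera geometry: for a rectified pair with focal length f (in pixels) and baseline B, depth is Z = f * B / d. A minimal sketch, with illustrative focal length and baseline values not taken from the slides:

```python
# Minimal sketch: recovering depth from disparity for a rectified
# stereo pair. The focal length and baseline below are illustrative
# values, not numbers from the slides.
def depth_from_disparity(disparity, focal_length_px, baseline_m):
    """Z = f * B / d; only valid where disparity > 0."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity

# A point seen at xl = 120 in the left image and xr = 100 in the
# right image has disparity d = xl - xr = 20 pixels.
d = 120 - 100
z = depth_from_disparity(d, focal_length_px=700.0, baseline_m=0.54)
```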

The Middlebury benchmark

[Figure: left view, right view, and left-view disparity maps with error maps for 3DMST (rank 1) and PMSC (rank 3) on the Middlebury benchmark.]

Estimating depth from a stereo pair is still an open problem. Occlusions, large saturated areas and repetitive patterns are some of the remaining challenges. Classical approaches such as semi-global block matching and Markov-random-field-based methods rely on hand-crafted cost functions, in which at most a linear combination of features is learned. Deep representations, by contrast, are learned directly from pixels and already solve many scene-understanding tasks. [Figure: 3DMST (rank 1) and LPU (rank 13) disparity maps with their error maps.]

The authors’ matching score. A neural network is taught to score the match between left and right patches. It uses a siamese architecture: two branches with shared weights, each a stack of spatial convolutions with a small filter size, each convolution followed by spatial batch normalization and a rectified linear unit (ReLU). The authors’ modification: the ReLU is removed from the last layer so that the information encoded in negative values is not lost. The network has 4 layers with 3 × 3 filters, which results in a receptive field of size 9 × 9.
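A minimal numpy sketch of this idea, under simplifying assumptions: one channel and one filter per layer (the real network uses many feature maps), batch normalization omitted, and an inner-product matching score between the two branch outputs. Four “valid” 3 × 3 convolutions shrink a 9 × 9 patch to 1 × 1, which is exactly the 9 × 9 receptive field the slide mentions.

```python
import numpy as np

def conv3x3_valid(x, k):
    """Valid 3x3 convolution (cross-correlation) on a 2D array."""
    h, w = x.shape
    out = np.empty((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

def features(patch, kernels):
    """Apply 4 conv layers; ReLU after every layer except the last,
    matching the slide's modification (keep negative values at the end)."""
    x = patch
    for n, k in enumerate(kernels):
        x = conv3x3_valid(x, k)
        if n < len(kernels) - 1:
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]  # shared weights
left = rng.standard_normal((9, 9))
right = rng.standard_normal((9, 9))

f_left = features(left, kernels)    # 9 -> 7 -> 5 -> 3 -> 1
f_right = features(right, kernels)  # same weights: siamese branches
score = float(f_left.ravel() @ f_right.ravel())  # inner-product matching score
```

Because the branches share weights, left-image features can be computed once and reused against every candidate disparity, which is the source of the method’s efficiency.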

Training process. Minimize the cross-entropy loss with respect to the weights w that parameterize the network:

min_w − Σ_i Σ_d p_gt(d) log p_i(d; w)

where (xi, yi) are the image coordinates of the center of a patch extracted at random from the left image, d(xi, yi) is the corresponding ground-truth disparity, and p_gt is a smooth target distribution centered around that ground truth. The network is trained by stochastic gradient descent with back-propagation, using AdaGrad (which adapts the per-parameter step size based on historical gradient information).
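A sketch of the smooth target distribution and the resulting cross-entropy, assuming a small fixed support around the ground truth; the specific weights (0.5 / 0.2 / 0.05) are illustrative assumptions, not values given on the slide.

```python
import numpy as np

def smooth_target(d_gt, num_disp, weights=(0.5, 0.2, 0.05)):
    """Target distribution over disparities, peaked at the ground
    truth with mass spread over nearby disparities (assumed weights)."""
    p = np.zeros(num_disp)
    for offset, w in enumerate(weights):
        for d in (d_gt - offset, d_gt + offset):
            if 0 <= d < num_disp:
                p[d] = max(p[d], w)
    return p / p.sum()

def cross_entropy(p_target, logits):
    """-sum_d p_gt(d) * log softmax(logits)[d]."""
    logits = logits - logits.max()            # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum())
    return -np.sum(p_target * log_softmax)

p = smooth_target(d_gt=10, num_disp=64)
loss = cross_entropy(p, logits=np.zeros(64))  # untrained (uniform) network
```

Spreading the target mass over neighboring disparities, rather than using a one-hot label, makes the loss tolerant to small disparity errors.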

Smoothing the deep-net outputs.
Cost aggregation: simply perform average pooling of the matching costs over a window of size 5 × 5.
Semi-global block matching: augment the unary energy term obtained from the convolutional neural net with additional pairwise potentials which encourage smooth disparities:

E(y) = Σ_i E_i(y_i) + Σ_{(i,j) ∈ N} E_{i,j}(y_i, y_j)

where N refers to the 4-connected grid, the unary energy E_i(y_i) comes from the output of the neural net, and E_{i,j} is the pairwise smoothness energy.
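The cost-aggregation step can be sketched as a per-disparity 5 × 5 box filter over the cost volume; edge padding is an implementation assumption.

```python
import numpy as np

def aggregate_costs(cost_volume, window=5):
    """Average-pool matching costs over a window x window neighborhood,
    independently for each disparity slice.

    cost_volume: (H, W, D) array of per-pixel, per-disparity costs.
    """
    h, w, num_disp = cost_volume.shape
    r = window // 2
    # Edge padding at image borders (an assumption; the slide does not
    # specify border handling).
    padded = np.pad(cost_volume, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.zeros_like(cost_volume)
    for dy in range(window):
        for dx in range(window):
            out += padded[dy:dy + h, dx:dx + w, :]
    return out / (window * window)
```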

Slanted plane: over-segment the image using an extension of the SLIC energy; compute a slanted-plane estimate for each superpixel; iterate these two steps to minimize an energy function which takes into account appearance, location, disparity, smoothness and boundary energies.
Sophisticated post-processing: an interpolation step resolves conflicts between the disparity maps computed for the left and right images by performing a left-right consistency check; subpixel enhancement fits a quadratic function to neighboring points to obtain an enhanced depth map; a final refinement step applies a median filter and a bilateral filter. This pipeline does not always work.
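The left-right consistency check can be sketched as follows: a left-image pixel is kept only if the right disparity map agrees when its disparity is followed back. The 1-pixel tolerance is an assumption for illustration.

```python
import numpy as np

def lr_consistency_mask(disp_left, disp_right, tol=1.0):
    """True where the left and right disparity maps agree.

    A left pixel (y, x) maps to right column x - disp_left[y, x];
    it is consistent if the right map's disparity there matches
    within `tol` pixels (tolerance is an assumed value).
    """
    h, w = disp_left.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xr = int(round(x - disp_left[y, x]))
            if 0 <= xr < w:
                mask[y, x] = abs(disp_left[y, x] - disp_right[y, xr]) <= tol
    return mask
```

Pixels that fail the check (typically occlusions and mismatches) are then filled by the interpolation step.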

Experimental evaluation. Before training, each image is normalized to have zero mean and a standard deviation of one. The network parameters are initialized from a uniform distribution. Training uses the AdaGrad algorithm with a learning rate of 1e-2; the learning rate is decreased by a factor of 5 after 24k iterations and then further decreased by a factor of 5 every 8k iterations. The batch size is 128. The network is trained for 40k iterations, which takes around 6.5 hours on an NVIDIA Titan X.
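The learning-rate schedule described above can be written as a small step function; this is a sketch of one plausible reading of the slide’s wording, not the authors’ code.

```python
def learning_rate(iteration, base=1e-2):
    """Step schedule: base rate until 24k iterations, then divide by 5,
    then divide by 5 again every further 8k iterations."""
    if iteration < 24000:
        return base
    drops = 1 + (iteration - 24000) // 8000
    return base / (5 ** drops)
```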

KITTI 2012 benchmark. [Figure: original image, stereo estimates, and stereo errors.]

Experimental evaluation on the KITTI 2012 benchmark. The metric is the percentage of pixels whose disparity error exceeds a fixed threshold.

Experimental evaluation on the KITTI 2015 benchmark: “Again, our network outperforms previously designed convolutional neural networks by a large margin on all criteria.”