Summary of “Efficient Deep Learning for Stereo Matching”


Summary of “Efficient Deep Learning for Stereo Matching” (Xi Mo, 4/3/2017)

Traditional way of stereo matching. For a rectified stereo pair, a scene point projects to column xl in the left image and xr in the right image; its disparity is d = xl - xr, and matching is searched over this disparity space. [Figure: stereo geometry and disparity space.]
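Disparity relates directly to depth through the camera geometry: for a rectified pair with focal length f (in pixels) and baseline B, depth is Z = f * B / d. A minimal sketch, with illustrative focal length and baseline values not taken from the slides:

```python
# Minimal sketch: recovering depth from disparity for a rectified
# stereo pair. The focal length and baseline below are illustrative
# values, not numbers from the slides.
def depth_from_disparity(disparity, focal_length_px, baseline_m):
    """Z = f * B / d; only valid where disparity > 0."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity

# A point seen at xl = 120 in the left image and xr = 100 in the
# right image has disparity d = xl - xr = 20 pixels.
d = 120 - 100
z = depth_from_disparity(d, focal_length_px=700.0, baseline_m=0.54)
```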

The Middlebury benchmark

[Figure: left view, right view, and left-view disparity maps with error maps for 3DMST (rank 1) and PMSC (rank 3) on the Middlebury benchmark.]

Estimating depth from a stereo pair is still an open problem. Occlusions, large saturated areas and repetitive patterns are some of the remaining challenges. Classical approaches such as semi-global block matching and Markov-random-field-based methods rely on hand-crafted cost functions, in which at most a linear combination of features is learned. Deep representations, by contrast, are learned directly from pixels and already solve many scene-understanding tasks. [Figure: 3DMST (rank 1) and LPU (rank 13) disparity maps with their error maps.]

The authors’ matching score. A neural network is taught to score the match between left and right patches. It uses a siamese architecture: two branches with shared weights, each a stack of spatial convolutions with a small filter size, each convolution followed by spatial batch normalization and a rectified linear unit (ReLU). The authors’ modification: the ReLU is removed from the last layer so that the information encoded in negative values is not lost. The network has 4 layers with 3 × 3 filters, which results in a receptive field of size 9 × 9.
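A minimal numpy sketch of this idea, under simplifying assumptions: one channel and one filter per layer (the real network uses many feature maps), batch normalization omitted, and an inner-product matching score between the two branch outputs. Four “valid” 3 × 3 convolutions shrink a 9 × 9 patch to 1 × 1, which is exactly the 9 × 9 receptive field the slide mentions.

```python
import numpy as np

def conv3x3_valid(x, k):
    """Valid 3x3 convolution (cross-correlation) on a 2D array."""
    h, w = x.shape
    out = np.empty((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

def features(patch, kernels):
    """Apply 4 conv layers; ReLU after every layer except the last,
    matching the slide's modification (keep negative values at the end)."""
    x = patch
    for n, k in enumerate(kernels):
        x = conv3x3_valid(x, k)
        if n < len(kernels) - 1:
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
kernels = [rng.standard_normal((3, 3)) for _ in range(4)]  # shared weights
left = rng.standard_normal((9, 9))
right = rng.standard_normal((9, 9))

f_left = features(left, kernels)    # 9 -> 7 -> 5 -> 3 -> 1
f_right = features(right, kernels)  # same weights: siamese branches
score = float(f_left.ravel() @ f_right.ravel())  # inner-product matching score
```

Because the branches share weights, left-image features can be computed once and reused against every candidate disparity, which is the source of the method’s efficiency.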

Training process. Minimize the cross-entropy loss with respect to the weights w that parameterize the network:

min_w − Σ_i Σ_d p_gt(d) log p_i(d; w)

where (xi, yi) are the image coordinates of the center of a patch extracted at random from the left image, d(xi, yi) is the corresponding ground-truth disparity, and p_gt is a smooth target distribution centered around that ground truth. The network is trained by stochastic gradient descent with back-propagation, using AdaGrad (which adapts the per-parameter step size based on historical gradient information).
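A sketch of the smooth target distribution and the resulting cross-entropy, assuming a small fixed support around the ground truth; the specific weights (0.5 / 0.2 / 0.05) are illustrative assumptions, not values given on the slide.

```python
import numpy as np

def smooth_target(d_gt, num_disp, weights=(0.5, 0.2, 0.05)):
    """Target distribution over disparities, peaked at the ground
    truth with mass spread over nearby disparities (assumed weights)."""
    p = np.zeros(num_disp)
    for offset, w in enumerate(weights):
        for d in (d_gt - offset, d_gt + offset):
            if 0 <= d < num_disp:
                p[d] = max(p[d], w)
    return p / p.sum()

def cross_entropy(p_target, logits):
    """-sum_d p_gt(d) * log softmax(logits)[d]."""
    logits = logits - logits.max()            # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum())
    return -np.sum(p_target * log_softmax)

p = smooth_target(d_gt=10, num_disp=64)
loss = cross_entropy(p, logits=np.zeros(64))  # untrained (uniform) network
```

Spreading the target mass over neighboring disparities, rather than using a one-hot label, makes the loss tolerant to small disparity errors.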

Smoothing the deep-net outputs.
Cost aggregation: simply perform average pooling of the matching costs over a window of size 5 × 5.
Semi-global block matching: augment the unary energy term obtained from the convolutional neural net with additional pairwise potentials which encourage smooth disparities:

E(y) = Σ_i E_i(y_i) + Σ_{(i,j) ∈ N} E_{i,j}(y_i, y_j)

where N refers to the 4-connected grid, the unary energy E_i(y_i) comes from the output of the neural net, and E_{i,j} is the pairwise smoothness energy.
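The cost-aggregation step can be sketched as a per-disparity 5 × 5 box filter over the cost volume; edge padding is an implementation assumption.

```python
import numpy as np

def aggregate_costs(cost_volume, window=5):
    """Average-pool matching costs over a window x window neighborhood,
    independently for each disparity slice.

    cost_volume: (H, W, D) array of per-pixel, per-disparity costs.
    """
    h, w, num_disp = cost_volume.shape
    r = window // 2
    # Edge padding at image borders (an assumption; the slide does not
    # specify border handling).
    padded = np.pad(cost_volume, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.zeros_like(cost_volume)
    for dy in range(window):
        for dx in range(window):
            out += padded[dy:dy + h, dx:dx + w, :]
    return out / (window * window)
```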

Slanted plane: over-segment the image using an extension of the SLIC energy; compute a slanted-plane estimate for each superpixel; iterate these two steps to minimize an energy function which takes into account appearance, location, disparity, smoothness and boundary energies.
Sophisticated post-processing: an interpolation step resolves conflicts between the disparity maps computed for the left and right images by performing a left-right consistency check; subpixel enhancement fits a quadratic function to neighboring points to obtain an enhanced depth map; a final refinement step applies a median filter and a bilateral filter. This pipeline does not always work.
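The left-right consistency check can be sketched as follows: a left-image pixel is kept only if the right disparity map agrees when its disparity is followed back. The 1-pixel tolerance is an assumption for illustration.

```python
import numpy as np

def lr_consistency_mask(disp_left, disp_right, tol=1.0):
    """True where the left and right disparity maps agree.

    A left pixel (y, x) maps to right column x - disp_left[y, x];
    it is consistent if the right map's disparity there matches
    within `tol` pixels (tolerance is an assumed value).
    """
    h, w = disp_left.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xr = int(round(x - disp_left[y, x]))
            if 0 <= xr < w:
                mask[y, x] = abs(disp_left[y, x] - disp_right[y, xr]) <= tol
    return mask
```

Pixels that fail the check (typically occlusions and mismatches) are then filled by the interpolation step.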

Experimental evaluation. Before training, each image is normalized to have zero mean and a standard deviation of one. The network parameters are initialized from a uniform distribution. Training uses the AdaGrad algorithm with a learning rate of 1e-2; the learning rate is decreased by a factor of 5 after 24k iterations and then further decreased by a factor of 5 every 8k iterations. The batch size is 128. The network is trained for 40k iterations, which takes around 6.5 hours on an NVIDIA Titan X.
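The learning-rate schedule described above can be written as a small step function; this is a sketch of one plausible reading of the slide’s wording, not the authors’ code.

```python
def learning_rate(iteration, base=1e-2):
    """Step schedule: base rate until 24k iterations, then divide by 5,
    then divide by 5 again every further 8k iterations."""
    if iteration < 24000:
        return base
    drops = 1 + (iteration - 24000) // 8000
    return base / (5 ** drops)
```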

KITTI 2012 benchmark. [Figure: original image, stereo estimates, and stereo errors.]

Experimental evaluation on the KITTI 2012 benchmark. The metric is the percentage of pixels whose disparity error exceeds a fixed threshold.

Experimental evaluation on the KITTI 2015 benchmark: “Again, our network outperforms previously designed convolutional neural networks by a large margin on all criteria.”