Data Modeling and Least Squares Fitting 2 (COS 323)

Nonlinear Least Squares
– Some problems can be rewritten to be linear.
– Example: fit the data points (x_i, log y_i) to a* + b x_i, then recover a = e^{a*}.
– Big problem: this no longer minimizes the squared error in the original y!
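
A minimal numpy sketch of this linearization, with made-up sample data assumed to follow y ≈ a e^{b x} (the data values and variable names are illustrative, not from the slides):

    import numpy as np

    # Hypothetical data assumed to follow y ~ a * exp(b * x), with y > 0.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.4, 6.2, 10.9, 20.3])

    # Linearize: log y = a* + b x is a linear least-squares problem in (a*, b).
    A = np.column_stack([np.ones_like(x), x])
    (a_star, b), *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    a = np.exp(a_star)  # recover a = e^{a*}

    # Caveat from the slide: this minimizes squared error in log y, not in y.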

Nonlinear Least Squares
– Can write the error function and minimize it directly:
    E(a, b) = Σ_i (y_i − a e^{b x_i})^2
– For the exponential there is no analytic solution for a and b.
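
To minimize this nonlinear error function directly, a library solver can be used; a sketch assuming scipy is available and reusing the same illustrative data:

    import numpy as np
    from scipy.optimize import least_squares

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.4, 6.2, 10.9, 20.3])

    # Residuals of the model y ~ a * exp(b * x); least_squares minimizes
    # the sum of their squares numerically (no analytic solution exists).
    def residuals(params):
        a, b = params
        return y - a * np.exp(b * x)

    # Start from a rough guess, e.g. the log-linear fit from the previous sketch.
    fit = least_squares(residuals, x0=[2.0, 0.5])
    a, b = fit.x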

Newton’s Method
– Apply Newton’s method for minimization:
    x_{k+1} = x_k − H^{-1}(x_k) G(x_k)
  where H is the Hessian (matrix of all 2nd derivatives) and G is the gradient (vector of all 1st derivatives).
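
A bare-bones sketch of this Newton update for minimization (no line search or convergence test; the function names are assumptions, not from the slides):

    import numpy as np

    def newton_minimize(grad, hess, x0, n_iter=20):
        """Repeat the Newton step x <- x - H(x)^{-1} G(x)."""
        x = np.asarray(x0, dtype=float)
        for _ in range(n_iter):
            x = x - np.linalg.solve(hess(x), grad(x))
        return x

For a quadratic objective this converges in one step; for general smooth objectives it converges quickly only near a minimum.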

Newton’s Method for Least Squares
– For E(a) = Σ_i (y_i − f(x_i; a))^2, the gradient involves 1st derivatives of f and the Hessian involves 2nd derivatives:
    G_j = −2 Σ_i (y_i − f(x_i)) ∂f/∂a_j
    H_jk = 2 Σ_i [ −(y_i − f(x_i)) ∂²f/∂a_j∂a_k + (∂f/∂a_j)(∂f/∂a_k) ]

Gauss-Newton Iteration
– Consider one term of the Hessian: the part weighted by the residuals,
    −2 Σ_i (y_i − f(x_i)) ∂²f/∂a_j∂a_k
– If we are close to the answer, the residuals are small, so this first term is close to 0.
– Gauss-Newton method: ignore the first term and keep only 2 Σ_i (∂f/∂a_j)(∂f/∂a_k)!
  – Eliminates the requirement to calculate 2nd derivatives of f.
  – Surprising fact: convergence is still superlinear if we start "close enough" to the answer.
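
A sketch of Gauss-Newton for the exponential model used earlier; the Jacobian J replaces the full Hessian by J^T J (the model and starting values are illustrative):

    import numpy as np

    def gauss_newton_exp(x, y, a, b, n_iter=10):
        """Gauss-Newton for y ~ a * exp(b * x): use J^T J, no 2nd derivatives."""
        for _ in range(n_iter):
            f = a * np.exp(b * x)
            r = y - f                                     # residuals
            J = np.column_stack([np.exp(b * x),           # df/da
                                 a * x * np.exp(b * x)])  # df/db
            delta = np.linalg.solve(J.T @ J, J.T @ r)     # Gauss-Newton step
            a, b = a + delta[0], b + delta[1]
        return a, b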

Levenberg-Marquardt
– Newton (and Gauss-Newton) work well when close to the answer, terribly when far away.
– Steepest descent is safe when far away.
– Levenberg-Marquardt idea: let's do both, blending between steepest descent (far from the answer) and Gauss-Newton (close to it).

Levenberg-Marquardt
– Trade off between the two methods depending on how far away you are…
– Clever way of doing this: with J_ij = ∂f(x_i)/∂a_j and residuals r_i = y_i − f(x_i), solve the damped system
    (J^T J + λ diag(J^T J)) δ = J^T r
  for the parameter update δ.
– If λ is small, the step is mostly like Gauss-Newton.
– If λ is big, the matrix becomes mostly diagonal and the step behaves like steepest descent.

Levenberg-Marquardt
– Final bit of cleverness: adjust λ depending on how well we're doing.
  – Start with some moderate λ, e.g. λ = 0.001.
  – If the last iteration decreased the error, accept the step and decrease λ to λ/10.
  – If the last iteration increased the error, reject the step and increase λ to 10λ.
– Result: a fairly stable algorithm, not too painful (no 2nd derivatives), used a lot.
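
A compact Levenberg-Marquardt sketch for the same exponential model, following the λ schedule above (the initial λ = 0.001 and the model are illustrative assumptions):

    import numpy as np

    def lm_exp(x, y, a, b, lam=1e-3, n_iter=50):
        """Levenberg-Marquardt for y ~ a * exp(b * x)."""
        def sse(a, b):
            return np.sum((y - a * np.exp(b * x)) ** 2)

        err = sse(a, b)
        for _ in range(n_iter):
            f = a * np.exp(b * x)
            r = y - f
            J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
            JTJ = J.T @ J
            # Augmented diagonal: small lam ~ Gauss-Newton, large lam ~ steepest descent.
            step = np.linalg.solve(JTJ + lam * np.diag(np.diag(JTJ)), J.T @ r)
            new_err = sse(a + step[0], b + step[1])
            if new_err < err:   # accept the step, trust Gauss-Newton more
                a, b, err, lam = a + step[0], b + step[1], new_err, lam / 10
            else:               # reject the step, behave more like steepest descent
                lam *= 10
        return a, b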

Outliers
– A lot of derivations assume a Gaussian distribution for the errors.
– Unfortunately, nature (and experimenters) sometimes don't cooperate.
– Outliers: points with extremely low probability of occurrence (according to Gaussian statistics).
– Outliers can have a strong influence on least squares.
(Figure: Gaussian vs. non-Gaussian error probability distributions)

Robust Estimation
– Goal: develop parameter estimation methods that are insensitive to small numbers of large errors.
– General approach: try to give large deviations less weight.
– M-estimators: minimize some function other than the square of y − f(x; a, b, …).

Least Absolute Value Fitting
– Minimize Σ_i |y_i − f(x_i)| instead of Σ_i (y_i − f(x_i))^2.
– Points far away from the trend get comparatively less influence.

Example: Constant
– For the constant function y = a, minimizing Σ_i (y_i − a)^2 gave a = mean.
– Minimizing Σ_i |y_i − a| gives a = median.
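
A tiny numeric illustration of this mean-vs-median behavior (the data, including the single outlier, are made up):

    import numpy as np

    # One gross outlier pulls the least-squares answer (mean) far more than
    # the least-absolute-value answer (median).
    y = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 10.0])  # last point is an outlier
    a_ls  = np.mean(y)    # minimizes sum (y_i - a)^2  -> 2.5
    a_lav = np.median(y)  # minimizes sum |y_i - a|    -> 1.025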

Doing Robust Fitting
– In the general case this is a nasty objective function: it has a discontinuous derivative.
– The simplex method is often a good choice.

Iteratively Reweighted Least Squares
– Sometimes-used approximation: convert to iterated weighted least squares,
    minimize Σ_i w_i (y_i − f(x_i))^2
  with the weights w_i based on the residuals from the previous iteration.

Iteratively Reweighted Least Squares
– Different options for the weights w_i (one concrete choice is sketched below):
  – Avoid problems with infinities.
  – Give even less weight to outliers.
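
A sketch of IRLS for a straight-line model approximating least-absolute-value fitting; the weight formula w_i = 1/(ε + |r_i|) is one common choice and an assumption here, not necessarily the one on the original slide:

    import numpy as np

    def irls_line(x, y, n_iter=20, eps=1e-6):
        """Iteratively reweighted least squares for y ~ p0 + p1 * x."""
        A = np.column_stack([np.ones_like(x), x])
        w = np.ones_like(y)
        for _ in range(n_iter):
            W = np.diag(w)
            # Weighted least squares: solve (A^T W A) p = A^T W y.
            p = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
            r = y - A @ p
            w = 1.0 / (eps + np.abs(r))  # reweight from the previous residuals
        return p  # (intercept, slope)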

Iteratively Reweighted Least Squares
– Danger! This is not guaranteed to converge to the right answer!
  – Needs a good starting point, which is available if the initial least squares estimate is reasonable.
  – In general, works OK if there are few outliers and they are not too far off.

Outlier Detection and Rejection
– Special case of IRWLS: set the weight to 0 for outliers, 1 otherwise.
– Detecting outliers: (y_i − f(x_i))^2 > threshold
  – One choice of threshold: a multiple of the mean squared difference.
  – Better choice: a multiple of the median squared difference.
  – Can iterate…
  – As before, not guaranteed to do anything reasonable; tends to work OK if there are only a few outliers.
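
A sketch of the reject-and-refit scheme for a line, using a multiple of the median squared residual as the threshold (the factor of 6 is an illustrative choice, not prescribed by the slides):

    import numpy as np

    def fit_with_rejection(x, y, factor=6.0):
        """Fit, flag points whose squared residual exceeds a multiple of the
        median squared residual, then refit using only the remaining points."""
        A = np.column_stack([np.ones_like(x), x])
        p, *_ = np.linalg.lstsq(A, y, rcond=None)
        r2 = (y - A @ p) ** 2
        inliers = r2 <= factor * np.median(r2)
        p, *_ = np.linalg.lstsq(A[inliers], y[inliers], rcond=None)
        return p, inliers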

RANSAC
– RANdom SAmple Consensus: designed for bad data (in the best case, up to 50% outliers).
– Take many random subsets of the data:
  – Compute a least squares fit for each sample.
  – See how many points agree: (y_i − f(x_i))^2 < threshold
  – Threshold user-specified or estimated from more trials.
– At the end, use the fit that agreed with the most points.
  – Can do one final least squares fit with all of its inliers.
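
A RANSAC sketch for line fitting along the lines described above; the trial count and threshold are user-chosen assumptions:

    import numpy as np

    def ransac_line(x, y, n_trials=1000, threshold=0.25, seed=0):
        """Fit a line robustly: random minimal samples, count agreeing points,
        keep the best sample, refit with all of its inliers."""
        rng = np.random.default_rng(seed)
        A = np.column_stack([np.ones_like(x), x])
        best_inliers = np.zeros(len(x), dtype=bool)
        for _ in range(n_trials):
            idx = rng.choice(len(x), size=2, replace=False)  # minimal sample for a line
            p, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
            inliers = (y - A @ p) ** 2 < threshold           # squared-residual test
            if inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        # Final least-squares fit using every inlier of the best sample.
        p, *_ = np.linalg.lstsq(A[best_inliers], y[best_inliers], rcond=None)
        return p, best_inliers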