Lecture 29: Optimization and Neural Nets CS4670/5670: Computer Vision Kavita Bala Slides from Andrej Karpathy and Fei-Fei Li


Today
– Optimization
– Today and Monday: Neural nets, CNNs

Summary

Other loss functions
Scores are not very intuitive
Softmax classifier
– Score function is the same
– Intuitive output: normalized class probabilities
– Extension of logistic regression to multiple classes

Softmax classifier
Interpretation: squashes scores into the range 0 to 1, normalized so they sum to 1 (class probabilities)
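A minimal numpy sketch of this squashing (shifting by the max score for numerical stability is a standard trick not shown on the slide; the example scores are illustrative):

```python
import numpy as np

def softmax(scores):
    """Squash raw class scores into probabilities in (0, 1) that sum to 1."""
    shifted = scores - np.max(scores)   # stability shift; result is unchanged
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

print(softmax(np.array([3.2, 5.1, -1.7])))  # roughly [0.13, 0.87, 0.00]
```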

Cross-entropy loss
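The formula behind this slide is the standard softmax cross-entropy loss, reconstructed here from the surrounding definitions: for an example with score vector $s$ and correct class $y_i$,

$$ L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right) $$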

Aside: Loss function interpretation
Probability
– Maximum Likelihood Estimation (MLE)
– Regularization is Maximum a Posteriori (MAP) estimation
Cross-entropy H(p, q)
– p is the true distribution (1 for the correct class, 0 elsewhere), q is the estimated distribution
– The softmax classifier minimizes the cross-entropy
– Equivalently, it minimizes the KL divergence (Kullback-Leibler) between the distributions: a measure of distance between p and q
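The identity behind the last bullet (standard information theory, stated for completeness):

$$ H(p, q) = -\sum_x p(x) \log q(x) = H(p) + D_{\mathrm{KL}}(p \,\|\, q) $$

Since p is one-hot here, H(p) = 0, so minimizing the cross-entropy is exactly minimizing the KL divergence between p and q.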

SVM vs. Softmax
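A side-by-side sketch of the two losses on the same scores (delta = 1 is the conventional margin; the scores and correct class are illustrative):

```python
import numpy as np

def svm_loss(scores, y, delta=1.0):
    """Multiclass SVM (hinge) loss for one example with correct class y."""
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0  # the correct class contributes no margin term
    return np.sum(margins)

def softmax_loss(scores, y):
    """Cross-entropy (softmax) loss for one example with correct class y."""
    shifted = scores - np.max(scores)            # numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[y]

scores = np.array([3.2, 5.1, -1.7])
print(svm_loss(scores, 0), softmax_loss(scores, 0))  # ~2.9 and ~2.05
```

Once all margins are satisfied, the SVM loss is exactly 0 and stops caring; the softmax loss is always positive and keeps pushing probability toward the correct class.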

Summary
Have a score function and a loss function
– Will generalize the score function
Find W and b to minimize the loss
– SVM vs. Softmax: comparable in performance
– SVM satisfies margins; softmax optimizes probabilities

Gradient Descent

Step size: learning rate
– Too big: will overshoot and miss the minimum
– Too small: slow convergence
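A runnable toy illustrating this trade-off (the quadratic loss f(w) = (w - 3)**2 and the three step sizes are illustrative assumptions):

```python
def grad(w):
    return 2.0 * (w - 3.0)  # gradient of f(w) = (w - 3)**2, minimum at w = 3

for step_size in (1.1, 0.01, 0.3):  # too big, too small, about right
    w = 0.0
    for _ in range(20):
        w -= step_size * grad(w)    # gradient descent update
    print(step_size, w)
# 1.1 overshoots and diverges; 0.01 barely moves; 0.3 lands near w = 3
```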

Analytic Gradient
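For a concrete instance, a sketch of the analytic gradient of the softmax loss with respect to the scores (the well-known closed form: probabilities minus the one-hot target; example scores are illustrative):

```python
import numpy as np

def softmax_loss_and_grad(scores, y):
    """Softmax loss for one example and its exact gradient w.r.t. the scores."""
    shifted = scores - np.max(scores)            # numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    loss = -np.log(probs[y])
    dscores = probs.copy()
    dscores[y] -= 1.0    # analytic gradient: probabilities minus one-hot target
    return loss, dscores

print(softmax_loss_and_grad(np.array([3.2, 5.1, -1.7]), 0))
```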

Gradient Descent

Mini-batch Gradient Descent
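A runnable toy mini-batch loop (the linear least-squares model, synthetic data, batch size of 32, and step size are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                  # toy inputs
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=1000)   # toy targets

w = np.zeros(3)
step_size, batch_size = 0.1, 32
for _ in range(500):
    idx = rng.integers(0, len(X), size=batch_size)   # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size     # gradient of mean squared error
    w -= step_size * grad                            # parameter update
print(w)  # close to w_true after many cheap updates
```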

Stochastic Gradient Descent (SGD)

Summary

Where are we?
Classifiers: SVM vs. Softmax
Gradient descent to optimize loss functions
– Batch gradient descent, stochastic gradient descent
– Momentum (sketch below)
– Numerical gradients (slow, approximate) vs. analytic gradients (fast, exact, but error-prone to derive)
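A sketch of the momentum update named above, on a toy quadratic (mu = 0.9 and the step size are typical but illustrative choices):

```python
w, v = 0.0, 0.0
mu, step_size = 0.9, 0.1        # momentum coefficient and learning rate
for _ in range(100):
    g = 2.0 * (w - 3.0)         # gradient of the toy loss (w - 3)**2
    v = mu * v - step_size * g  # accumulate a velocity
    w += v                      # move in the velocity direction
print(w)  # approaches the minimum w = 3
```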

Derivatives
Given f(x), where x is a vector of inputs
– Compute the gradient of f at x:
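The formula on this slide is the standard limit definition, reconstructed here:

$$ \frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} $$

In practice it is approximated per coordinate with a small finite h. A sketch of the "slow, approximate" numerical gradient (the centered difference and h = 1e-5 are conventional choices):

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered-difference estimate of the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x[i]
        x[i] = old + h; fplus = f(x)
        x[i] = old - h; fminus = f(x)
        x[i] = old                           # restore the coordinate
        grad[i] = (fplus - fminus) / (2 * h)
    return grad

f = lambda x: np.sum(x ** 2)   # true gradient is 2x
print(numerical_gradient(f, np.array([1.0, -2.0, 3.0])))  # ~[2, -4, 6]
```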

Examples

Backprop gradients
Can compose complex stages, yet still easily compute the gradient via the chain rule
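A minimal sketch of this idea on the classic toy expression f(x, y, z) = (x + y) * z (the input values are the usual illustrative ones, assumed here rather than taken from this deck's figures):

```python
# Forward pass through two simple stages.
x, y, z = -2.0, 5.0, -4.0
q = x + y            # first stage: q = 3
f = q * z            # second stage: f = -12

# Backward pass: chain rule, one stage at a time.
dfdq = z             # d(q*z)/dq
dfdz = q             # d(q*z)/dz
dfdx = dfdq * 1.0    # dq/dx = 1, chained through dfdq
dfdy = dfdq * 1.0    # dq/dy = 1
print(dfdx, dfdy, dfdz)  # -4.0 -4.0 3.0
```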