Lecture 29: Optimization and Neural Nets CS4670/5670: Computer Vision Kavita Bala Slides from Andrej Karpathy and Fei-Fei Li
Today – Optimization – Today and Monday: Neural nets, CNNs
Summary
Other loss functions Scores are not very intuitive Softmax classifier – Score function is the same – Intuitive output: normalized class probabilities – Extension of logistic regression to multiple classes
Softmax classifier Interpretation: squashes values into the range 0 to 1
Cross-entropy loss
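As a concrete illustration of the softmax classifier and its cross-entropy loss, here is a minimal numpy sketch for a single example; the scores and class index are made-up values, not from the slides.

```python
import numpy as np

def softmax_cross_entropy_loss(scores, y):
    """Cross-entropy loss of a softmax classifier for one example.
    scores: unnormalized class scores (e.g. s = W x + b), 1D array
    y: index of the correct class
    """
    shifted = scores - np.max(scores)                   # shift for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))   # normalized class probabilities in (0, 1)
    return -np.log(probs[y])                            # -log probability of the correct class

# Example with three classes, correct class 0
print(softmax_cross_entropy_loss(np.array([3.2, 5.1, -1.7]), 0))
```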
Aside: Loss function interpretation Probability – Maximum Likelihood Estimation (MLE) – Regularization corresponds to Maximum a posteriori (MAP) estimation Cross-entropy H – p is the true distribution (1 for the correct class), q is the estimated distribution – Softmax classifier minimizes cross-entropy – Minimizes the KL divergence (Kullback-Leibler) between the distributions: a measure of the distance between p and q
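For reference, the cross-entropy between the true distribution p and the estimate q decomposes into the entropy of p plus the KL divergence:

```latex
H(p, q) = -\sum_{x} p(x) \log q(x) = H(p) + D_{KL}(p \,\|\, q)
```

Since p puts all its mass on the correct class, H(p) = 0, so minimizing the cross-entropy is the same as minimizing the KL divergence between p and q.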
SVM vs. Softmax
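As a side-by-side sketch (not from the slides), the two losses can be computed for the same score vector: the multiclass SVM uses a hinge loss with margin delta, while the softmax uses the cross-entropy above.

```python
import numpy as np

def svm_loss(scores, y, delta=1.0):
    # Multiclass SVM (hinge) loss: penalize classes whose score comes
    # within a margin delta of the correct class score
    margins = np.maximum(0, scores - scores[y] + delta)
    margins[y] = 0
    return np.sum(margins)

def softmax_loss(scores, y):
    # Cross-entropy loss on the softmax probabilities
    shifted = scores - np.max(scores)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))
    return -np.log(probs[y])

scores = np.array([3.2, 5.1, -1.7])
print(svm_loss(scores, 0), softmax_loss(scores, 0))
```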
Summary Have a score function and a loss function – Will generalize the score function Find W and b to minimize the loss – SVM vs. Softmax: comparable in performance; SVM enforces margins, softmax optimizes class probabilities
Gradient Descent
Step size: learning rate Too big: can overshoot and miss the minimum Too small: slow convergence
Analytic Gradient
Gradient Descent
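A minimal sketch of the vanilla gradient descent loop on a toy quadratic objective; the objective, step size, and iteration count are illustrative choices, not from the slides.

```python
import numpy as np

# Toy objective: f(w) = ||w - w_star||^2, whose analytic gradient is 2 (w - w_star)
w_star = np.array([1.0, -2.0, 0.5])

def loss_and_grad(w):
    diff = w - w_star
    return np.sum(diff ** 2), 2 * diff

w = np.zeros(3)
step_size = 0.1                       # learning rate
for it in range(100):
    loss, grad = loss_and_grad(w)
    w -= step_size * grad             # step along the negative gradient

print(w, loss)                        # w approaches w_star, loss approaches 0
```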
Mini-batch Gradient Descent
Stochastic Gradient Descent (SGD)
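A sketch of mini-batch gradient descent on a toy linear least-squares problem (the data, batch size, and learning rate are illustrative); sampling a mini-batch of size 1 gives stochastic gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))                  # toy dataset
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=256)

w = np.zeros(5)
batch_size = 32                                # batch_size = 1 would be plain SGD
learning_rate = 0.1

for it in range(500):
    idx = rng.choice(len(X), batch_size, replace=False)   # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size           # gradient of the mean squared error
    w -= learning_rate * grad

print(np.round(w - w_true, 3))                 # close to zero after training
```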
Summary
Where are we? Classifiers: SVM vs. Softmax Gradient descent to optimize loss functions – Batch gradient descent, stochastic gradient descent – Momentum – Numerical gradients (slow, approximate), analytic gradients (fast, exact, but error-prone to derive)
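The momentum update mentioned above keeps a running velocity that accumulates past gradients; a minimal 1-D sketch (the objective and the hyperparameters mu and learning_rate are illustrative):

```python
# Minimize f(w) = (w - 3)^2 with the momentum update; its gradient is 2 (w - 3)
mu = 0.9                # momentum coefficient (illustrative value)
learning_rate = 0.1

w, v = 0.0, 0.0
for it in range(200):
    grad = 2 * (w - 3.0)
    v = mu * v - learning_rate * grad   # velocity: decaying sum of past gradients
    w = w + v                           # parameter update uses the velocity
print(w)                                # approaches 3
```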
Derivatives Given f(x), where x is a vector of inputs – Compute the gradient of f at x:
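A minimal sketch of evaluating the gradient numerically with centered differences, the slow-but-simple alternative to the analytic gradient; f and x here are generic placeholders.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Approximate the gradient of f at x by centered differences.
    f: function mapping a numpy array to a scalar
    x: point at which to evaluate the gradient (modified in place, then restored)
    """
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x[i]
        x[i] = old + h
        f_plus = f(x)
        x[i] = old - h
        f_minus = f(x)
        x[i] = old                            # restore the original value
        grad[i] = (f_plus - f_minus) / (2 * h)
    return grad

# Example: the gradient of f(x) = sum(x**2) is 2x
x = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(lambda v: np.sum(v ** 2), x))   # approximately [2, -4, 6]
```

Each component requires two evaluations of f, which is why numerical gradients are slow and used mainly to check an analytic gradient.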
Examples
Backprop gradients Can compose complex stages, yet still compute the gradient easily via the chain rule
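To illustrate how staged computation keeps the gradient easy, here is a tiny worked example (the values are illustrative) for f(x, y, z) = (x + y) * z: compute the forward pass in stages, then apply the chain rule backwards through each stage.

```python
# Forward pass, in stages
x, y, z = -2.0, 5.0, -4.0
q = x + y              # stage 1: q = x + y   -> q = 3
f = q * z              # stage 2: f = q * z   -> f = -12

# Backward pass: chain rule through each stage, in reverse order
df_dq = z              # local gradient of f = q*z with respect to q
df_dz = q              # local gradient of f = q*z with respect to z
df_dx = df_dq * 1.0    # dq/dx = 1, so df/dx = df/dq * dq/dx
df_dy = df_dq * 1.0    # dq/dy = 1, so df/dy = df/dq * dq/dy

print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```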