Machine learning overview

Machine learning overview
A computational method for improving performance on a task by using training data. (The accompanying figure shows a neural network, but other ML methods can be substituted.)

Task (prediction from input)
Possible tasks:
Classification – identify one of several categories, possibly with some inputs missing
Regression/approximation – approximate a real-valued function of specified inputs
Generation – produce output of a specified type as a function of the input (e.g., a description from a scene, or a scene from a description)
Image processing – denoising, inpainting, super-resolution
Many others

Performance (loss function)
The correct measure depends on the task.
Classification yields a probability distribution over classes. Use cross-entropy to compare it to the true distribution during training. We really care about accuracy (the fraction of correct classifications), but that is difficult to train on directly.
Image comparison often uses mean squared error: the sum of squared differences between pixels, divided by the number of pixels. The problem is that both good and bad images can lie at the same distance from a target image, which can lead to blurry output or artifacts. We may need a metric that weights some directions more heavily than others.
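As a concrete illustration (my own sketch, not from the slides), here are the two losses just mentioned in NumPy; the array names and the small epsilon for numerical stability are my choices:

    import numpy as np

    def cross_entropy(p_true, p_pred, eps=1e-12):
        # Cross-entropy between a true distribution and a predicted one.
        # eps guards against log(0).
        return -np.sum(p_true * np.log(p_pred + eps))

    def mse(img_a, img_b):
        # Mean squared error: sum of squared pixel differences / pixel count.
        return np.mean((img_a - img_b) ** 2)

    # One-hot target vs. a softmax-style prediction over 3 classes
    p_true = np.array([0.0, 1.0, 0.0])
    p_pred = np.array([0.1, 0.8, 0.1])
    print(cross_entropy(p_true, p_pred))  # -log(0.8) ≈ 0.223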

Datasets and learning type
Supervised learning: the training data consists of pairs of inputs and targets. The goal is to learn the mapping from input to target, e.g., noisy image to clean image, or sentence in English to its French translation.
Unsupervised learning: the training data is just a set of inputs. The goal is to learn something about the distribution of these inputs in the underlying space, e.g., a low-dimensional representation of a set of natural images, as in an autoencoder.

Linear regression
Task: predict y from x, using ŷ = wᵀx or ŷ = wᵀx + b.
Other forms are possible, such as a polynomial ŷ = w₀ + w₁x + w₂x² or, more generally, ŷ = wᵀφ(x) for fixed basis functions φ. This is still linear in the parameters w.
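A minimal sketch (mine, not from the slides) showing that a model nonlinear in x can still be linear in w; here I use a cubic polynomial basis and NumPy's least-squares solver on synthetic data:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 50)
    y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.size)

    # Design matrix of basis functions: columns [1, x, x^2, x^3].
    # The model is nonlinear in x but linear in the weights w.
    Phi = np.vander(x, N=4, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    y_hat = Phi @ w  # predictions on the training inputs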

Linear regression
Divide the data into a training set and a testing set. Performance is measured by MSE:
MSE(w) = (1/m) Σᵢ (ŷ⁽ⁱ⁾ − y⁽ⁱ⁾)²
This is a function of w. The gradient of this function with respect to w is the vector in w-space pointing in the direction of maximum increase. Take a step in the negative gradient direction to decrease the function: gradient descent.
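A sketch of batch gradient descent for this loss (the learning rate and step count are arbitrary choices of mine, not values from the slides):

    import numpy as np

    def gradient_descent(X, y, lr=0.1, steps=1000):
        # X: (m, n) design matrix; y: (m,) targets.
        m, n = X.shape
        w = np.zeros(n)
        for _ in range(steps):
            y_hat = X @ w
            # Gradient of MSE = (1/m) * sum((y_hat - y)^2) with respect to w
            grad = (2.0 / m) * X.T @ (y_hat - y)
            w -= lr * grad  # step in the negative gradient direction
        return w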

Linear regression (and ML in general)
Goal: perform well on the test data while using only the training data.
Assumption: the test data and training data are drawn from the same distribution (i.i.d.: independent, identically distributed samples).
Refined goal:
1. Make the training error small.
2. Make the difference between training error and test error small.
General strategy: decompose the error into these two pieces and study each piece separately.
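To make the two pieces concrete, a small sketch of my own that measures both for a least-squares polynomial fit (the split ratio and data are placeholders):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, 100)
    y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(100)

    # Random 80/20 split into training and testing sets
    idx = rng.permutation(100)
    train, test = idx[:80], idx[80:]

    Phi = np.vander(x, N=4, increasing=True)
    w, *_ = np.linalg.lstsq(Phi[train], y[train], rcond=None)

    train_err = np.mean((Phi[train] @ w - y[train]) ** 2)  # piece 1
    test_err = np.mean((Phi[test] @ w - y[test]) ** 2)
    print(train_err, test_err - train_err)  # piece 2: generalization gap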

Capacity and data fitting
Capacity: a measure of a model's ability to fit complex data. Increased capacity means we can make the training error small.
Overfitting: like memorizing the training inputs. The capacity is large enough to reproduce the training data, but the model does poorly on test data: too much capacity for the available data.
Underfitting: like ignoring details. Not enough capacity to capture the structure in the available data.

Capacity and data fitting (figure-only slide)

Capacity and generalization error (figure-only slide)

Regularization
Sometimes minimizing the performance or loss function directly promotes overfitting, e.g., the Runge phenomenon: interpolating with a polynomial at evenly spaced points.
[figure legend: red = target function, blue = degree 5, green = degree 9]
The output is linear in the coefficients.
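A small sketch (my construction) reproducing the phenomenon with the classic Runge function 1/(1 + 25x²); the degrees match the slide's legend:

    import numpy as np

    def runge(x):
        return 1.0 / (1.0 + 25.0 * x**2)

    # Interpolate at evenly spaced points with degree-5 and degree-9 polynomials
    for degree in (5, 9):
        nodes = np.linspace(-1, 1, degree + 1)
        coeffs = np.polyfit(nodes, runge(nodes), degree)
        x_dense = np.linspace(-1, 1, 201)
        max_err = np.max(np.abs(np.polyval(coeffs, x_dense) - runge(x_dense)))
        print(degree, max_err)  # oscillation near the ends grows with degree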

Regularization
We can get a better fit by using a penalty on the coefficients, e.g., minimizing
J(w) = MSE_train + λ wᵀw
(an L2 penalty, also called weight decay), where λ controls how strongly small weights are preferred.
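A closed-form sketch of such an L2 penalty (ridge regression); the λ values and data here are placeholders of mine:

    import numpy as np

    def ridge_fit(Phi, y, lam):
        # Minimize ||Phi @ w - y||^2 + lam * ||w||^2 in closed form:
        # w = (Phi^T Phi + lam * I)^(-1) Phi^T y
        n = Phi.shape[1]
        return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n), Phi.T @ y)

    # Degree-9 polynomial fit to noisy samples, with and without the penalty
    rng = np.random.default_rng(2)
    x = np.linspace(-1, 1, 10)
    y = 1.0 / (1.0 + 25.0 * x**2) + 0.05 * rng.standard_normal(x.size)
    Phi = np.vander(x, N=10, increasing=True)
    w_plain = ridge_fit(Phi, y, lam=0.0)
    w_ridge = ridge_fit(Phi, y, lam=1e-3)
    print(np.abs(w_plain).max(), np.abs(w_ridge).max())  # penalty shrinks weights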