Machine learning overview

Machine learning overview
A computational method for improving performance on a task by using training data. (The accompanying figure shows a neural network, but other ML methods can be substituted.)

Task (prediction from input)
Possible tasks:
Classification – identify one of several categories, possibly with some inputs missing
Regression/approximation – approximate a real-valued function of specified inputs
Generation – produce output of a specified type as a function of the input (e.g., a description from a scene, or a scene from a description)
Image processing – denoising, inpainting, super-resolution
Many others

Performance (loss function)
The correct measure depends on the task.
Classification yields a probability distribution over classes. Use cross-entropy to compare it to the true distribution during training. We really care about accuracy (the fraction of correct classifications), but that is difficult to train on directly.
Image comparison often uses mean squared error: the sum of squared differences between pixels, divided by the number of pixels. The problem is that both good and bad images can lie at the same distance from a target image, which can lead to blurry output or artifacts. We may need a metric that weights some directions more heavily than others.
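As a concrete illustration (my own sketch, not from the slides), here are the two losses just mentioned in NumPy; the array names and the small epsilon for numerical stability are my choices:

    import numpy as np

    def cross_entropy(p_true, p_pred, eps=1e-12):
        # Cross-entropy between a true distribution and a predicted one.
        # eps guards against log(0).
        return -np.sum(p_true * np.log(p_pred + eps))

    def mse(img_a, img_b):
        # Mean squared error: sum of squared pixel differences / pixel count.
        return np.mean((img_a - img_b) ** 2)

    # One-hot target vs. a softmax-style prediction over 3 classes
    p_true = np.array([0.0, 1.0, 0.0])
    p_pred = np.array([0.1, 0.8, 0.1])
    print(cross_entropy(p_true, p_pred))  # -log(0.8) ≈ 0.223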

Datasets and learning type
Supervised learning: the training data consists of pairs of inputs and targets. The goal is to learn the mapping from input to target, e.g., noisy image to clean image, or sentence in English to its French translation.
Unsupervised learning: the training data is just a set of inputs. The goal is to learn something about the distribution of these inputs in the underlying space, e.g., a low-dimensional representation of a set of natural images, as in an autoencoder.

Linear regression
Task: predict y from x, using ŷ = wᵀx or ŷ = wᵀx + b.
Other forms are possible, such as a polynomial ŷ = w₀ + w₁x + w₂x² or, more generally, ŷ = wᵀφ(x) for fixed basis functions φ. This is still linear in the parameters w.
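A minimal sketch (mine, not from the slides) showing that a model nonlinear in x can still be linear in w; here I use a cubic polynomial basis and NumPy's least-squares solver on synthetic data:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 50)
    y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.size)

    # Design matrix of basis functions: columns [1, x, x^2, x^3].
    # The model is nonlinear in x but linear in the weights w.
    Phi = np.vander(x, N=4, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    y_hat = Phi @ w  # predictions on the training inputs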

Linear regression
Divide the data into a training set and a testing set. Performance is measured by MSE:
MSE(w) = (1/m) Σᵢ (ŷ⁽ⁱ⁾ − y⁽ⁱ⁾)²
This is a function of w. The gradient of this function with respect to w is the vector in w-space pointing in the direction of maximum increase. Take a step in the negative gradient direction to decrease the function: gradient descent.
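A sketch of batch gradient descent for this loss (the learning rate and step count are arbitrary choices of mine, not values from the slides):

    import numpy as np

    def gradient_descent(X, y, lr=0.1, steps=1000):
        # X: (m, n) design matrix; y: (m,) targets.
        m, n = X.shape
        w = np.zeros(n)
        for _ in range(steps):
            y_hat = X @ w
            # Gradient of MSE = (1/m) * sum((y_hat - y)^2) with respect to w
            grad = (2.0 / m) * X.T @ (y_hat - y)
            w -= lr * grad  # step in the negative gradient direction
        return w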

Linear regression (and ML in general)
Goal: perform well on the test data while using only the training data.
Assumption: the test data and training data are drawn from the same distribution (i.i.d.: independent, identically distributed samples).
Refined goal:
1. Make the training error small.
2. Make the difference between training error and test error small.
General strategy: decompose the error into these two pieces and study each piece separately.
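To make the two pieces concrete, a small sketch of my own that measures both for a least-squares polynomial fit (the split ratio and data are placeholders):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, 100)
    y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(100)

    # Random 80/20 split into training and testing sets
    idx = rng.permutation(100)
    train, test = idx[:80], idx[80:]

    Phi = np.vander(x, N=4, increasing=True)
    w, *_ = np.linalg.lstsq(Phi[train], y[train], rcond=None)

    train_err = np.mean((Phi[train] @ w - y[train]) ** 2)  # piece 1
    test_err = np.mean((Phi[test] @ w - y[test]) ** 2)
    print(train_err, test_err - train_err)  # piece 2: generalization gap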

Capacity and data fitting
Capacity: a measure of a model's ability to fit complex data. Increased capacity means we can make the training error small.
Overfitting: like memorizing the training inputs. The capacity is large enough to reproduce the training data, but the model does poorly on test data: too much capacity for the available data.
Underfitting: like ignoring details. Not enough capacity to capture the structure in the available data.

Capacity and data fitting (figure-only slide)

Capacity and generalization error (figure-only slide)

Regularization
Sometimes minimizing the performance or loss function directly promotes overfitting, e.g., the Runge phenomenon: interpolating with a polynomial at evenly spaced points.
[figure legend: red = target function, blue = degree 5, green = degree 9]
The output is linear in the coefficients.
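A small sketch (my construction) reproducing the phenomenon with the classic Runge function 1/(1 + 25x²); the degrees match the slide's legend:

    import numpy as np

    def runge(x):
        return 1.0 / (1.0 + 25.0 * x**2)

    # Interpolate at evenly spaced points with degree-5 and degree-9 polynomials
    for degree in (5, 9):
        nodes = np.linspace(-1, 1, degree + 1)
        coeffs = np.polyfit(nodes, runge(nodes), degree)
        x_dense = np.linspace(-1, 1, 201)
        max_err = np.max(np.abs(np.polyval(coeffs, x_dense) - runge(x_dense)))
        print(degree, max_err)  # oscillation near the ends grows with degree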

Regularization
We can get a better fit by using a penalty on the coefficients, e.g., minimizing
J(w) = MSE_train + λ wᵀw
(an L2 penalty, also called weight decay), where λ controls how strongly small weights are preferred.
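A closed-form sketch of such an L2 penalty (ridge regression); the λ values and data here are placeholders of mine:

    import numpy as np

    def ridge_fit(Phi, y, lam):
        # Minimize ||Phi @ w - y||^2 + lam * ||w||^2 in closed form:
        # w = (Phi^T Phi + lam * I)^(-1) Phi^T y
        n = Phi.shape[1]
        return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n), Phi.T @ y)

    # Degree-9 polynomial fit to noisy samples, with and without the penalty
    rng = np.random.default_rng(2)
    x = np.linspace(-1, 1, 10)
    y = 1.0 / (1.0 + 25.0 * x**2) + 0.05 * rng.standard_normal(x.size)
    Phi = np.vander(x, N=10, increasing=True)
    w_plain = ridge_fit(Phi, y, lam=0.0)
    w_ridge = ridge_fit(Phi, y, lam=1e-3)
    print(np.abs(w_plain).max(), np.abs(w_ridge).max())  # penalty shrinks weights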