Machine Learning ICS 178 Instructor: Max Welling Supervised Learning

Supervised Learning I
Example: Imagine you want to classify images of monkeys versus humans.
Data: 100 monkey images and 200 human images, labeled with what is what: pairs (x, y), where x represents the greyscale values of the image pixels and y = 0 means “monkey” while y = 1 means “human”.
Task: Here is a new image: monkey or human?

1-Nearest Neighbor (your first ML algorithm!)
Idea:
1. Find the picture in the database which is closest to your query image.
2. Check its label.
3. Declare the class of your query image to be the same as that of the closest picture.
(figure: a query image next to the closest image in the database)
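A minimal sketch of this procedure in Python with NumPy; the array names (train_images, train_labels, query) and the use of Euclidean distance are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def one_nearest_neighbor(train_images, train_labels, query):
    """Return the label of the training image closest to the query image.

    train_images: (N, D) array, one flattened greyscale image per row
    train_labels: (N,) array of labels, e.g. 0 = monkey, 1 = human
    query:        (D,) array, the new image to classify
    """
    # Euclidean distance from the query to every stored image (rows of train_images).
    dists = np.linalg.norm(train_images - query, axis=1)
    # The label of the closest stored image is our prediction.
    return train_labels[np.argmin(dists)]
```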

1NN Decision Surface
(figure: decision curve)

Distance Metric
How do we measure what it means to be “close”? Depending on the problem we should choose an appropriate “distance” metric (or, more generally, a (dis)similarity measure).
Demo: Matlab demo.
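For illustration, here is how two common choices could be computed with NumPy; the vectors and the pair of metrics shown (Euclidean and Manhattan) are just examples, not a recommendation from the slides.

```python
import numpy as np

# Two hypothetical feature vectors (e.g. a few pixel greyscale values).
a = np.array([0.2, 0.9, 0.4])
b = np.array([0.1, 0.5, 0.8])

euclidean = np.linalg.norm(a - b)   # square root of the summed squared differences
manhattan = np.abs(a - b).sum()     # sum of absolute differences (city-block distance)
print(euclidean, manhattan)
```

Which metric is appropriate depends on the problem; note that rescaling individual features can change which neighbor ends up closest.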

Remarks on NN methods
- We only need to construct a classifier that works locally for each query. Hence, we don’t need to construct a classifier everywhere in space.
- Classifying is done at query time. This can be computationally taxing at a moment when you might want to be fast.
- Memory inefficient: all training examples must be stored.
- Curse of dimensionality: imagine many features are irrelevant or noisy; then distances are always large.
- Very flexible, not many prior assumptions.
- k-NN variants are robust against “bad examples”.
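A sketch of the k-NN variant mentioned in the last bullet, assuming the same flattened-image representation as before; the function name and the default k = 3 are illustrative choices.

```python
import numpy as np
from collections import Counter

def knn_classify(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]                  # indices of the k closest training points
    votes = Counter(train_y[nearest].tolist())       # count the labels among those neighbours
    return votes.most_common(1)[0][0]                # most frequent label wins
```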

Non-parametric Methods
- Non-parametric methods keep all the data cases/examples in memory. A better name is “instance-based” learning.
- As the data set grows, the complexity of the decision surface grows.
- Sometimes non-parametric methods have some parameters to tune...
- “Assumption light”.

Unsupervised Learning
In unsupervised learning we are concerned with finding structure in the data. For instance, we may want to learn the probability distribution from which the data was sampled.
The data now looks like (no labels): D = {x_1, ..., x_N}.
A nonparametric estimate of the probability distribution can be obtained through Parzen windowing (or kernel density estimation).

Parzen Windowing
First define a “kernel” K: some function that “smoothes” a data point and satisfies K(x) ≥ 0 and ∫ K(x) dx = 1.
We estimate the distribution as p(x) = (1/N) Σ_{n=1}^{N} K(x − x_n).
Example: the Gaussian kernel (RBF kernel), K(x) = (2πσ²)^(−d/2) exp(−‖x‖² / (2σ²)).
When you add these kernels up with weights 1/N you get a smooth estimate of the density.
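A minimal 1-D sketch of such an estimate with a Gaussian kernel; the data values and the width sigma = 0.5 are made up for illustration.

```python
import numpy as np

def parzen_density(x, data, sigma=0.5):
    """Parzen-window estimate of p(x) using a 1-D Gaussian kernel of width sigma."""
    # One Gaussian bump centred at each data point, averaged with weight 1/N.
    bumps = np.exp(-(x - data) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    return bumps.mean()

grades = np.array([5.5, 6.0, 6.5, 7.0, 8.0, 8.5, 9.0])  # made-up grades, for illustration only
print(parzen_density(7.2, grades))
```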

Example: Distribution of Grades

Overfitting
Fundamental issue with Parzen windowing: how do you choose the width of the kernel? Too broad and you lose the details; too small and you see too much detail.
Imagine you left out part of the data points when you produced your estimate of the distribution. A good fit is one that gives high probability to the left-out data points. So you can tune the kernel width on the left-out data! This is called (cross-)validation.
This issue is fundamental to machine learning: how much fitting is just right?
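A sketch of that idea under simple assumptions: reuse the parzen_density function from the earlier sketch, score several candidate widths by the average log-probability they assign to a held-out split, and keep the best one. The widths tried and the 80/20 split are arbitrary choices.

```python
import numpy as np

def held_out_score(train, held_out, sigma):
    """Average log-probability of the held-out points under a Parzen estimate built on train."""
    return np.mean([np.log(parzen_density(x, train, sigma)) for x in held_out])

data = np.random.randn(100)                 # hypothetical 1-D data set
train, held_out = data[:80], data[80:]      # simple hold-out split
widths = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
best = max(widths, key=lambda s: held_out_score(train, held_out, s))
print("selected kernel width:", best)
```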

Overfitting in a Picture
(figure: error on hold-out data vs. error on training data)

Generalization
Consider the following regression problem: predict the real value on the y-axis from the real value on the x-axis. You are given 6 examples {X_i, Y_i}, i = 1, ..., 6. What is the y-value for a new query point X*?

Generalization

Which curve is best?

Generalization
Ockham’s razor: prefer the simplest hypothesis consistent with the data.
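One way to make this concrete is to fit polynomials of increasing degree to six points: a degree-5 polynomial passes through all of them exactly, while a low-degree fit is “simpler” in Ockham’s sense. The data values below are invented for illustration.

```python
import numpy as np

# Six hypothetical (x, y) training examples, standing in for the picture above.
x = np.array([0.0, 0.8, 1.6, 2.4, 3.2, 4.0])
y = np.array([0.1, 0.9, 1.1, 0.6, -0.2, -0.7])

x_query = 2.0
for degree in (1, 3, 5):
    coeffs = np.polyfit(x, y, degree)           # least-squares polynomial fit of the given degree
    print(degree, np.polyval(coeffs, x_query))  # prediction at the query point
```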

Your Take-Home Message Learning is concerned with accurate prediction of future data, not accurate prediction of training data. (The single most important sentence you will see in the course)

Homework
Read chapter 4, pages ... .
Download the Netflix data and plot:
– ratings as a function of time.
– variance in ratings as a function of the mean of the movie ratings.
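A possible starting point for the two plots, assuming the ratings have been exported to a CSV with one row per rating; the file name ratings.csv and the column layout (movie, rating, days) are assumptions about the local setup, not the actual Netflix file format.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed layout: one row per rating with columns (movie, rating, days since some start date).
data = np.genfromtxt("ratings.csv", delimiter=",", names=["movie", "rating", "days"])

# Plot 1: ratings as a function of time.
plt.figure()
plt.scatter(data["days"], data["rating"], s=2)
plt.xlabel("time (days)")
plt.ylabel("rating")

# Plot 2: variance in ratings as a function of each movie's mean rating.
means, variances = [], []
for movie in np.unique(data["movie"]):
    r = data["rating"][data["movie"] == movie]
    means.append(r.mean())
    variances.append(r.var())
plt.figure()
plt.scatter(means, variances, s=5)
plt.xlabel("mean movie rating")
plt.ylabel("variance of ratings")
plt.show()
```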