Introduction to Machine Learning: Multivariate Methods. Name: 李政軒

Multivariate Data
Multiple measurements: d inputs/features/attributes (d-variate); N instances/observations/examples.
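In matrix form (as in Alpaydın, Ch. 5), the sample is an N × d data matrix whose rows are the instances and whose columns are the attributes:

\[
\mathbf{X} = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ X_1^2 & X_2^2 & \cdots & X_d^2 \\ \vdots & & & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}
\]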

Multivariate Parameters
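The slide's formulas did not survive extraction; the parameters in question are the standard population quantities:

\[
\text{Mean: } E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T
\]
\[
\text{Covariance: } \boldsymbol{\Sigma} \equiv \operatorname{Cov}(\mathbf{x}) = E\big[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\big], \qquad \sigma_{ij} = \operatorname{Cov}(x_i, x_j)
\]
\[
\text{Correlation: } \rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}
\]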

Parameter Estimation
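Given a sample X = {x^t}, t = 1, ..., N, the corresponding sample estimates are:

\[
\text{Sample mean: } \mathbf{m} = \frac{1}{N}\sum_{t=1}^{N} \mathbf{x}^t
\]
\[
\text{Sample covariance: } s_{ij} = \frac{1}{N}\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)
\]
\[
\text{Sample correlation: } r_{ij} = \frac{s_{ij}}{s_i s_j}
\]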

Estimation of Missing Values
What to do if certain instances have missing attributes?
Ignore those instances: not a good idea if the sample is small.
Use ‘missing’ as an attribute value: it may carry information.
Imputation: fill in the missing value.
◦ Mean imputation: use the most likely value, e.g., the attribute's mean.
◦ Imputation by regression: predict the missing value from the other attributes.
A minimal sketch of mean imputation follows.
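A small NumPy sketch of mean imputation; the function and variable names are illustrative, not from the slides:

```python
import numpy as np

def mean_impute(X):
    """Fill missing entries (encoded as NaN) with the per-attribute (column) mean."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)      # mean of each attribute, ignoring NaNs
    rows, cols = np.where(np.isnan(X))     # positions of the missing entries
    X[rows, cols] = col_means[cols]
    return X
```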

Multivariate Normal Distribution
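For x ~ N_d(μ, Σ), the density is

\[
p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\Big[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\Big]
\]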

Multivariate Normal Distribution
Mahalanobis distance: (x − μ)^T Σ^{-1} (x − μ) measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and for correlations).
Bivariate case: d = 2.
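A minimal NumPy sketch for computing the squared Mahalanobis distance, assuming a given mean vector and covariance matrix (names illustrative):

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    # Solve Sigma * z = diff rather than inverting Sigma explicitly (more stable).
    return float(diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
print(mahalanobis_sq(np.array([1.0, 1.0]), mu, Sigma))
```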

Bivariate Normal
Probability contour plot of the bivariate normal distribution: the center is given by the mean, and the shape and orientation of the contours depend on the covariance matrix.

Independent Inputs: Naive Bayes
If the x_i are independent, the off-diagonal entries of Σ are 0, and the Mahalanobis distance reduces to a Euclidean distance weighted by 1/σ_i (see below).
If the variances are also equal, it reduces to the ordinary Euclidean distance.
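With a diagonal Σ, the joint density factorizes into the product of the univariate densities:

\[
p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\!\Big[-\tfrac{1}{2}\sum_{i=1}^{d}\Big(\frac{x_i-\mu_i}{\sigma_i}\Big)^2\Big]
\]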

Parametric Classification
If p(x | C_i) ~ N(μ_i, Σ_i), the discriminant functions are as follows.
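Taking the log of the class likelihood times the prior:

\[
g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\boldsymbol{\Sigma}_i| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i) + \log P(C_i)
\]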

Estimation of Parameters
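Given labels r_i^t = 1 if x^t ∈ C_i (0 otherwise), the maximum-likelihood estimates are

\[
\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \qquad \mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t}, \qquad \mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t-\mathbf{m}_i)(\mathbf{x}^t-\mathbf{m}_i)^T}{\sum_t r_i^t}
\]

and, dropping the constant term, the plug-in discriminant becomes

\[
g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T \mathbf{S}_i^{-1} (\mathbf{x}-\mathbf{m}_i) + \log \hat{P}(C_i)
\]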

Different S_i
With a separate, arbitrary covariance matrix S_i per class, the discriminant is quadratic in x (see below).
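Expanding the quadratic form gives

\[
g_i(\mathbf{x}) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}
\]
with
\[
\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1}, \qquad \mathbf{w}_i = \mathbf{S}_i^{-1}\mathbf{m}_i, \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}_i^{-1}\mathbf{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log\hat{P}(C_i)
\]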

[Figure: the class likelihoods, the posterior for C_1, and the discriminant where P(C_1 | x) = 0.5.]

Common Covariance Matrix S
If a common sample covariance S is shared by all classes, the discriminant reduces to a linear discriminant (see below).
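With S_i = S estimated as \(\mathbf{S} = \sum_i \hat{P}(C_i)\,\mathbf{S}_i\), the term quadratic in x is the same for every class and can be dropped:

\[
g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T \mathbf{S}^{-1} (\mathbf{x}-\mathbf{m}_i) + \log\hat{P}(C_i) \;\Rightarrow\; g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}
\]
with
\[
\mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i, \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}^{-1}\mathbf{m}_i + \log\hat{P}(C_i)
\]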

Covariances may be arbitrary but shared by both classes.

Diagonal S
When the x_j, j = 1, ..., d, are independent, Σ is diagonal: p(x | C_i) = ∏_j p(x_j | C_i) (the naive Bayes' assumption).
Classify based on the weighted Euclidean distance (in s_j units) to the nearest mean, as in the discriminant below.
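Dropping terms common to all classes, the discriminant is

\[
g_i(\mathbf{x}) = -\frac{1}{2}\sum_{j=1}^{d}\Big(\frac{x_j - m_{ij}}{s_j}\Big)^2 + \log\hat{P}(C_i)
\]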

Variances may be different.

Diagonal S, Equal Variances
Nearest mean classifier: classify based on Euclidean distance to the nearest mean.
Each mean can be considered a prototype or template, so this is template matching.
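A minimal NumPy sketch of the nearest mean classifier (names are illustrative; classes are assumed to be labeled 0..K-1):

```python
import numpy as np

def fit_class_means(X, y):
    """Stack the per-class mean vectors (the prototypes/templates)."""
    return np.stack([X[y == k].mean(axis=0) for k in np.unique(y)])

def nearest_mean_predict(X, means):
    """Label each row of X with the class whose mean is nearest in Euclidean distance."""
    # (N, K) matrix of squared distances from every instance to every class mean
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```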

All classes have equal, diagonal covariance matrices, with equal variances on both dimensions.

Model Selection
As we increase complexity (a less restricted S), bias decreases and variance increases.
Assume simple models (allow some bias) to control variance (regularization).
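For reference, following Alpaydın's Table 5.1, the number of free covariance parameters per case (K classes, d dimensions) is roughly:
◦ Shared, hyperspheric (S_i = S = s²I): 1
◦ Shared, diagonal S: d
◦ Shared, arbitrary S: d(d+1)/2
◦ Different, arbitrary S_i: K · d(d+1)/2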

Different cases of the covariance matrices fitted to the same data lead to different boundaries.

Discrete Features
Binary features: x_j ∈ {0, 1}. If the x_j are independent (naive Bayes'), the discriminant is linear; the discriminant and the estimated parameters are given below.
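With \(p_{ij} \equiv p(x_j = 1 \mid C_i)\),

\[
g_i(\mathbf{x}) = \sum_j \big[x_j \log p_{ij} + (1 - x_j)\log(1 - p_{ij})\big] + \log P(C_i)
\]
which is linear in the x_j; the parameters are estimated as
\[
\hat{p}_{ij} = \frac{\sum_t x_j^t\, r_i^t}{\sum_t r_i^t}
\]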

Discrete Features
Multinomial (1-of-n_j) features: x_j ∈ {v_1, v_2, ..., v_{n_j}}. If the x_j are independent, the discriminant is again linear (see below).
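Defining indicator variables \(z_{jk} = 1\) if \(x_j = v_k\) (0 otherwise) and \(p_{ijk} \equiv p(x_j = v_k \mid C_i)\),

\[
g_i(\mathbf{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i), \qquad \hat{p}_{ijk} = \frac{\sum_t z_{jk}^t\, r_i^t}{\sum_t r_i^t}
\]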

Multivariate Regression
Multivariate linear model: r = w_0 + w_1 x_1 + ... + w_d x_d + ε.
Multivariate polynomial model: define new higher-order variables z_1 = x_1, z_2 = x_2, z_3 = x_1², z_4 = x_2², z_5 = x_1 x_2, and use the linear model in this new z-space (a sketch follows).
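A minimal NumPy sketch of the polynomial model via the z-space trick; the function names are illustrative:

```python
import numpy as np

def poly2_features(X):
    """Map (x1, x2) to the z-space of the slide: (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

def fit_linear(Z, r):
    """Ordinary least squares for r = w0 + w^T z, via an added bias column."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    w, *_ = np.linalg.lstsq(Z1, r, rcond=None)
    return w
```

Fitting the linear model in z-space yields a model that is polynomial in the original inputs, while the estimation itself stays linear least squares.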