Bayesianness, cont’d Part 2 of... 4?

Administrivia
CSUSC (CS UNM Student Conference)
March 1, 2007 (all day)
That’s a Thursday... Thoughts?

Bayesian class: general idea
Find probability distributions that describe the classes of data
Find the decision surface in terms of those probability distributions
Bayesian decision rule: Bayes optimality
Want to pick the class that minimizes expected cost
Simplest case: cost == misclassification
Expected cost == expected misclassification rate

5 minutes of math
Expected cost of deciding class $c_i$ when you see $x$:
$$R(c_i \mid x) = \sum_j \lambda(c_i \mid c_j)\, P(c_j \mid x)$$
For 0/1 cost, reduces to:
$$R(c_i \mid x) = 1 - P(c_i \mid x)$$
To minimize, pick the class $c_i$ that minimizes $R(c_i \mid x)$, i.e., the one that maximizes the posterior $P(c_i \mid x)$.
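
A minimal sketch of this minimization (the posteriors and cost matrix below are invented illustrative numbers, not from the slides): with a general cost matrix, compute each class's expected cost and take the argmin; under 0/1 cost this collapses to picking the largest posterior.

```python
import numpy as np

posteriors = np.array([0.7, 0.2, 0.1])   # hypothetical P(c_j | x) for 3 classes

# General cost matrix: cost[i, j] = cost of choosing class i when truth is j.
# For 0/1 loss: 0 on the diagonal, 1 everywhere else.
cost = np.ones((3, 3)) - np.eye(3)

# R(c_i | x) = sum_j cost[i, j] * P(c_j | x)
expected_cost = cost @ posteriors

best = np.argmin(expected_cost)
print(expected_cost)   # for 0/1 loss this is exactly 1 - posteriors
print(best)            # same answer as np.argmax(posteriors)
```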

Bayes optimal decisions
Final rule: for 0/1 loss (accuracy), the optimal decision rule is:
$$\hat{c} = \arg\max_i P(c_i \mid x) = \arg\max_i\, p(x \mid c_i)\, P(c_i)$$
Equivalently, it’s sometimes useful to use the log odds ratio test (two classes): decide $c_1$ if
$$\log \frac{p(x \mid c_1)\, P(c_1)}{p(x \mid c_2)\, P(c_2)} > 0$$
and $c_2$ otherwise.
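
A sketch of the log odds ratio test for two classes, assuming Gaussian class-conditional densities; all priors and parameters here are made-up example values.

```python
import numpy as np
from scipy.stats import norm

prior1, prior2 = 0.6, 0.4      # hypothetical P(c_1), P(c_2)
mu1, sigma1 = 0.0, 1.0         # class 1: p(x | c_1) = N(mu1, sigma1^2)
mu2, sigma2 = 2.0, 1.5         # class 2: p(x | c_2) = N(mu2, sigma2^2)

def log_odds(x):
    """log [ p(x|c_1) P(c_1) / (p(x|c_2) P(c_2)) ], done in log space for stability."""
    return (norm.logpdf(x, mu1, sigma1) + np.log(prior1)
            - norm.logpdf(x, mu2, sigma2) - np.log(prior2))

x = 1.2
print("decide c_1" if log_odds(x) > 0 else "decide c_2")
```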

Bayesian learning process
So where do the probability distributions come from?
The art of Bayesian data modeling is:
Deciding what probability models to use
Figuring out how to find the parameters
In Bayesian learning, the “learning” is (almost) all in finding the parameters
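
One hypothetical instance of the parameter-finding step (a sketch using plain maximum likelihood rather than a full Bayesian treatment; the data values are invented): under a Gaussian model assumption, the ML estimates are just each class's sample mean and variance.

```python
import numpy as np

# Made-up 1-d measurements, one array per class
data_c1 = np.array([1.62, 1.70, 1.65, 1.75, 1.68])
data_c2 = np.array([1.80, 1.85, 1.78, 1.90])

for name, data in [("c1", data_c1), ("c2", data_c2)]:
    mu_hat = data.mean()    # ML estimate of the mean
    var_hat = data.var()    # ML estimate of the variance (divides by n, not n-1)
    print(name, mu_hat, var_hat)
```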

Back to the H/W data

Prior knowledge
Gaussian (a.k.a. normal or bell curve) is a reasonable assumption for this data
Other distributions are better for other data
Can make reasonable guesses about means
Probably not -3 kg or 2 million lightyears
Assumptions like these are called:
Model assumptions (Gaussian)
Parameter priors (means)
How do we incorporate these into learning?

5 minutes of math... Our friend the Gaussian distribution. In 1 dimension:
$$p(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)$$
Mean: $\mu$
Std deviation: $\sigma$
Both parameters are scalar
Usually, we talk about variance rather than std dev: $\sigma^2$
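
The 1-d density written out in code, directly from the formula above, as a sanity check (a minimal sketch; the test values are arbitrary).

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """p(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))"""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Peak of the standard normal: 1/sqrt(2*pi) ~= 0.3989
print(gaussian_pdf(0.0, mu=0.0, sigma=1.0))
```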

Gaussian: the pretty picture
[figure: the 1-d Gaussian bell curve; the location parameter μ sets the center, the scale parameter σ sets the width]

5 minutes of math... In $d$ dimensions:
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$$
Where:
Mean vector: $\boldsymbol{\mu} \in \mathbb{R}^d$
Covariance matrix: $\Sigma$ ($d \times d$, symmetric, positive definite)
Determinant of covariance: $|\Sigma|$
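
And the d-dimensional density from the formula above (a sketch; the example mean vector and covariance matrix are arbitrary).

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate Gaussian density at x, given mean vector mu and covariance Sigma."""
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    # diff @ solve(Sigma, diff) computes diff^T Sigma^{-1} diff without an explicit inverse
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
print(mvn_pdf(np.array([0.0, 0.0]), mu, Sigma))   # density at the mean
```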

Exercise
For the 1-d Gaussian, given two classes with means $\mu_1$ and $\mu_2$ and std devs $\sigma_1$ and $\sigma_2$:
Find a description of the decision point if the std devs are the same, but the means differ
And if the means are the same, but the std devs differ
For the $d$-dim Gaussian:
What shapes are the isopotentials? Why?
Repeat the above exercise for the $d$-dim Gaussian
(A numerical sanity check follows below.)
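
Not the closed-form answer the exercise asks for, but a quick numerical check for the 1-d case (a sketch; the means, std devs, and equal class priors are all assumed for illustration): scan a grid for points where the two class densities cross.

```python
import numpy as np
from scipy.stats import norm

# Assumed example parameters; with equal priors, the decision point is
# where the two class-conditional densities are equal.
mu1, sigma1 = 0.0, 1.0
mu2, sigma2 = 2.0, 1.0     # same std devs, different means

xs = np.linspace(-5, 7, 100001)
diff = norm.pdf(xs, mu1, sigma1) - norm.pdf(xs, mu2, sigma2)

# Grid points where the difference changes sign, i.e. density crossings
crossings = xs[:-1][np.sign(diff[:-1]) != np.sign(diff[1:])]
print(crossings)   # equal sigmas: a single crossing near the midpoint (mu1 + mu2) / 2
```

Rerunning with equal means and different sigmas (e.g. `sigma2 = 2.0`) shows two crossings, which is a useful hint about what the algebra should produce in that case.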