ECE 5984: Introduction to Machine Learning
Dhruv Batra, Virginia Tech
Topics:
–Probability Review
–Statistical Estimation (MLE)
Readings: Barber 8.1, 8.2

Administrativia
HW1
–Due on Sun 02/15, 11:55pm

(C) Dhruv Batra

Project
Groups of 1-3
–We prefer teams of 2
Deliverables:
–Project proposal (NIPS format): 2 pages, due Feb 24
–Midway presentations (in class)
–Final report: webpage with results

Proposal
2 pages (NIPS format)
Necessary information:
–Project title
–Project idea (approximately two paragraphs)
–Data set details: ideally an existing dataset; no data-collection projects
–Software: which libraries will you use? What will you write?
–Papers to read: include 1-3 relevant papers; you will probably want to read at least one of them before submitting your proposal
–Teammate: will you have a teammate? If so, what's the breakdown of labor? Maximum team size is 3 students
–Mid-sem milestone: what will you complete by the project milestone due date? Experimental results of some kind are expected here

Project Rules
–Must be about machine learning
–Must involve real data: use your own data or take from the class website
–Can apply ML to your own research, but the work must be done this semester
–OK to combine with other class projects: must declare to both course instructors, must have explicit permission from BOTH instructors, and must have a sufficient ML component
–Using libraries: no need to implement all algorithms; OK to use standard SVM, MRF, decision-tree, etc. libraries. More thought + effort => more credit

Project
Main categories:
–Application/Survey: compare a bunch of existing algorithms on a new application domain of your interest
–Formulation/Development: formulate a new model or algorithm for a new or old problem
–Theory: theoretically analyze an existing algorithm
Support:
–List of ideas, pointers to datasets/algorithms/code
–We will mentor teams and give feedback

Administrativia
HW1
–Due on Sun 02/15, 11:55pm
Project Proposal
–Due: Tue 02/24, 11:55pm
–<= 2 pages, NIPS format

Procedural View
Training Stage:
–Raw Data → x (Feature Extraction)
–Training Data {(x,y)} → f (Learning)
Testing Stage:
–Raw Data → x (Feature Extraction)
–Test Data x → f(x) (Apply function, Evaluate error)

Statistical Estimation View
Probabilities to the rescue:
–x and y are random variables
–D = {(x1,y1), (x2,y2), …, (xN,yN)} ~ P(X,Y)
IID: Independent and Identically Distributed
–Both training & testing data are sampled IID from P(X,Y)
–Learn on the training set
–Have some hope of generalizing to the test set
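The IID setup above can be made concrete with a minimal sketch. The joint P(X,Y) here is purely hypothetical (a Gaussian feature with a noisy threshold label, made up for illustration); the point is only that train and test are drawn from the same distribution.

```python
import random

random.seed(0)

def sample_xy():
    """Draw one (x, y) pair from a hypothetical joint P(X, Y)."""
    x = random.gauss(0.0, 1.0)                       # feature
    y = 1 if x + random.gauss(0.0, 0.5) > 0 else 0   # noisy label
    return x, y

# Training and test data are both sampled IID from the SAME P(X, Y);
# that shared distribution is what gives a learner hope of generalizing.
train = [sample_xy() for _ in range(100)]
test = [sample_xy() for _ in range(50)]
```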

Plan for Today
Review of Probability
–Discrete vs Continuous Random Variables
–PMFs vs PDFs
–Joint vs Marginal vs Conditional Distributions
–Bayes Rule and Priors
Statistical Learning / Density Estimation
–Maximum Likelihood
–Maximum A Posteriori
–Bayesian Estimation
We will discuss simple examples (like the coin toss), but these SAME concepts will apply to sophisticated problems.

Probability
The world is a very uncertain place.
30 years of Artificial Intelligence and Database research danced around this fact.
And then a few AI researchers decided to use some ideas from the eighteenth century.

Slide Credit: Andrew Moore

Probability
A is a non-deterministic event
–Can think of A as a boolean-valued variable
Examples
–A = your next patient has cancer
–A = Rafael Nadal wins French Open 2015

Interpreting Probabilities
What does P(A) mean?
Frequentist View
–limit as N → ∞ of #(A is true)/N
–The limiting frequency of a repeating non-deterministic event
Bayesian View
–P(A) is your "belief" about A
Market Design View
–P(A) tells you how much you would bet
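The frequentist view lends itself to a quick simulation: the relative frequency of an event stabilizes around its probability as N grows. A minimal sketch (the event probability 0.3 is made up for illustration):

```python
import random

random.seed(0)

def relative_frequency(p, n):
    """Fraction of n simulated trials in which an event of probability p occurs."""
    hits = sum(1 for _ in range(n) if random.random() < p)
    return hits / n

# The relative frequency approaches the true probability as N grows.
for n in [100, 10_000, 1_000_000]:
    print(n, relative_frequency(0.3, n))
```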

Image Credit: Intrade / NPR

Slide Credit: Andrew Moore

Axioms of Probability
0 <= P(A) <= 1
P(empty-set) = 0
P(everything) = 1
P(A or B) = P(A) + P(B) – P(A and B)
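These axioms can be sanity-checked mechanically on a tiny finite sample space. A sketch using one fair die roll (the events A and B are chosen only for illustration):

```python
# Sample space for one fair die roll; P is uniform over {1,...,6}.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event, represented as a set of outcomes."""
    return len(event & omega) / len(omega)

A = {1, 2, 3}   # roll is <= 3
B = {2, 4, 6}   # roll is even

assert 0 <= P(A) <= 1
assert P(set()) == 0     # P(empty-set) = 0
assert P(omega) == 1     # P(everything) = 1
# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
assert abs(P(A | B) - (P(A) + P(B) - P(A & B))) < 1e-12
```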

Interpreting the Axioms
0 <= P(A) <= 1
P(empty-set) = 0
P(everything) = 1
P(A or B) = P(A) + P(B) – P(A and B)

Image Credit: Andrew Moore

Concepts
Sample Space
–Space of events
Random Variables
–Mapping from events to numbers
–Discrete vs Continuous
Probability
–Mass vs Density

Discrete Random Variables
–Sample space: the set of possible outcomes, Val(X), which may be finite or countably infinite
–Outcome: one sample of the discrete random variable
–Probability distribution (probability mass function); shorthand is used when there is no ambiguity
–Examples: degenerate distribution, uniform distribution

Slide Credit: Erik Sudderth
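A discrete random variable over a finite Val(X) can be sketched as a dict mapping outcomes to probabilities, including the two example distributions named on the slide:

```python
# A PMF represented as {outcome: probability} over Val(X) = {1,...,6}.
uniform = {x: 1 / 6 for x in range(1, 7)}        # uniform distribution
degenerate = {x: (1.0 if x == 3 else 0.0)        # degenerate: all mass on one outcome
              for x in range(1, 7)}

for pmf in (uniform, degenerate):
    assert all(p >= 0 for p in pmf.values())         # non-negative
    assert abs(sum(pmf.values()) - 1.0) < 1e-9       # sums to 1
```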

Continuous Random Variables
On board

Concepts
Expectation
Variance
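For a discrete random variable given as a PMF, expectation and variance can be computed directly from their definitions. A sketch using a fair die:

```python
def expectation(pmf):
    """E[X] = sum_x x * p(x), for a PMF given as {outcome: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var[X] = E[(X - E[X])^2]."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

die = {x: 1 / 6 for x in range(1, 7)}
print(expectation(die))   # 3.5
print(variance(die))      # 35/12 ≈ 2.9167
```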

Most Important Concepts
Marginal distributions / Marginalization
Conditional distributions / Chain Rule
Bayes Rule

Joint Distribution

Marginalization
–Events: P(A) = P(A and B) + P(A and not B)
–Random variables: P(X=x) = Σ_y P(X=x, Y=y)
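Marginalizing a random variable out of a joint table means summing over its values. A sketch with a small hypothetical joint P(X,Y) (the weather/temperature numbers are made up for illustration):

```python
# Joint P(X, Y) as a dict keyed by (x, y); values are hypothetical.
joint = {
    ('sun', 'hot'): 0.4, ('sun', 'cold'): 0.1,
    ('rain', 'hot'): 0.1, ('rain', 'cold'): 0.4,
}

def marginal_x(joint):
    """P(X=x) = sum_y P(X=x, Y=y)."""
    out = {}
    for (x, y), p in joint.items():
        out[x] = out.get(x, 0.0) + p
    return out

print(marginal_x(joint))  # P(sun) and P(rain), each 0.5 up to float rounding
```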

Marginal Distributions

Slide Credit: Erik Sudderth

Conditional Probabilities
P(Y=y | X=x)
What do you believe about Y=y, if I tell you X=x?
P(Rafael Nadal wins French Open 2015)?
What if I tell you:
–He has won the French Open 9 of the 10 times he has played there
–Novak Djokovic is ranked 1 and just won the Australian Open
–I offered a similar analysis last year and Nadal won

Conditional Probabilities
P(A | B) = fraction of worlds where B is true in which A is also true
Example
–H: "Have a headache"
–F: "Coming down with flu"

Conditional Distributions

Slide Credit: Erik Sudderth

Conditional Probabilities
Definition: P(A | B) = P(A and B) / P(B)
Corollary, the Chain Rule: P(A and B) = P(A | B) P(B)
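The definition and the chain rule can be checked on a small joint table. This sketch reuses the headache/flu example from earlier, with hypothetical numbers:

```python
# Hypothetical joint P(H, F) over (headache, flu) truth values.
joint = {
    (True, True): 0.01, (True, False): 0.09,
    (False, True): 0.01, (False, False): 0.89,
}

def P(pred):
    """Probability of the event selected by a predicate on (h, f)."""
    return sum(p for hf, p in joint.items() if pred(hf))

P_F = P(lambda hf: hf[1])                    # P(flu)
P_H_and_F = P(lambda hf: hf[0] and hf[1])    # P(headache and flu)
P_H_given_F = P_H_and_F / P_F                # definition of conditional
# Chain rule: P(H and F) = P(H | F) * P(F)
assert abs(P_H_given_F * P_F - P_H_and_F) < 1e-12
print(P_H_given_F)  # 0.5
```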

Independent Random Variables

Slide Credit: Erik Sudderth

Marginal Independence
Sets of variables X, Y
X is independent of Y
–Shorthand: P ⊨ (X ⊥ Y)
Proposition: P satisfies (X ⊥ Y) if and only if
–P(X=x, Y=y) = P(X=x) P(Y=y) for all x ∈ Val(X), y ∈ Val(Y)

Conditional Independence
Sets of variables X, Y, Z
X is independent of Y given Z
–Shorthand: P ⊨ (X ⊥ Y | Z)
–For P ⊨ (X ⊥ Y | ∅), write P ⊨ (X ⊥ Y)
Proposition: P satisfies (X ⊥ Y | Z) if and only if
–P(X,Y|Z) = P(X|Z) P(Y|Z) for all x ∈ Val(X), y ∈ Val(Y), z ∈ Val(Z)
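The proposition can be verified numerically on a small joint. This sketch builds a joint over binary X, Y, Z that is conditionally independent by construction (the conditional tables are made up for illustration), then checks the factorization P(X,Y|Z) = P(X|Z) P(Y|Z):

```python
from itertools import product

# Hypothetical conditionals P(X|Z), P(Y|Z) and marginal P(Z).
px_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
py_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}
pz = {0: 0.6, 1: 0.4}

# Joint built as P(z) * P(x|z) * P(y|z): conditionally independent by design.
joint = {(x, y, z): pz[z] * px_given_z[z][x] * py_given_z[z][y]
         for x, y, z in product([0, 1], repeat=3)}

def cond_indep(joint, tol=1e-12):
    """Check P(x, y | z) == P(x | z) * P(y | z) for all binary x, y, z."""
    for z in [0, 1]:
        p_z = sum(p for (_, _, zz), p in joint.items() if zz == z)
        for x, y in product([0, 1], repeat=2):
            p_xz = sum(joint[(x, yy, z)] for yy in [0, 1])
            p_yz = sum(joint[(xx, y, z)] for xx in [0, 1])
            if abs(joint[(x, y, z)] / p_z - (p_xz / p_z) * (p_yz / p_z)) > tol:
                return False
    return True

assert cond_indep(joint)
```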

Concept: Bayes Rule
–Simple yet fundamental

Image Credit: Andrew Moore

Bayes Rule
Simple yet profound
–Using Bayes Rule doesn't make your analysis Bayesian!
Concepts:
–Likelihood: how well does a certain hypothesis explain the data?
–Prior: what do you believe before seeing any data?
–Posterior: what do we believe after seeing the data?
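The three concepts combine as posterior = likelihood × prior / evidence. A sketch with hypothetical numbers (a disease test with 1% prevalence, chosen only for illustration):

```python
# Bayes rule: P(disease | positive) = P(positive | disease) P(disease) / P(positive)
prior = 0.01              # P(disease): belief before seeing any data
likelihood = 0.9          # P(positive | disease)
false_positive = 0.05     # P(positive | no disease)

# Evidence via marginalization: P(positive).
evidence = likelihood * prior + false_positive * (1 - prior)
posterior = likelihood * prior / evidence   # P(disease | positive)
print(posterior)  # ≈ 0.154: still low, despite a positive test
```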

Entropy

Slide Credit: Sam Roweis
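The entropy slide itself is an image; as a reminder, the standard definition for a discrete distribution is H(X) = −Σ_x p(x) log₂ p(x), which a few lines can compute:

```python
import math

def entropy(pmf):
    """H(X) = -sum_x p(x) log2 p(x), in bits; terms with p(x) = 0 contribute 0."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

fair_coin = {'H': 0.5, 'T': 0.5}
biased_coin = {'H': 0.9, 'T': 0.1}
print(entropy(fair_coin))    # 1.0 bit: maximally uncertain
print(entropy(biased_coin))  # ≈ 0.469 bits: less uncertain
```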

KL-Divergence / Relative Entropy

Slide Credit: Sam Roweis

KL-Divergence / Relative Entropy

Image Credit: Wikipedia
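The KL slides are images; the standard definition for discrete distributions is KL(p‖q) = Σ_x p(x) log₂(p(x)/q(x)). A short sketch, which also shows that KL is not symmetric:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

p = {'a': 0.5, 'b': 0.5}
q = {'a': 0.9, 'b': 0.1}
print(kl_divergence(p, p))  # 0.0: a distribution has zero divergence from itself
print(kl_divergence(p, q))  # > 0, and != kl_divergence(q, p): KL is asymmetric
```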

End of Prob. Review
Start of Estimation
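As a bridge to the estimation material, the coin-toss example promised in the plan can be sketched: for Bernoulli (coin-flip) data, the maximum-likelihood estimate of the heads probability is the empirical frequency of heads (a standard result, checked here numerically rather than derived; the flip sequence is hypothetical):

```python
import math

# Hypothetical data: 7 heads, 3 tails.
flips = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
mle = sum(flips) / len(flips)   # theta_hat = (#heads) / (#flips)

def log_likelihood(theta, flips):
    """Log-likelihood of Bernoulli data under parameter theta in (0, 1)."""
    return sum(math.log(theta if x == 1 else 1 - theta) for x in flips)

# The empirical frequency beats every other candidate on a fine grid.
candidates = [i / 100 for i in range(1, 100)]
best = max(candidates + [mle], key=lambda t: log_likelihood(t, flips))
print(mle, best)  # both 0.7
```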