Machine Learning with Discriminative Methods
Lecture 00 – Introduction
CS 790-134, Spring 2015
Alex Berg

Today's lecture (Lecture 0)
- Course overview
- Some illustrative examples
- Warm-up homework out, due Thursday, January 15

Course overview
- Graduate introduction to machine learning
- Focus on computation and discriminative methods
- Class participation, homework (written and Matlab/other), midterm, project, optional final as needed
- You are encouraged to use examples from your own work

Various definitions for machine learning
- [Tom Mitchell] The study of algorithms that improve their performance, P, at some task, T, with experience, E.
- [Wikipedia] A scientific discipline that explores the construction and study of algorithms that can learn from data.
- [This course] The study of how to build/learn functions (programs) to predict something.
Themes: data, formulations, computation.

Very general
At a high level, this is like the scientific method:
- observe some data
- make some hypotheses (choose a model)
- perform experiments (fit and evaluate the model)
- (profit!)
Machine learning focuses on the mathematical formulations and the computation.

Problems with learning a function from data
Problem: predict y = f(x). Training data are shown in red; possible choices of f are shown in blue and green. Many functions fit the data, so we need something to limit the possibilities! (The set of allowed possibilities is the hypothesis space.) [Figure from Poggio & Smale, 2003]
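To see why the hypothesis space must be limited, here is a minimal sketch (my own illustration, not from the slides; assumes NumPy): fitting noisy samples of a smooth curve with polynomials of increasing degree. The richest hypothesis space wins on training error by interpolating the data, which is exactly the trap.

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0, 1, 10)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy training data

    # Richer hypothesis spaces fit the training data better...
    for degree in (1, 3, 9):
        coeffs = np.polyfit(x, y, degree)            # least-squares polynomial fit
        rmse = np.sqrt(np.mean((y - np.polyval(coeffs, x)) ** 2))
        print(f"degree {degree}: training RMSE = {rmse:.4f}")
    # ...but the degree-9 interpolant will predict badly between the samples.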

Linear classifier vs. nearest neighbors [figure from the Hastie, Tibshirani & Friedman book]
Problem: predict y = f(x1, x2) > 0 from training data (shown in wheat/blue). Possibilities for the decision boundary f(x1, x2) = 0 are shown in black. The panels compare a linear classifier, a 15-nearest-neighbor classifier, and a 1-nearest-neighbor classifier; the accompanying plot shows error against degrees of freedom, indexed by K, the number of nearest neighbors.
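The same flexibility story can be reproduced roughly in code (a sketch under assumptions: synthetic Gaussian data rather than the book's mixture example, and scikit-learn classifiers standing in for the figure):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))                     # synthetic 2-D inputs
    y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

    for name, clf in [("linear", LogisticRegression()),
                      ("15-NN", KNeighborsClassifier(n_neighbors=15)),
                      ("1-NN", KNeighborsClassifier(n_neighbors=1))]:
        clf.fit(X, y)
        print(f"{name:7s} training error = {1 - clf.score(X, y):.3f}")

As in the figure, 1-NN drives training error to zero by using the most degrees of freedom, which by itself says nothing about how it will do on new data.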

Learning/fitting is a process… [from Raginsky's notes]
Example: estimating the probability p that a tossed coin comes up heads.
- X_i ∈ {0, 1}: the i-th coin toss
- p̂_n = (1/n) Σ_{i=1}^{n} X_i: the estimator based on n tosses
- P(|p̂_n − p| ≤ ε): the estimate is within epsilon
- P(|p̂_n − p| > ε) ≤ 1/(4nε²): the estimate is not within epsilon; by Chebyshev's inequality (using Var(X_i) = p(1 − p) ≤ 1/4), the probability of being bad is inversely proportional to the number of samples
The underlying computation is an example of a tail bound.
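That slide can be checked empirically with a small simulation (my own sketch, assuming NumPy; the bias p, the tolerance ε, and the number of repeated experiments are arbitrary choices, not from the lecture):

    import numpy as np

    rng = np.random.default_rng(0)
    p, eps, trials = 0.5, 0.05, 10000        # assumed values, not from the lecture
    for n in (10, 100, 1000):
        tosses = rng.random((trials, n)) < p           # `trials` runs of n coin tosses
        p_hat = tosses.mean(axis=1)                    # estimator from each run
        bad = np.mean(np.abs(p_hat - p) > eps)         # empirical P(not within eps)
        bound = min(1.0, 1 / (4 * n * eps ** 2))       # Chebyshev tail bound
        print(f"n={n:5d}: empirical {bad:.4f} <= bound {bound:.4f}")

The empirical failure probability falls with n and stays below the bound, though the Chebyshev bound itself is quite loose.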

Bad news in more dimensions
Estimating a single variable (e.g. the bias of a coin) to within a couple of percent might take ~100 samples. Estimating a function (e.g. the probability of being in class 1) of an n-dimensional binary variable requires estimating ~2^n values, and 100 × 2^n can be very large. In most cases our game will be to find ways to restrict the possibilities for the function, and to focus on the decision boundary (where f(x) = 0) instead of on f(x) itself.
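The arithmetic is worth doing once (a quick sanity check; the ~100-samples-per-value figure carries over from the coin example above):

    # Estimating a full table over n binary inputs means ~2^n values;
    # a linear decision boundary in n dimensions has only n + 1 parameters.
    for n in (10, 20, 30, 40):
        print(f"n={n:2d}: 100 * 2^n = {100 * 2 ** n:.2e} samples, "
              f"vs. {n + 1} parameters for a linear boundary")

Already at n = 40 the full table needs on the order of 10^14 samples, which is why restricting the function class is not optional.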

Reading
- Maxim Raginsky's introductory notes for statistical machine learning
- Poggio & Smale, "The mathematics of learning: dealing with data", Notices of the American Mathematical Society, vol. 50, no. 5, 2003
- Hastie, Tibshirani & Friedman, The Elements of Statistical Learning (the course textbook), Chapters 1 and 2

Homework
See the course page for the first assignment, due Thursday, January 15, before class.
Let's get to work on the board…
- Empirical risk minimization
- Loss functions
- Estimation and approximation errors
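For reference alongside the board work, one standard way to write these down (a sketch following common conventions, e.g. in Raginsky's notes; the notation below is mine, not copied from the lecture):

    % Empirical risk minimization: given training pairs (x_1, y_1), ..., (x_n, y_n),
    % a loss function L, and a hypothesis space F, pick the empirical minimizer.
    \[
      \hat{f}_n = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} L\bigl(f(x_i), y_i\bigr),
      \qquad
      R(f) = \mathbb{E}\bigl[L(f(X), Y)\bigr].
    \]
    % The excess risk over the best possible predictor f^* splits into an
    % estimation error (finite data) plus an approximation error (limited F):
    \[
      R(\hat{f}_n) - R(f^{*})
        = \underbrace{\,R(\hat{f}_n) - \inf_{f \in \mathcal{F}} R(f)\,}_{\text{estimation error}}
        + \underbrace{\,\inf_{f \in \mathcal{F}} R(f) - R(f^{*})\,}_{\text{approximation error}}.
    \]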