Supervised Learning & Classification, part I
Reading: DH&S, Ch. 1

Administrivia
Pretest answers back today
Today's lecture notes online after class: Apple Keynote, PDF, PowerPoint
PDF & PPT are auto-converted; may be flaky

Your place in history
Yesterday: course administrivia; fun & fluffy philosophy
Today: the basic ML problem; branches of ML: the 20,000-foot view; intro to supervised learning; definitions and stuff

Pretest results: trends
Courses dominated by math & stat, followed by algorithms, followed by CS530, followed by AI & CS500
Proficiencies: probability > algorithms > linear algebra
μ = 56%, σ = 28%

The basic ML problem
[Diagram: the world produces an observation; a supervised function f(·) maps it to the label "Emphysema"]

The basic ML problem
Our job: reconstruct f(·) from observations
Knowing f(·) tells us: we can recognize new (previously unseen) instances: classification or discrimination
[Diagram: a new instance passed through the unknown f(·), yielding the label "Hashimoto-Pritzker"]

The basic ML problem
Our job: reconstruct f(·) from observations
Knowing f(·) tells us: we can synthesize new data (e.g., speech or images): generation
[Diagram: random source → f(·) → "Emphysema"]

The basic ML problem
Our job: reconstruct f(·) from observations
Knowing f(·) tells us: it can help us understand the process that generated the data (description or analysis), and it can tell us/find things we never knew (discovery or data mining)
How many clusters ("blobs") are there? Taxonomy of data? Networks of relationships? Unusual/unexpected things? Most important characteristics?

The basic ML problem
Our job: reconstruct f(·) from observations
Knowing f(·) tells us: it can help us act or perform better: control
Turn left? Turn right? Accelerate? Brake? Don't ride in the rain?

A brief taxonomy (highly abbreviated)
All ML splits into:
Supervised learning: have "inputs", have "outputs", find "best" f(·)
Unsupervised learning: have "inputs", no "outputs", find "best" f(·)
Reinforcement learning: have "inputs", have "controls", have "reward", find "best" f(·)

A brief taxonomy (highly abbreviated), cont'd
Supervised learning further splits into:
Classification: discrete outputs
Regression: continuous outputs

A classic example: digits
The post office wants to be able to auto-scan envelopes, recognize addresses, etc.
[Image: a scanned handwritten digit; which digit is it?]

Digits to bits
Digitize (sensors): the image becomes a feature vector, e.g., 255, 255, 127, 35, 0, ..., 0, 93, 11, 45, 6, ...
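A minimal sketch of the digitize-to-feature-vector step, assuming NumPy; the 8x8 image size and the pixel values are invented for illustration, not taken from the lecture:

```python
import numpy as np

# Hypothetical 8x8 grayscale scan of a digit (pixel values invented).
image = np.random.randint(0, 256, size=(8, 8), dtype=np.uint8)

# "Digitize (sensors)" -> feature vector: flatten the 2-D pixel grid
# into a single d-dimensional vector of measurements.
feature_vector = image.flatten().astype(float)

print(feature_vector.shape)  # (64,): d = 64 features for this instance
print(feature_vector[:6])    # the first few pixel intensities
```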

Measurements & features
The collection of numbers from the sensors (e.g., 255, 0, 93, 11, 45, 6, ...) is called a feature vector, a.k.a. attribute vector, measurement vector, or instance

Measurements & features, cont'd
Written x = (x_1, x_2, ..., x_d), where d is the dimension of the vector
Each x_i is drawn from some range: e.g., the reals, the integers, or a finite categorical set

More on features
Features (attributes, independent variables) can come in different flavors:
Continuous
Discrete
Categorical or nominal
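To make the three flavors concrete, here is a toy sketch (the patient-style feature names and values are invented) of how each kind might end up in one numeric feature vector, with the categorical feature one-hot encoded:

```python
import numpy as np

# One instance with three flavors of feature (names and values invented):
age            = 67.5        # continuous: a real-valued measurement
prior_visits   = 3           # discrete: an integer count
smoking_status = "former"    # categorical/nominal: no numeric ordering

# A common way to put a categorical feature into a numeric vector
# is one-hot encoding: one indicator per possible category.
categories = ["never", "former", "current"]
one_hot = np.array([1.0 if smoking_status == c else 0.0 for c in categories])

x = np.concatenate(([age, prior_visits], one_hot))
print(x)  # [67.5  3.   0.   1.   0.]  -> a d = 5 dimensional instance
```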

More on features, cont'd
We (almost always) assume that the set of features is fixed and of finite dimension d
Sometimes quite large, though (d ≥ 100,000 not uncommon)
The set of all possible instances is the instance space or feature space

Classes
Every example comes with a class, a.k.a. label, prediction, dependent variable, etc.
For classification problems, the class label is categorical
For regression problems, it's continuous (usually called the dependent or regressed variable)
E.g., one digit image's feature vector (255, 255, 127, 35, 0, ...) carries the class "7", another (255, 0, 93, 11, 45, 6, ...) the class "8"

Classes, cont'd
The possible values of the class variable are called the class set, class space, or range
The book writes individual classes as ω_i, so presumably the whole class set is {ω_1, ..., ω_c}, and every class label is drawn from that set
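A tiny illustration, reusing the post-office digits example; representing the class space as a Python set is just a convenience here, not the book's notation:

```python
# Class space for the digit-recognition example: all possible labels.
class_space = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}

y = "7"                  # the class of one particular example
assert y in class_space  # every class label is drawn from the class space
```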

A very simple example
[Figure: example instances of I. setosa, I. versicolor, and I. virginica, measured by sepal length, sepal width, petal length, and petal width]
Feature space: the four measurements (sepal length, sepal width, petal length, petal width)

A very simple example, cont'd
Class space: {I. setosa, I. versicolor, I. virginica}

Training data
The set of all available data for learning is the training data, a.k.a. parameterization set, fitting set, etc.
Can write it as a matrix (one instance per row) with a corresponding class vector
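A sketch of the training data laid out this way, with a handful of made-up iris-style measurements; the names X and y are my choice, not notation from the slides:

```python
import numpy as np

# One instance per row; columns are the four features
# [sepal length, sepal width, petal length, petal width] (values illustrative).
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
    [4.9, 3.0, 1.4, 0.2],
])

# Corresponding class vector: y[i] is the class of instance X[i].
y = np.array(["I. setosa", "I. versicolor", "I. virginica", "I. setosa"])

assert X.shape[0] == y.shape[0]  # one class label per training instance
print(X.shape)                   # (4, 4): 4 instances, d = 4 features
```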

Finally, goals
Now that we have the training instances and their classes, we have a (mostly) well defined job
The supervised learning problem: find the function that most closely approximates the "true" function
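As one deliberately crude instance of "find the function that most closely approximates the true one", here is a one-nearest-neighbour sketch over the toy X and y above; the lecture does not prescribe this (or any) candidate family, it is only an illustration:

```python
import numpy as np

def fit_1nn(X, y):
    """Return a learned function f_hat that labels a new instance with
    the class of its nearest training instance (Euclidean distance)."""
    def f_hat(x_new):
        distances = np.linalg.norm(X - x_new, axis=1)
        return y[np.argmin(distances)]
    return f_hat

# Toy training data (same iris-style values as the sketch above).
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [7.0, 3.2, 4.7, 1.4],
              [6.3, 3.3, 6.0, 2.5]])
y = np.array(["I. setosa", "I. versicolor", "I. virginica"])

f_hat = fit_1nn(X, y)
print(f_hat(np.array([5.0, 3.4, 1.5, 0.2])))  # -> "I. setosa"
```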

Goals?
Key questions:
What candidate functions do we consider?
What does "most closely approximates" mean?
How do you find the one you're looking for?
How do you know you've found the "right" one?