Machine Learning Intro iCAMP 2012

Machine Learning Intro iCAMP 2012 Max Welling UC Irvine

Machine Learning Algorithms that learn to make predictions from examples (data)

Types of Machine Learning
Supervised learning: labels are provided, so there is a strong learning signal. E.g. classification, regression.
Semi-supervised learning: only part of the data has labels. E.g. a child growing up.
Reinforcement learning: the learning signal is a (scalar) reward and may come with a delay. E.g. learning to play chess, a mouse in a maze.
Unsupervised learning: there is no direct learning signal; we are simply trying to find structure in the data. E.g. clustering, dimensionality reduction.

Unsupervised Learning Examples: dimensionality reduction (LLE, Roweis & Saul) and clustering.
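As a concrete illustration, here is a minimal dimensionality-reduction sketch using scikit-learn's LLE implementation on a toy swiss-roll dataset; the dataset and parameter choices are illustrative assumptions, not taken from the slides:

```python
# Minimal LLE (locally linear embedding) sketch; illustrative, not from the slides.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# 3-D points that actually lie on a curled-up 2-D sheet ("swiss roll").
X, _ = make_swiss_roll(n_samples=1000, random_state=0)

# Unroll the sheet into 2 dimensions using each point's 10 nearest neighbors.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, random_state=0)
X_2d = lle.fit_transform(X)

print(X.shape, "->", X_2d.shape)   # (1000, 3) -> (1000, 2)
```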

Supervised Learning Regression Classification

Collaborative Filtering (Netflix dataset): about 17,770 movies, roughly 480,000 users, and on the order of 100,000,000 observed ratings; the ratings matrix is about 99% sparse. (Figure: a user-by-movie matrix with a few observed ratings and many missing entries marked '?'.)
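The simplest recommender in this setting is a nearest-neighbor one: to fill in a missing rating, copy the rating of the most similar user who did rate that movie. A toy sketch with a tiny made-up matrix (nothing here comes from the real Netflix data):

```python
# Toy user-based nearest-neighbor collaborative filtering sketch (illustrative data).
import numpy as np

# rows = users, columns = movies, 0 = "not rated yet"
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def predict(user, movie, ratings):
    """Predict a missing rating by copying the most similar user who rated the movie."""
    rated = ratings[:, movie] > 0
    rated[user] = False                      # cannot use the user we are predicting for
    candidates = np.where(rated)[0]

    def sim(u, v):
        # cosine similarity between rating vectors, computed on co-rated movies only
        mask = (ratings[u] > 0) & (ratings[v] > 0)
        a, b = ratings[u, mask], ratings[v, mask]
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    best = max(candidates, key=lambda v: sim(user, v))
    return ratings[best, movie]

print(predict(user=0, movie=2, ratings=R))   # user 0's most similar rater gave movie 2 a 1
```

A real system would combine many neighbors with a weighted average, but the idea is the same.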

Generalization Consider the following regression problem: predict the real value on the y-axis from the real value on the x-axis. You are given 6 examples {Xi, Yi}. What is the y-value for a new query point X*?

Generalization (Figure-only slides: several candidate curves drawn through the same six training points.)

Generalization which curve is best?

Generalization Ockham’s razor: prefer the simplest hypothesis consistent with data.
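To make the trade-off concrete, the sketch below fits polynomials of increasing degree to six noisy toy points (made up, not the points from the slide). The degree-5 curve matches the training data essentially perfectly, yet its prediction at a new query X* is typically far off, while the simple low-degree fit stays reasonable:

```python
# Polynomial fits of increasing degree to six toy examples (data invented for illustration).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 6)
y = 0.5 * x + rng.normal(scale=0.3, size=6)    # roughly linear data with a little noise

x_query = 6.0                                   # a query point outside the training range
for degree in (1, 3, 5):
    coeffs = np.polyfit(x, y, deg=degree)       # least-squares polynomial fit
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: train MSE = {train_err:.4f}, "
          f"prediction at X* = {np.polyval(coeffs, x_query):.2f}")

# The degree-5 fit drives the training error to ~0 but its prediction at X*
# can be wildly off; Ockham's razor favors the simpler, lower-degree curve.
```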

Generalization Learning is concerned with accurate prediction of future data, not accurate prediction of training data. Question: Design an algorithm that is perfect at predicting training data.
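One trivial answer to the question is pure memorization: store the training set in a lookup table. The hypothetical sketch below is perfect on the training data and useless on anything else, which is exactly why training accuracy alone says little about generalization:

```python
# A "learner" that memorizes the training set: zero training error, no generalization.
train = {0.0: 1.2, 1.0: 0.9, 2.0: 2.1, 3.0: 2.8}   # made-up (x, y) pairs

def memorizer(x):
    # Perfect on every training point...
    if x in train:
        return train[x]
    # ...but clueless about anything it has not literally seen before.
    return None

print([memorizer(x) == y for x, y in train.items()])   # all True: perfect on training data
print(memorizer(1.5))                                   # None: no prediction for a new query
```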

Learning as Compression Imagine a game where Bob needs to send a dataset to Alice. They are allowed to meet once before they see the data, and they agree on a precision (quantization) level. Bob learns a model (the red line) and sends the model parameters (offset and slope) only once. For every datapoint, Bob then sends the distance along the line (a large number) and the orthogonal distance from the line (a small number). Small numbers are cheaper to encode than large numbers.
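A small sketch of the geometry, using toy data and an ordinary least-squares line (the actual bit-level encoding Bob would use is not modeled): after fitting the line, each point is re-expressed as a large coordinate along the line plus a small orthogonal residual:

```python
# Re-expressing points as (distance along fitted line, orthogonal distance from it).
# Toy data; the encoding itself is not shown.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.7 * x + rng.normal(scale=0.2, size=x.size)   # points lying near a line
pts = np.column_stack([x, y])

# Bob's model: a least-squares line (slope and offset), sent once.
slope, offset = np.polyfit(x, y, 1)
p0 = np.array([0.0, offset])                       # a point on the line
d = np.array([1.0, slope]) / np.hypot(1.0, slope)  # unit direction along the line
n = np.array([-d[1], d[0]])                        # unit normal to the line

along = (pts - p0) @ d       # large numbers: position along the line
ortho = (pts - p0) @ n       # small numbers: residual off the line

print(np.abs(along).mean(), np.abs(ortho).mean())  # along-line values dwarf the residuals
```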

Generalization learning = compression = abstraction The man who couldn’t forget …

Classification: nearest neighbor Example: imagine you want to classify monkey images versus human images. Data: 100 monkey images and 200 human images, labeled with what is what. Task: here is a new image: monkey or human?

1-nearest neighbor Idea: find the picture in the database that is closest to your query image. Check its label. Declare the class of your query image to be the same as that of the closest picture. (Figure: a query image shown next to the closest image in the database.)
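A minimal 1-nearest-neighbor sketch in plain NumPy; random vectors stand in for the image features (with real images you would first convert each one into a feature vector):

```python
# 1-nearest-neighbor classification sketch.
# Random vectors stand in for image features; labels 0 = monkey, 1 = human.
import numpy as np

rng = np.random.default_rng(0)
monkeys = rng.normal(loc=0.0, size=(100, 64))   # 100 "monkey" feature vectors
humans  = rng.normal(loc=2.0, size=(200, 64))   # 200 "human" feature vectors
X = np.vstack([monkeys, humans])
y = np.array([0] * 100 + [1] * 200)

def one_nn(query, X, y):
    """Return the label of the training example closest to the query (Euclidean distance)."""
    dists = np.linalg.norm(X - query, axis=1)
    return y[np.argmin(dists)]

new_image = rng.normal(loc=2.0, size=64)        # a new query, drawn near the "human" cluster
print("monkey" if one_nn(new_image, X, y) == 0 else "human")
```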

kNN Decision Surface (Figure: the decision curve separating the two classes.)
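One common way to visualize such a surface, sketched below with scikit-learn on made-up 2-D data, is to evaluate the kNN classifier on a dense grid: the places where the predicted class flips trace out the decision curve, and larger k gives a smoother curve:

```python
# Sketch: evaluate a kNN classifier on a grid to trace its decision surface.
# The 2-D data here are synthetic, purely for illustration.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0, 0], scale=1.0, size=(50, 2)),
               rng.normal(loc=[3, 3], scale=1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Predict every point of a coarse grid; the class flips along the decision curve.
xx, yy = np.meshgrid(np.linspace(-3, 6, 40), np.linspace(-3, 6, 40))
grid = np.column_stack([xx.ravel(), yy.ravel()])
labels = clf.predict(grid).reshape(xx.shape)

print(labels)   # a 40x40 map of 0s and 1s; plotting contours of it would draw the surface
```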

Bayes Rule(s) Riddle: Joe goes to the doctor and tells the doctor he has a stiff neck and a rash. The doctor is worried about meningitis and performs a test that is 80% correct; that is, for 80% of the people who have meningitis it will come out positive. If 1 in 100,000 people in the population have meningitis, and 1 in 1,000 people test positive (sick or not sick), what is the probability that Joe has meningitis? Answer: Bayes rule. P(meningitis | positive test) = P(positive test | meningitis) P(meningitis) / P(positive test) = 0.8 * 0.00001 / 0.001 = 0.008, i.e. 0.8% < 1%.
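The same computation written out, with the numbers taken from the slide:

```python
# Bayes rule with the numbers from the slide.
p_pos_given_men = 0.8        # test is positive for 80% of meningitis cases
p_men = 1 / 100_000          # prior probability of meningitis
p_pos = 1 / 1_000            # probability of a positive test, sick or not

p_men_given_pos = p_pos_given_men * p_men / p_pos
print(p_men_given_pos)       # 0.008, i.e. 0.8%; still below 1% despite the positive test
```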

Naïve Bayes Classifier (Figure: a graphical model with class variable Y = meningitis and observed variables X1 = test result, X2 = stiff neck and rash.) Naïve Bayes classifier: P(Y|X1,X2) = P(X1|Y) P(X2|Y) P(Y) / P(X1,X2). Conditional independence assumption: P(X1,X2|Y) = P(X1|Y) P(X2|Y).
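A tiny numerical sketch of this classifier. Only the 80% test sensitivity and the 1-in-100,000 prior come from the previous slide; the remaining conditional probabilities are made-up assumptions for illustration:

```python
# Naive Bayes with two observed variables; CPT numbers are assumed except where noted.
p_y = {True: 1e-5, False: 1 - 1e-5}       # prior on meningitis (from the slide)

p_x1_given_y = {True: 0.8, False: 1e-3}   # P(positive test | Y); 0.8 from the slide,
                                          # the false-positive rate is an assumption
p_x2_given_y = {True: 0.7, False: 0.05}   # P(stiff neck & rash | Y); assumed numbers

def posterior(x1, x2):
    """P(Y=True | X1=x1, X2=x2) under the naive (conditional independence) assumption."""
    def joint(y):
        px1 = p_x1_given_y[y] if x1 else 1 - p_x1_given_y[y]
        px2 = p_x2_given_y[y] if x2 else 1 - p_x2_given_y[y]
        return px1 * px2 * p_y[y]
    num = joint(True)
    return num / (num + joint(False))     # normalizing by P(X1, X2)

print(posterior(x1=True, x2=True))        # about 0.10 with these made-up numbers:
                                          # much higher than the prior, but still unlikely
```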

Bayesian Networks & Graphical Models Main modeling tool for modern machine learning. Reasoning over large collections of random variables with intricate relations.