
Machine Learning and the Big Data Challenge
Max Welling, UC Irvine

AI's Promise, 60 Years Ago. Robots that behave and think like humans. Marvin Minsky: computer vision will be easy, chess will be hard.

What We Got. Deep Blue beat Kasparov at the game of chess […]; Watson won Jeopardy!

What Is Hard for AI? Computer vision and scene understanding […]

But We Are Making Progress.

Another Example: Machine Translation. Tremendous progress has been made; the main reason is more and better data (e.g., documents from the EU).

Language Processing. Google's spelling and query correction; the main reason for progress is Google's massive datasets.

Ingredients of Progress. Better AI systems come from better models, more data, and faster computation.

Computation: Moore's Law. Computational power doubles approximately every two years.

Trends in Computing: Cloud Computing. Computing will become similar to electricity: take it as you need it. We need global wireless coverage to make this work well.

Trends in Computing: Distributed Computing (e.g., GPUs). Cheap and massively parallel computing (up to 300 processing units per card), first developed for the gaming community and now adopted by machine learning for very fast training (a third renaissance for neural networks).

Big Data. The current data volume is roughly 2 zettabytes (2 trillion GB) and is doubling every 1.5 years: data volume has its own Moore's law! That works out to 38 images every second if you live for 100 years (more visual data than anyone will actually see …).

Big Data: The McKinsey 2011 Report (figure-only slide).

Sensors Everywhere. The internet; around 1.85 million surveillance cameras in the UK alone, that is, 1 camera for every 32 people! On a typical day every person will be recorded by around 70 CCTV cameras. There are about 5.6 billion cellphone users worldwide.

Machine Learning. Algorithms that learn to make predictions from examples (data).

Generalization. Consider the following regression problem: predict the real value on the y-axis from the real value on the x-axis. You are given 6 examples {(x_i, y_i)}. What is the y-value for a new query point x*?

Generalization (figure-only slides: several candidate curves are fit through the six points). Which curve is best?

Generalization. Ockham's razor: prefer the simplest hypothesis consistent with the data.

Generalization. Learning is concerned with accurate prediction of future data, not accurate prediction of the training data.
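A minimal sketch of this point in Python (my illustration, not from the slides): fit polynomials of increasing degree to six noisy points; the training error shrinks toward zero while the error on fresh "future" data blows up for the most complex model. All data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=n)  # noisy ground truth
    return x, y

x_train, y_train = make_data(6)    # the six examples from the slide
x_test,  y_test  = make_data(200)  # stand-in for "future data"

for degree in (1, 3, 5):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse  = np.mean((np.polyval(coeffs, x_test)  - y_test)  ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The degree-5 polynomial passes through all six training points almost exactly, yet generalizes worst: Ockham's razor in action.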

Learning as Compression. Imagine a game where Bob needs to send a dataset to Alice. They are allowed to meet once before they see the data, and they agree on a precision level (quantization level). Bob learns a model (the red line) and sends the model parameters (offset and slope) only once. For every data point, Bob then sends the distance along the line (a large number) and the orthogonal distance from the line (a small number); small numbers are cheaper to encode than large numbers.
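A hedged numerical sketch of why this saves bits (my own illustration; it uses vertical rather than orthogonal residuals for simplicity): for points scattered around a line, the residuals span a much smaller range than the raw y-values, so at a fixed quantization level each residual needs fewer bits under a simple uniform code.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 100, 1000)
y = 0.5 * x + 3.0 + rng.normal(scale=1.0, size=x.size)  # points near a line

# Bob fits the line once and sends (slope, offset) a single time.
slope, offset = np.polyfit(x, y, 1)
residuals = y - (slope * x + offset)  # vertical residuals, for simplicity

delta = 0.1  # agreed quantization level

def bits_per_value(v):
    # Bits for a uniform code over the quantized range of v.
    return np.log2(np.ptp(v) / delta + 1)

print(f"raw y values: ~{bits_per_value(y):.1f} bits each")
print(f"residuals:    ~{bits_per_value(residuals):.1f} bits each")
```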

Generalization. Learning = compression = abstraction. The man who couldn't forget …

Types of Learning.
- Supervised learning: labels are provided, so there is a strong learning signal (e.g., classification, regression).
- Semi-supervised learning: only part of the data have labels (e.g., a child growing up).
- Reinforcement learning: the learning signal is a (scalar) reward and may come with a delay (e.g., learning to play chess, a mouse in a maze).
- Unsupervised learning: there is no direct learning signal; we simply try to find structure in the data (e.g., clustering, dimensionality reduction).

Classification: Nearest Neighbor. Example: imagine you want to classify monkeys versus humans. Data: 100 monkey images and 200 human images, labeled with what is what. Task: here is a new image; is it a monkey or a human?

1-Nearest Neighbor. Idea:
1. Find the picture in the database which is closest to your query image.
2. Check its label.
3. Declare the class of your query image to be the same as that of the closest picture.
A sketch of this procedure follows.
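A minimal 1-nearest-neighbor sketch (my illustration: images are stood in for by random feature vectors, and Euclidean distance is assumed):

```python
import numpy as np

def one_nearest_neighbor(train_X, train_y, query):
    """Classify `query` by the label of its closest training example."""
    dists = np.linalg.norm(train_X - query, axis=1)  # distance to every example
    return train_y[np.argmin(dists)]

# Hypothetical stand-ins for flattened image feature vectors.
rng = np.random.default_rng(2)
monkeys = rng.normal(loc=0.0, size=(100, 64))
humans  = rng.normal(loc=1.0, size=(200, 64))
train_X = np.vstack([monkeys, humans])
train_y = np.array(["monkey"] * 100 + ["human"] * 200)

query = rng.normal(loc=1.0, size=64)  # a new image
print(one_nearest_neighbor(train_X, train_y, query))  # most likely "human"
```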

kNN Decision Surface (figure: the decision curve separating the two classes).

Unsupervised Learning: Dimensionality Reduction (figure: LLE, Roweis & Saul).
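As a hedged sketch of what LLE does, here is scikit-learn's implementation of Roweis & Saul's algorithm applied to a synthetic "swiss roll": intrinsically 2-D data curled up in 3-D gets unrolled into 2-D while local neighborhoods are preserved.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# A 3-D swiss roll: intrinsically 2-D data embedded in 3-D space.
X, color = make_swiss_roll(n_samples=1000, random_state=0)

lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
X_2d = lle.fit_transform(X)  # low-dimensional coordinates
print(X_2d.shape)  # (1000, 2)
```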

Collaborative Filtering (the Netflix dataset). A ratings matrix of movies (about 17,770) by users (about 480,000) with roughly 100 million observed ratings, i.e., about 99% of the entries are missing. (Figure: a sparse matrix with known ratings such as 4 and 1, and question marks to be predicted.)
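A hedged sketch of one standard approach to this problem, low-rank matrix factorization trained by stochastic gradient descent (a key ingredient of the prize-winning systems, though not necessarily the method on the slide); the data below are tiny and made up.

```python
import numpy as np

# (user, movie, rating) triples; a tiny made-up stand-in for the Netflix data.
ratings = [(0, 0, 4.0), (0, 2, 1.0), (1, 1, 5.0), (2, 0, 4.0), (2, 1, 1.0)]
n_users, n_movies, k = 3, 3, 2  # k = number of latent factors

rng = np.random.default_rng(3)
U = 0.1 * rng.normal(size=(n_users, k))   # user factor vectors
V = 0.1 * rng.normal(size=(n_movies, k))  # movie factor vectors

lr, reg = 0.05, 0.02  # learning rate and L2 regularization strength
for epoch in range(200):
    for u, m, r in ratings:
        err = r - U[u] @ V[m]            # prediction error on this rating
        u_old = U[u].copy()
        U[u] += lr * (err * V[m] - reg * U[u])
        V[m] += lr * (err * u_old - reg * V[m])

# Predict a missing "?" entry: user 0's rating of movie 1.
print(f"predicted rating: {U[0] @ V[1]:.2f}")
```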

Bayes' Rule. Riddle: Joe goes to the doctor and tells the doctor he has a stiff neck and a rash. The doctor is worried about meningitis and performs a test that is 80% correct, that is, for 80% of the people who have meningitis it will turn out positive. If 1 in 100,000 people in the population have meningitis and 1 in 1,000 people will test positive (sick or not sick), what is the probability that Joe has meningitis? Answer: Bayes' rule. P(meningitis | positive test) = P(positive test | meningitis) P(meningitis) / P(positive test) = 0.8 × 0.00001 / 0.001 = 0.008, i.e., less than 1%.
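The same computation as a quick check in Python, with the numbers taken straight from the riddle:

```python
p_pos_given_men = 0.8   # test sensitivity: P(positive | meningitis)
p_men = 1 / 100_000     # prior: P(meningitis)
p_pos = 1 / 1_000       # evidence: P(positive test)

p_men_given_pos = p_pos_given_men * p_men / p_pos  # Bayes' rule
print(f"P(meningitis | positive test) = {p_men_given_pos:.4f}")  # 0.0080, i.e., 0.8%
```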

Bayesian Networks & Graphical Models Main modeling tool for modern machine learning Reasoning over large collections of random variables with intricate relations test result meningitis stiff-neck, rash 31

Nonparametric Bayes. Assumption: real-world data is infinitely complex. Consequence: as the dataset grows, so should the model complexity. Nonparametric Bayesian models do exactly that. (Figure: a hierarchical clustering of birds.)
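To illustrate the figure only (this is classical agglomerative clustering, not itself a nonparametric Bayesian model), here is a hedged sketch that builds a tree over made-up "bird" feature vectors:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(4)
# Made-up feature vectors standing in for measurements of individual birds.
birds = np.vstack([rng.normal(loc=c, size=(10, 4)) for c in (0.0, 3.0, 6.0)])

Z = linkage(birds, method="average")  # agglomerative clustering
dendrogram(Z)                         # a tree like the slide's bird hierarchy
plt.show()
```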

Trends in ML: Human Computation (Luis von Ahn). Old paradigm: computers assist humans. New paradigm: humans assist computers to learn ("the raising of the machines").

HC I: Useful Games. The ESP Game to label images; "LabelMe" to segment and label images.

HC II: Crowd-sourced Marketplaces. Split a problem into many small and simple sub-problems and sell them on a crowd-sourced marketplace such as Amazon's "Mechanical Turk".

Mechanical Turk Example (figure-only slide).

HC III: Online Competitions. Netflix organized an online competition to improve its movie recommender system. Prize money: 1 million dollars if a 10% improvement was achieved. It lasted 3 years, and at least 20,000 teams registered from 150 countries. Kaggle has turned this into a business and hosts numerous competitions. Latest: the Heritage Health Prize at $3M!

What won the Netflix Prize? Ensemble learning: learn many models (e.g., 200) and average their predictions, the algorithmic equivalent of the "wisdom of the crowds".
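A minimal sketch of why averaging helps (illustrative only, not the prize-winning system): if each model's error has an independent component, averaging many models cancels much of it.

```python
import numpy as np

rng = np.random.default_rng(5)
truth = rng.uniform(1, 5, size=1000)  # the "true" ratings we want to predict

# 200 hypothetical models, each equal to the truth plus independent noise.
models = truth + rng.normal(scale=1.0, size=(200, truth.size))

single_rmse   = np.sqrt(np.mean((models[0] - truth) ** 2))
ensemble_rmse = np.sqrt(np.mean((models.mean(axis=0) - truth) ** 2))
print(f"single model RMSE:      {single_rmse:.3f}")    # about 1.0
print(f"200-model average RMSE: {ensemble_rmse:.3f}")  # about 1/sqrt(200) = 0.07
```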

Wisdom of the Crowds. Estimate the weight of the Space Shuttle (in tons) and take the mean or median of the answers; this does surprisingly well. Time for an experiment? Mechanism: canceling of independent errors. Answer:
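A small simulation of the mechanism (the "true" weight below is purely hypothetical): individual guesses are noisy and skewed, yet the median of many guesses lands close to the truth.

```python
import numpy as np

rng = np.random.default_rng(6)
true_weight = 2000.0  # hypothetical true answer in tons, for illustration only

# 500 people guess: multiplicative noise, so some guesses are wildly off.
guesses = true_weight * rng.lognormal(mean=0.0, sigma=0.5, size=500)

print(f"mean guess:   {np.mean(guesses):.0f} tons")    # pulled up by outliers
print(f"median guess: {np.median(guesses):.0f} tons")  # close to 2000
```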

Prediction Markets ("idea futures"). Use the magnitude of the bet to express confidence.

AI-Assisted Learning. Stanford is offering 15 courses online to roughly 100,000 students each, with homework, exams, and a "certificate of achievement". Flipping the classroom: watch the lecture video at home, do the homework in class. AI can find the right set of exercises, hints, or a cyber-partner for each individual student and track progress.

Outlook. The volume and diversity of data and computing power are growing exponentially. The proliferation of sensors, the internet, and "human computing" allows for AI systems that are very different from human intelligence. Future AIs will sense your location, intention, mood, and needs; anticipate your next action (ordering your nonfat cappuccino from Starbucks at 9 am); monitor your health; and monitor the environment. (Figures: Google Glass now; a past vision; a virtual, connected world.)