Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer.

What is learning? “Learning denotes changes in a system that... enable a system to do the same task more efficiently the next time” – Herbert Simon “Learning is constructing or modifying representations of what is being experienced” – Ryszard Michalski “Learning is making useful changes in our minds” – Marvin Minsky

Why study learning? Understand and improve the efficiency of human learning –Use it to improve methods for teaching and tutoring people (e.g., better computer-aided instruction) Discover new things or structure that were previously unknown –Examples: data mining, scientific discovery Fill in skeletal or incomplete specifications in a domain –Large, complex systems can’t be completely built by hand & require dynamic updating to incorporate new information –Learning new characteristics expands the domain of expertise and lessens the “brittleness” of the system Build agents that can adapt to users, other agents, and their environment

AI & Learning Today Neural network learning was popular in the 60s In the 70s and 80s it was replaced with a paradigm based on manually encoding and using knowledge In the 90s, more data and the Web drove interest in new statistical machine learning (ML) techniques and new data mining applications Today, ML techniques and big data are behind almost all successful intelligent systems

Machine Learning Successes Sentiment analysis Spam detection Machine translation Spoken language understanding Named entity detection Self-driving cars Motion recognition (Microsoft Xbox) Identifying faces in digital images Recommender systems (Netflix, Amazon) Credit card fraud detection

A general model of learning agents

Major paradigms of machine learning Rote learning: 1-1 mapping from inputs to stored representation, learning by memorization, association-based storage and retrieval Induction: Use specific examples to reach general conclusions Clustering: Unsupervised identification of natural groups in data Analogy: Determine correspondence between two different representations Discovery: Unsupervised, specific goal not given Genetic algorithms: Evolutionary search techniques, based on an analogy to survival of the fittest Reinforcement learning: Feedback (positive or negative reward) given at the end of a sequence of steps

The Classification Problem Extrapolate from a set of examples to make accurate predictions about future ones Supervised versus unsupervised learning –Learn an unknown function f(X)=Y, where X is an input example and Y is the desired output –Supervised learning implies we’re given a training set of (X, Y) pairs by a “teacher” –Unsupervised learning means we are only given the Xs and some (ultimate) feedback function on our performance Concept learning or classification (aka “induction”) –Given a set of examples of some concept/class/category, determine if a given example is an instance of the concept or not –If it is an instance, we call it a positive example –If it is not, it is called a negative example –Or we can make a probabilistic prediction (e.g., using a Bayes net)
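The supervised setting above can be made concrete with a toy learner. This is a minimal sketch, assuming a single numeric feature and a threshold rule; the data and the midpoint heuristic are illustrative choices, not from the chapter.

```python
# Toy supervised concept learning: learn a threshold on one numeric feature
# that separates positive ('+') from negative ('-') training examples.

def learn_threshold(examples):
    """examples: list of (x, y) pairs with y in {'+', '-'}.
    Returns a threshold t such that x >= t predicts '+'."""
    pos = [x for x, y in examples if y == '+']
    neg = [x for x, y in examples if y == '-']
    # Place the decision boundary midway between the largest negative
    # and the smallest positive example (assumes the classes separate).
    return (max(neg) + min(pos)) / 2

def classify(t, x):
    return '+' if x >= t else '-'

training = [(1.0, '-'), (2.0, '-'), (4.0, '+'), (5.0, '+')]
t = learn_threshold(training)   # midpoint of 2.0 and 4.0 -> 3.0
print(classify(t, 4.5))         # '+'
print(classify(t, 0.5))         # '-'
```

The "teacher" here is whoever labeled the (x, y) pairs; the learner only sees the training set.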

Supervised Concept Learning Given a training set of positive and negative examples of a concept Construct a description that will accurately classify whether future examples are positive or negative That is, learn some good estimate of function f given a training set {(x1, y1), (x2, y2), ..., (xn, yn)}, where each yi is either + (positive) or - (negative), or a probability distribution over +/-

Inductive Learning Framework Raw input data from sensors are typically preprocessed to obtain a feature vector, X, that adequately describes all of the relevant features for classifying examples Each X is a list of (attribute, value) pairs. For example, X = [Person:Sue, EyeColor:Brown, Age:Young, Sex:Female] The number of attributes (a.k.a. features) is fixed (positive, finite) Each attribute has a fixed, finite number of possible values (or could be continuous) Each example can be interpreted as a point in an n-dimensional feature space, where n is the number of attributes
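The (attribute, value) representation above can be sketched as a mapping from named attributes to a numeric point in feature space. The attribute names follow the slide's example; the numeric encodings are arbitrary assumptions for illustration.

```python
# Encode an example given as (attribute, value) pairs into a numeric
# feature vector -- a point in an n-dimensional feature space (n = 3 here).

ATTRIBUTES = ["EyeColor", "Age", "Sex"]          # fixed, finite attribute set
CODEBOOKS = {                                     # fixed, finite value sets
    "EyeColor": {"Brown": 0, "Blue": 1, "Green": 2},
    "Age":      {"Young": 0, "Old": 1},
    "Sex":      {"Female": 0, "Male": 1},
}

def to_feature_vector(example):
    """example: dict mapping attribute name -> symbolic value."""
    return [CODEBOOKS[a][example[a]] for a in ATTRIBUTES]

sue = {"EyeColor": "Brown", "Age": "Young", "Sex": "Female"}
print(to_feature_vector(sue))   # [0, 0, 0]
```

A continuous attribute (e.g., height) would simply pass its value through instead of using a codebook.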

Measuring Model Quality How good is a model? –Predictive accuracy –False positives / false negatives for a given cutoff threshold –Loss function (accounts for cost of different types of errors) –Area under the (ROC) curve –Minimizing loss alone can lead to problems with overfitting Training error –Train on all data; measure error on all data –Subject to overfitting (of course we’ll make good predictions on the data on which we trained!) Regularization –Attempt to avoid overfitting –Explicitly minimize the complexity of the function while minimizing loss; the tradeoff is modeled with a regularization parameter
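The training-error and regularization ideas above can be sketched in a few lines. The sample data, the always-positive model, and the complexity/lambda values are hypothetical placeholders.

```python
# Training error: fraction of training examples the model misclassifies.
def training_error(predict, data):
    """predict: function x -> label; data: list of (x, y) pairs."""
    wrong = sum(1 for x, y in data if predict(x) != y)
    return wrong / len(data)

# Regularized objective: trade off data fit against model complexity.
# Larger lam pushes the search toward simpler (less overfit) models.
def regularized_loss(loss, complexity, lam):
    return loss + lam * complexity

data = [(1.2, '+'), (0.7, '+'), (3.4, '-')]
always_plus = lambda x: '+'                  # a maximally simple "model"
print(training_error(always_plus, data))     # misses the one '-' example
```

Note that `training_error` measured on the data we trained on is exactly the optimistic estimate the slide warns about.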

Cross-Validation –Divide data into training set and test set –Train on training set; measure error on test set –Better than training error, since we are measuring generalization to new data –To get a good estimate, we need a reasonably large test set –But this gives less data to train on, reducing our model quality!
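The train/test division above can be sketched as a simple holdout split. The 80/20 ratio and the fixed seed are common but arbitrary choices.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle the data, then hold out test_fraction of it for testing."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

data = list(range(10))
train, test = train_test_split(data)
print(len(train), len(test))   # 8 2
```

Error measured on `test` estimates generalization, at the cost of the examples withheld from training.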

Cross-Validation, cont. k-fold cross-validation: –Divide data into k folds –Train on k-1 folds, use the kth fold to measure error –Repeat k times; use average error to measure generalization accuracy –Statistically valid; gives good accuracy estimates Leave-one-out cross-validation (LOOCV) –k-fold where k=N (test data = 1 instance!) –Accurate but expensive; requires building N models
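The k-fold procedure above can be sketched as index generation: every example lands in exactly one test fold, and setting k equal to the number of examples gives LOOCV. The round-robin fold assignment is one simple choice.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k folds over n examples."""
    # Round-robin assignment: fold i holds indices i, i+k, i+2k, ...
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

for train, test in k_fold_splits(6, 3):
    print(test)   # [0, 3] then [1, 4] then [2, 5]
```

Averaging the k per-fold errors gives the cross-validation estimate; `k_fold_splits(n, n)` is leave-one-out.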

Inductive learning as search Instance space I defines the language for the training and test instances –Typically, but not always, each instance i ∈ I is a feature vector –Features are sometimes called attributes or variables –I = V1 × V2 × … × Vk, i = (v1, v2, …, vk) Class variable C gives an instance’s class (to be predicted) Model space M defines the possible classifiers –M: I → C, M = {m1, …, mn} (possibly infinite) –Model space is sometimes, but not always, defined in terms of the same features as the instance space Training data can be used to direct the search for a good (consistent, complete, simple) hypothesis in the model space

Model spaces Decision trees –Partition the instance space into axis-parallel regions, labeled with class value Version spaces –Search for necessary (lower-bound) and sufficient (upper-bound) partial instance descriptions for an instance to be in the class Nearest-neighbor classifiers –Partition the instance space into regions defined by the centroid instances (or cluster of k instances) Associative rules (feature values → class) First-order logical rules Bayesian networks (probabilistic dependencies of class on attributes) Neural networks
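One of the model spaces listed above, nearest-neighbor classification, is compact enough to sketch directly: the stored training instances themselves carve the instance space into regions. The training points below are made up for illustration.

```python
def nearest_neighbor(train, query):
    """1-NN: return the label of the training instance closest to query.
    train: list of (feature_vector, label) pairs."""
    def dist2(a, b):                 # squared Euclidean distance
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    vec, label = min(train, key=lambda vl: dist2(vl[0], query))
    return label

train = [([0, 0], '-'), ([1, 0], '-'), ([5, 5], '+'), ([6, 5], '+')]
print(nearest_neighbor(train, [5, 4]))   # '+'
print(nearest_neighbor(train, [0, 1]))   # '-'
```

Unlike a decision tree's axis-parallel regions, the regions here (Voronoi cells around the training points) can have arbitrary orientations.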

Model spaces (figure): the instance space I partitioned three ways, by a nearest-neighbor classifier, a version space, and a decision tree