Presentation transcript:

1 What is learning?
- “Learning denotes changes in a system that... enable a system to do the same task more efficiently the next time.” – Herbert Simon
- “Learning is any process by which a system improves performance from experience.” – Herbert Simon
- “Learning is constructing or modifying representations of what is being experienced.” – Ryszard Michalski
- “Learning is making useful changes in our minds.” – Marvin Minsky

2 Machine Learning - Example One of my favorite AI/Machine Learning sites: 

3 Why learn?
- Build software agents that can adapt to their users, to other software agents, or to changing environments
  - Personalized news or mail filter
  - Personalized tutoring
  - Mars robot
- Develop systems that are too difficult/expensive to construct manually because they require specific detailed skills or knowledge tuned to a specific task
  - Large, complex AI systems cannot be completely derived by hand and require dynamic updating to incorporate new information
- Discover new things or structure that were previously unknown to humans
  - Examples: data mining, scientific discovery


5 Applications
Assign an object/event to one of a given finite set of categories:
- Medical diagnosis
- Credit card applications or transactions
- Fraud detection in e-commerce
- Spam filtering in e-mail
- Recommended books, movies, music
- Financial investments
- Spoken words
- Handwritten letters

6 Major paradigms of machine learning
- Rote learning – “Learning by memorization.” Employed by the first machine learning systems in the 1950s, e.g., Samuel’s Checkers program
- Supervised learning – Use specific examples to reach general conclusions or extract general rules
  - Classification (concept learning)
  - Regression
- Unsupervised learning (clustering) – Unsupervised identification of natural groups in data
- Reinforcement learning – Feedback (positive or negative reward) given at the end of a sequence of steps
- Analogy – Determine correspondence between two different representations
- Discovery – Unsupervised; a specific goal is not given

7 Rote Learning is Limited
- Memorizes I/O pairs and performs exact matching against new inputs
- If a computer has not seen the precise case before, it cannot apply its experience
- We want computers to “generalize” from prior experience
- Generalization is the most important factor in learning
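The limitation above is easy to see in code. A minimal sketch of a rote learner (the class name and toy examples are illustrative, not from the slides): it memorizes input/output pairs and can only answer queries it has seen verbatim.

```python
# A rote learner: memorizes I/O pairs, performs exact matching only,
# and therefore cannot generalize to unseen inputs.
class RoteLearner:
    def __init__(self):
        self.memory = {}

    def train(self, examples):
        # examples: iterable of (input, output) pairs
        for x, y in examples:
            self.memory[x] = y

    def predict(self, x):
        # Exact matching only: unseen inputs get no answer.
        return self.memory.get(x, None)

learner = RoteLearner()
learner.train([(("round", "orange"), "tangerine"),
               (("long", "yellow"), "banana")])
print(learner.predict(("round", "orange")))  # tangerine
print(learner.predict(("round", "red")))     # None -- no generalization
```

The `None` on the last line is the whole point of the slide: without generalization, any input that differs even slightly from past experience is unanswerable.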

8 The inductive learning problem
- Extrapolate from a given set of examples to make accurate predictions about future examples
- Supervised versus unsupervised learning
  - Learn an unknown function f(X) = Y, where X is an input example and Y is the desired output
  - Supervised learning implies we are given a training set of (X, Y) pairs by a “teacher”
  - Unsupervised learning means we are only given the Xs
  - Semi-supervised learning: mostly unlabelled data

9 Types of supervised learning
a) Classification: We are given the labels of the training objects: {(x1, x2, y=T/O)}. We are interested in classifying future objects (x1', x2') with the correct label, i.e., find y' for a given (x1', x2').
b) Concept learning: We are given positive and negative samples for the concept we want to learn (e.g., Tangerine): {(x1, x2, y=+/-)}. We are interested in classifying future objects as members of the class (positive examples of the concept) or not, i.e., answer +/- for a given (x1', x2').
(Figure: a feature space with axes x1 = size and x2 = color, separating Tangerines from Oranges in (a), and Tangerines from Not Tangerines in (b).)
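One concrete way to classify a future object (x1', x2') in this (size, color) feature space is a 1-nearest-neighbor rule. This is a hedged sketch: the numeric training points and the encoding of color as a number are made-up illustrations, not data from the slides.

```python
import math

# 1-nearest-neighbor classification in a (x1 = size, x2 = color) space:
# label a query point with the label of the closest training point.
def nearest_neighbor(train, query):
    # train: list of ((x1, x2), label); returns the label of the closest point
    nearest = min(train, key=lambda ex: math.dist(ex[0], query))
    return nearest[1]

train = [((5.0, 0.9), "+"), ((5.5, 0.8), "+"),   # small, deep orange: tangerines
         ((8.0, 0.5), "-"), ((9.0, 0.4), "-")]   # larger, paler: not tangerines
print(nearest_neighbor(train, (5.2, 0.85)))      # "+" -- classified as tangerine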

10 Types of Supervised Learning
- Regression
  - The target function is continuous rather than a class membership
  - y = f(x)
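A minimal regression sketch, to contrast with the classifiers above: here the target y = f(x) is a continuous value, and we fit a line by ordinary least squares in closed form (the data points are invented for illustration).

```python
# Ordinary least-squares fit of a line y = slope * x + intercept.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx  # (slope, intercept)

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]                  # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)            # 2.0 1.0
```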

11 Example
(Figure: a set of positive examples and a set of negative examples of a concept, followed by a query symbol.)
- How does this symbol classify?
- Concept: a solid red circle inside a (regular?) polygon
- What about concepts such as: figures on the left side of the page, or figures drawn before 5pm 2/2/89?

12 Inductive learning framework: Feature Space
- Raw input data from sensors are typically preprocessed to obtain a feature vector, X, that adequately describes all of the relevant features for classifying examples
- Each x is a list of (attribute, value) pairs, e.g., x = [Color=Orange, Shape=Round, Weight=200g]
- Each attribute can be discrete or continuous
- Each example can be interpreted as a point in an n-dimensional feature space, where n is the number of attributes
- The model space M defines the possible hypotheses: M: X → C, M = {m1, ..., mn} (possibly infinite)
- Training data can be used to direct the search for a good (consistent, complete, simple) hypothesis in the model space
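The (attribute, value) representation above can be sketched directly. The dictionaries and numeric encodings below are illustrative assumptions; the point is only that discrete attributes get mapped to numbers so each example becomes a point in an n-dimensional feature space.

```python
# One raw example as (attribute, value) pairs, as on the slide.
example = {"Color": "Orange", "Shape": "Round", "Weight": 200}  # grams

# Illustrative encodings for the discrete attributes; continuous
# attributes (Weight) pass through unchanged.
COLOR = {"Orange": 0, "Red": 1, "Gray": 2}
SHAPE = {"Round": 0, "Square": 1}

def to_vector(ex):
    # Map an example to a point in 3-dimensional feature space.
    return [COLOR[ex["Color"]], SHAPE[ex["Shape"]], float(ex["Weight"])]

print(to_vector(example))  # [0, 0, 200.0]
```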

13 Feature Space
(Figure: a 3-dimensional feature space with axes Size, Color, and Weight, and a query point "?" at Size=Big, Weight=2500, Color=Gray.)
A “concept” is then a (possibly disjoint) volume in this space.

14 Learning: Key Steps
- Data and assumptions
  - What data is available for the learning task?
  - What can we assume about the problem?
- Representation
  - How should we represent the examples to be classified?
- Method and estimation
  - What are the possible hypotheses?
  - What learning algorithm do we use to infer the most likely hypothesis?
  - How do we adjust our predictions based on the feedback?
- Evaluation
  - How well are we doing?
- ...


16 Evaluation of Learning Systems
Experimental
- Conduct controlled cross-validation experiments to compare various methods on a variety of benchmark datasets
- Gather data on their performance, e.g., test accuracy, training time, testing time
- Analyze differences for statistical significance
Theoretical
- Analyze algorithms mathematically and prove theorems about their:
  - Computational complexity
  - Ability to fit training data
  - Sample complexity (the number of training examples needed to learn an accurate function)
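The cross-validation protocol mentioned above can be sketched in a few lines. Everything here (the trivial majority-class learner, the toy dataset) is an assumption chosen only to exercise the protocol: split the data into k folds, train on k-1 of them, test on the held-out fold, and average.

```python
# k-fold cross-validation: average held-out accuracy over k train/test splits.
def k_fold_accuracy(data, k, train_fn, predict_fn):
    folds = [data[i::k] for i in range(k)]
    accs = []
    for i in range(k):
        test = folds[i]
        train = [ex for j, f in enumerate(folds) if j != i for ex in f]
        model = train_fn(train)
        correct = sum(predict_fn(model, x) == y for x, y in test)
        accs.append(correct / len(test))
    return sum(accs) / k

# Trivial learner: always predict the majority class of the training set.
def train_majority(train):
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)

data = [(x, "+") for x in range(8)] + [(x, "-") for x in range(2)]
print(k_fold_accuracy(data, 5, train_majority, lambda m, x: m))
```

Replacing `train_majority` with a real learner gives the controlled comparison the slide describes; running the same protocol for several methods and testing the differences for significance completes the experimental evaluation.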

17 Measuring Performance
The performance of the learner can be measured in whichever of the following ways suits the application:
- Accuracy performance
  - Number of mistakes
  - Mean squared error
- Solution quality (length, efficiency)
- Speed of performance
- ...
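The two accuracy measures listed above, written as plain functions (the sample predictions are made up): classification accuracy counts matches, and mean squared error averages squared deviations for numeric predictions.

```python
# Fraction of predictions that match the targets (classification).
def accuracy(preds, targets):
    correct = sum(p == t for p, t in zip(preds, targets))
    return correct / len(targets)

# Average squared deviation from the targets (regression).
def mean_squared_error(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

print(accuracy(["+", "-", "+"], ["+", "-", "-"]))   # 2 of 3 correct
print(mean_squared_error([2.0, 3.0], [1.0, 5.0]))   # (1 + 4) / 2 = 2.5
```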


19 Curse of Dimensionality

20 Curse of Dimensionality
- Imagine a learning task such as recognizing printed characters. Intuitively, adding more attributes should help the learner: more information never hurts, right?
- In fact, sometimes it does hurt, due to what is called the curse of dimensionality: the available data may not be sufficient to compensate for the increased number of parameters that comes with the increased dimensionality.
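One way to see the effect, sketched under arbitrary assumptions (random uniform data, a fixed sample size, hypothetical dimensions): as the dimension grows with the number of points held fixed, the nearest and farthest neighbors of a query become almost equally far away, so distance-based learners have less and less to work with.

```python
import random

# Ratio of nearest to farthest neighbor distance for a random query
# among n_points uniform random points in [0,1]^dim. A ratio near 1
# means "nearest" is barely more meaningful than "farthest".
def dist_ratio(dim, n_points=200, seed=0):
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    q = [rng.random() for _ in range(dim)]
    dists = [sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 for p in pts]
    return min(dists) / max(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(dist_ratio(dim), 3))
```

The ratio climbs toward 1 with dimension: with a fixed amount of data, the extra attributes dilute rather than sharpen the notion of similarity.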

21 Curse of Dimensionality

22 Polynomial Curve Fitting

23 Sum-of-Squares Error Function
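Slide 23 is figure-only in this transcript. For reference, the polynomial model and the sum-of-squares error function it plots (these slides follow Bishop, PRML Ch. 1) are:

```latex
y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j
\qquad
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2
```

Minimizing E(w) over the coefficients w for each polynomial order M produces the fits shown in the following slides.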

24 0th Order Polynomial

25 1st Order Polynomial

26 3rd Order Polynomial

27 9th Order Polynomial

28 Over-fitting. Root-Mean-Square (RMS) Error: E_RMS = sqrt(2 E(w*) / N)

29 Data Set Size: 9th Order Polynomial

30 Data Set Size: 9th Order Polynomial
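Slides 22-30 show figures only in this transcript. A pure-Python sketch of the experiment behind them: fit polynomials of increasing order M to noisy samples of sin(2*pi*x) and compare training versus test RMS error. The data, noise level, and orders below are illustrative assumptions, not values from the slides.

```python
import math
import random

def polyfit(xs, ts, order):
    # Least squares via the normal equations (A^T A) w = A^T t,
    # solved by Gaussian elimination -- fine for tiny systems like these.
    m = order + 1
    A = [[x ** j for j in range(m)] for x in xs]
    G = [[sum(A[k][i] * A[k][j] for k in range(len(xs))) for j in range(m)]
         for i in range(m)]
    b = [sum(A[k][i] * ts[k] for k in range(len(xs))) for i in range(m)]
    for col in range(m):                      # forward elimination with pivoting
        piv = max(range(col, m), key=lambda r: abs(G[r][col]))
        G[col], G[piv] = G[piv], G[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = G[r][col] / G[col][col]
            for c in range(col, m):
                G[r][c] -= f * G[col][c]
            b[r] -= f * b[col]
    w = [0.0] * m                             # back substitution
    for i in reversed(range(m)):
        w[i] = (b[i] - sum(G[i][j] * w[j] for j in range(i + 1, m))) / G[i][i]
    return w

def rms(w, xs, ts):
    def poly(x):
        return sum(c * x ** j for j, c in enumerate(w))
    return math.sqrt(sum((poly(x) - t) ** 2 for x, t in zip(xs, ts)) / len(xs))

rng = random.Random(0)
xs = [i / 5 for i in range(6)]                # 6 noisy training points
ts = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.2) for x in xs]
xtest = [i / 49 for i in range(50)]           # clean test points
ttest = [math.sin(2 * math.pi * x) for x in xtest]

for order in (0, 1, 3, 5):
    w = polyfit(xs, ts, order)
    print(order, round(rms(w, xs, ts), 4), round(rms(w, xtest, ttest), 4))
```

The training RMS error falls monotonically with the order (at order 5 the six points are interpolated almost exactly), while the error on held-out data tells the over-fitting story the slides plot.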

31 Lessons Learned about Learning
- Learning can be viewed as using direct or indirect experience to approximate a chosen target function.
- Function approximation can be viewed as a search through a space of hypotheses (representations of functions) for one that best fits a set of training data.
- Different learning methods assume different hypothesis spaces (representation languages) and/or employ different search techniques.

32 Issues in Machine Learning
- Training experience
  - What can the training experience be (labelled samples, self-play, ...)?
- Target function
  - What should we aim to learn?
  - What should the representation of the target function be (features, hypothesis class, ...)?
- Learning
  - What learning algorithms exist for learning general target functions from specific training examples?
  - Which algorithms can approximate functions well, and when?
  - How does noisy data influence accuracy?
- Training data
  - How much training data is sufficient?
  - How does the number of training examples influence accuracy?
  - What is the best strategy for choosing a useful next training experience, and how does it affect complexity?
- Prior knowledge / domain knowledge
  - When and how can prior knowledge guide the learning process?
- Evaluation
  - What specific target functions / performance measures should the system attempt to learn?
- ...