INTRODUCTION TO ARTIFICIAL INTELLIGENCE Massimo Poesio LECTURE: Intro to Machine Learning.

Slides:



Advertisements
Similar presentations
1 Machine Learning: Lecture 1 Overview of Machine Learning (Based on Chapter 1 of Mitchell T.., Machine Learning, 1997)
Advertisements

8. Machine Learning, Support Vector Machines
Computational Methods for Data Analysis
Class Project Due at end of finals week Essentially anything you want, so long as it’s AI related and I approve Any programming language you want In pairs.
INTRODUCTION TO ARTIFICIAL INTELLIGENCE Massimo Poesio Machine Learning: Decision Trees.
CS 484 – Artificial Intelligence1 Announcements Project 1 is due Tuesday, October 16 Send me the name of your konane bot Midterm is Thursday, October 18.
CS Machine Learning.
1 Machine Learning Introduction Paola Velardi. 2 Course material Slides (partly) from: 91L/
1 Lecture 35 Brief Introduction to Main AI Areas (cont’d) Overview  Lecture Objective: Present the General Ideas on the AI Branches Below  Introduction.
Amit Sethi, EEE, IIT Cepstrum, Oct 16,
TEMPLATE DESIGN © Genetic Algorithm and Poker Rule Induction Wendy Wenjie Xu Supervised by Professor David Aldous, UC.
CII504 Intelligent Engine © 2005 Irfan Subakti Department of Informatics Institute Technology of Sepuluh Nopember Surabaya - Indonesia.
Introduction to Introduction to Artificial Intelligence Henry Kautz.
LEARNING FROM OBSERVATIONS Yılmaz KILIÇASLAN. Definition Learning takes place as the agent observes its interactions with the world and its own decision-making.
An Introduction to Machine Learning In the area of AI (earlier) machine learning took a back seat to Expert Systems Expert system development usually consists.
EECS 349 Machine Learning Instructor: Doug Downey Note: slides adapted from Pedro Domingos, University of Washington, CSE
Data Mining with Decision Trees Lutz Hamel Dept. of Computer Science and Statistics University of Rhode Island.
Machine Learning CSE 473. © Daniel S. Weld Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning.
LEARNING FROM OBSERVATIONS Yılmaz KILIÇASLAN. Definition Learning takes place as the agent observes its interactions with the world and its own decision-making.
1 MACHINE LEARNING TECHNIQUES IN IMAGE PROCESSING By Kaan Tariman M.S. in Computer Science CSCI 8810 Course Project.
A Brief Survey of Machine Learning
Learning Programs Danielle and Joseph Bennett (and Lorelei) 4 December 2007.
Introduction to machine learning
CS 391L: Machine Learning Introduction
CS Machine Learning. What is Machine Learning? Adapt to / learn from data  To optimize a performance function Can be used to:  Extract knowledge.
Artificial Intelligence (AI) Addition to the lecture 11.
More Machine Learning Linear Regression Squared Error L1 and L2 Regularization Gradient Descent.
IST 511 Information Management: Information and Technology
1 What is learning? “Learning denotes changes in a system that... enable a system to do the same task more efficiently the next time.” –Herbert Simon “Learning.
Copyright R. Weber Machine Learning, Data Mining ISYS370 Dr. R. Weber.
Data Mining Joyeeta Dutta-Moscato July 10, Wherever we have large amounts of data, we have the need for building systems capable of learning information.
Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.
For Friday Read chapter 18, sections 3-4 Homework: –Chapter 14, exercise 12 a, b, d.
1 Learning Agents Laboratory Computer Science Department George Mason University Prof. Gheorghe Tecuci 1. Introduction.
CpSc 810: Machine Learning Design a learning system.
1 Artificial Neural Networks Sanun Srisuk EECP0720 Expert Systems – Artificial Neural Networks.
1 CS 512 Machine Learning Berrin Yanikoglu Slides are expanded from the Machine Learning-Mitchell book slides Some of the extra slides thanks to T. Jaakkola,
1 Machine Learning Introduction Paola Velardi. 2 Course material Slides (partly) from: 91L/
Introduction to machine learning and data mining 1 iCSC2014, Juan López González, University of Oviedo Introduction to machine learning Juan López González.
1 Mining in geographic data Original slides:Raymond J. Mooney University of Texas at Austin.
Lecture 10: 8/6/1435 Machine Learning Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
Machine Learning.
Introduction to Artificial Intelligence and Soft Computing
Introduction Introduction Dr. Khaled Wassif Spring Machine Learning.
Well Posed Learning Problems Must identify the following 3 features –Learning Task: the thing you want to learn. –Performance measure: must know when you.
1 Machine Learning 1.Where does machine learning fit in computer science? 2.What is machine learning? 3.Where can machine learning be applied? 4.Should.
Learning from observations
Week 1 - An Introduction to Machine Learning & Soft Computing
Kansas State University Department of Computing and Information Sciences CIS 730: Introduction to Artificial Intelligence Lecture 9 of 42 Wednesday, 14.
Machine Learning Introduction. Class Info Office Hours –Monday:11:30 – 1:00 –Wednesday:10:00 – 1:00 –Thursday:11:30 – 1:00 Course Text –Tom Mitchell:
يادگيري ماشين Machine Learning Lecturer: A. Rabiee
Data Mining and Decision Support
1 Introduction to Machine Learning Chapter 1. cont.
Introduction Machine Learning: Chapter 1. Contents Types of learning Applications of machine learning Disciplines related with machine learning Well-posed.
Well Posed Learning Problems Must identify the following 3 features –Learning Task: the thing you want to learn. –Performance measure: must know when you.
Learning Kernel Classifiers 1. Introduction Summarized by In-Hee Lee.
Machine Learning Lecture 1: Intro + Decision Trees Moshe Koppel Slides adapted from Tom Mitchell and from Dan Roth.
SUPERVISED AND UNSUPERVISED LEARNING Presentation by Ege Saygıner CENG 784.
Network Management Lecture 13. MACHINE LEARNING TECHNIQUES 2 Dr. Atiq Ahmed Université de Balouchistan.
Supervise Learning Introduction. What is Learning Problem Learning = Improving with experience at some task – Improve over task T, – With respect to performance.
Brief Intro to Machine Learning CS539
Spring 2003 Dr. Susan Bridges
Data Mining Lecture 11.
Basic Intro Tutorial on Machine Learning and Data Mining
The Pennsylvania State University
Overview of Machine Learning
3.1.1 Introduction to Machine Learning
Why Machine Learning Flood of data
MACHINE LEARNING TECHNIQUES IN IMAGE PROCESSING
MACHINE LEARNING TECHNIQUES IN IMAGE PROCESSING
Presentation transcript:

INTRODUCTION TO ARTIFICIAL INTELLIGENCE Massimo Poesio LECTURE: Intro to Machine Learning

MOTIVATION Cognition involves making decisions on the basis of knowledge Example: wordsense disambiguation

3 SENSES OF “line” Product: “While he wouldn’t estimate the sale price, analysts have estimated that it would exceed $1 billion. Kraft also told analysts it plans to develop and test a line of refrigerated entrees and desserts, under the Chillery brand name.” Formation: “C-LD-R L-V-S V-NNA reads a sign in Caldor’s book department. The 1,000 or so people fighting for a place in line have no trouble filling in the blanks.” Text: “Newspaper editor Francis P. Church became famous for a 1897 editorial, addressed to a child, that included the line “Yes, Virginia, there is a Santa Clause.” Cord: “It is known as an aggressive, tenacious litigator. Richard D. Parsons, a partner at Patterson, Belknap, Webb and Tyler, likes the experience of opposing Sullivan & Cromwell to “having a thousand-pound tuna on the line.” Division: “Today, it is more vital than ever. In 1983, the act was entrenched in a new constitution, which established a tricameral parliament along racial lines, with separate chambers for whites, coloreds and Asians but none for blacks.” Phone: “On the tape recording of Mrs. Guba's call to the 911 emergency line, played at the trial, the baby sitter is heard begging for an ambulance.”

WHY LEARNING It is often hard to articulate the knowledge required for such cognitive tasks – The senses of a word – How to decide We humans are not aware of it Better from a cognitive and a system building perspective): develop theories of how this information is acquired

HUME ON INDUCTION Piece of bread number 1 was nourishing when I ate it. Piece of bread number 2 was nourishing when I ate it. Piece of bread number 3 was nourishing when I ate it. ….. Piece of bread number 100 was nourishing when I ate it. Therefore, all pieces of bread will be nourishing if I eat them. -- David Hume

WHEN IS INDUCTION SAFE? If asked why we believe the sun will rise tomorrow, we shall naturally answer, 'Because it has always risen every day.’ We have a firm belief that it will rise in the future, because it has risen in the past. The real question is: Do any number of cases of a law being fulfilled in the past afford evidence that it will be fulfilled in the future? It has been argued that we have reason to know the future will resemble the past, because what was the future has constantly become the past, and has always been found to resemble the past, so that we really have experience of the future, namely of times which were formerly future, which we may call past futures. But such an argument really begs the very question at issue. We have experience of past futures, but not of future futures, and the question is: Will future futures resemble past futures? -- Bertrand Russell, On Induction

WHAT IS LEARNING Memorizing something Learning facts through observation and exploration Developing motor and/or cognitive skills through practice Organizing new knowledge into general, effective representations

A DEFINITION OF LEARNING: LEARNING AS IMPROVEMENT Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the task or tasks drawn from the same population more efficiently and more effectively the next time. -- Herb Simon

9 LEARNING AS IMPROVEMENT, 2 Improve on task, T, with respect to performance metric, P, based on experience, E. T: Assign to words their senses. P: Percentage of words correctly classified. E: Corpus of words, some with human-given labels T: Categorize messages as spam or legitimate. P: Percentage of messages correctly classified. E: Database of s, some with human-given labels T: Playing checkers P: Percentage of games won against an arbitrary opponent E: Playing practice games against itself T: Recognizing hand-written words P: Percentage of words correctly classified E: Database of human-labeled images of handwritten words

10 SPECIFYING A LEARNING SYSTEM Choose the training experience Choose exactly what is too be learned, i.e. the target function. Choose how to represent the target function. Choose a learning algorithm to infer the target function from the experience. Environment/ Experience Learner Knowledge Performance Element

LEARNING A FUNCTION Given a set of input / output pairs, find a function that does a good job of expressing the relationship: – Wordsense disambiguation as a function from words (the input) to their senses (the outputs) – Categorizing messages as a function from s to their category (spam, useful) – A checker playing strategy a function from moves to their values (winning, losing)

WAYS OF LEARNING A FUNCTION SUPERVISED: given a set of example input / output pairs, find a rule that does a good job of predicting the output associated with an input UNSUPERVISED learning or CLUSTERING: given a set of examples, but no labelling, group the examples into “natural” clusters REINFORCEMENT LEARNING: an agent interacting with the world makes observations, takes action, and is rewarded or punished; the agent learns to take action in order to maximize reward

SUPERVISED LEARNING OF WORD SENSE DISAMBIGUATION Training experience: a CORPUS of examples annotated with their correct wordsense (according, e.g., to WordNet)

UNSUPERVISED LEARNING OF WORDSENSE DISAMBIGUATION Identify distinctions between contexts in which a word like LINE is encountered and hypothesize sense distinctions on their basis – Cord: fish, fishermen, tuna – Product: companies, products, finance (The way children learn a lot of what they know)

PROBLEMS IN LEARNING A FUNCTION Memory Averaging Generalization

EXAMPLE OF LEARNING PROBLEM Playing checkers

17 TRAINING EXPERIENCE Supervised learning: Given sample input and output pairs for a useful target function. – Checker boards labeled with the correct move, e.g. extracted from record of expert play

18 CHOOSING A TARGET FUNCTION What function is to be learned and how will it be used by the performance system? For checkers, assume we are given a function for generating the legal moves for a given board position and want to decide the best move. – Could learn a function: ChooseMove(board, legal-moves) → best-move – Or could learn an evaluation function, V(board) → R, that gives each board position a score for how favorable it is. V can be used to pick a move by applying each legal move, scoring the resulting board position, and choosing the move that results in the highest scoring board position.

19 Ideal Definition of V(b) If b is a final winning board, then V(b) = 100 If b is a final losing board, then V(b) = –100 If b is a final draw board, then V(b) = 0 Otherwise, then V(b) = V(b´), where b´ is the highest scoring final board position that is achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally as well). – Can be computed using complete mini-max search of the finite game tree.

20 Linear Function for Representing V(b) In checkers, use a linear approximation of the evaluation function. – bp(b): number of black pieces on board b – rp(b): number of red pieces on board b – bk(b): number of black kings on board b – rk(b): number of red kings on board b – bt(b): number of black pieces threatened (i.e. which can be immediately taken by red on its next turn) – rt(b): number of red pieces threatened

21 Obtaining Training Values Direct supervision may be available for the target function. –, 100> (win for black)

22 Learning Algorithm Uses training values for the target function to induce a hypothesized definition that fits these examples and hopefully generalizes to unseen examples. In statistics, learning to approximate a continuous function is called regression. Attempts to minimize some measure of error (loss function) such as mean squared error:

A SPATIAL WAY OF LOOKING AT LEARNING Learning a function can also be viewed as learning how to discriminate between different types of objects in a space

A SPATIAL VIEW OF LEARNING SPAM NON-SPAM

A SPATIAL VIEW OF LEARNING The task of the learner is to learn a function that divides the space of examples into black and red

A SPATIAL VIEW OF LEARNING

A MORE DIFFICULT EXAMPLE

ONE SOLUTION

ANOTHER SOLUTION

TRAINING AND TESTING Induction gives us the ability to classify unseen experiences In order to assess the ability of a learning algorithm to give us this, we need to evaluate its performance on different data from those used for training it – TEST DATA

CLASSIFICATION Mooney

VARIETIES OF LEARNING Different functions Different ways of searching the function

33 Various Function Representations Numerical functions – Linear regression – Neural networks – Support vector machines Symbolic functions – Decision trees – Rules in propositional logic – Rules in first-order predicate logic Instance-based functions – Nearest-neighbor – Case-based Probabilistic Graphical Models – Naïve Bayes – Bayesian networks – Hidden-Markov Models (HMMs) – Probabilistic Context Free Grammars (PCFGs) – Markov networks

34 Various Search Algorithms Gradient descent – Perceptron – Backpropagation Dynamic Programming – HMM Learning – PCFG Learning Divide and Conquer – Decision tree induction – Rule learning Evolutionary Computation – Genetic Algorithms (GAs) – Genetic Programming (GP) – Neuro-evolution

35 History of Machine Learning 1950s – Samuel’s checker player – Selfridge’s Pandemonium 1960s: – Neural networks: Perceptron – Pattern recognition – Learning in the limit theory – Minsky and Papert prove limitations of Perceptron 1970s: – Symbolic concept induction – Winston’s arch learner – Expert systems and the knowledge acquisition bottleneck – Quinlan’s ID3 – Michalski’s AQ and soybean diagnosis – Scientific discovery with BACON – Mathematical discovery with AM

36 History of Machine Learning (cont.) 1980s: – Advanced decision tree and rule learning – Explanation-based Learning (EBL) – Learning and planning and problem solving – Utility problem – Analogy – Cognitive architectures – Resurgence of neural networks (connectionism, backpropagation) – Valiant’s PAC Learning Theory – Focus on experimental methodology 1990s – Data mining – Adaptive software agents and web applications – Text learning – Reinforcement learning (RL) – Inductive Logic Programming (ILP) – Ensembles: Bagging, Boosting, and Stacking – Bayes Net learning

37 History of Machine Learning (cont.) 2000s – Support vector machines – Kernel methods – Graphical models – Statistical relational learning – Transfer learning – Sequence labeling – Collective classification and structured outputs – Computer Systems Applications Compilers Debugging Graphics Security (intrusion, virus, and worm detection) – management – Personalized assistants that learn – Learning in robotics and vision

READINGS English: – T. Mitchell, Machine Learning, Mc-Graw Hill, ch.1 Italian: – R. Basili & A. Moschitti, Apprendimento automatico, in F. Bianchini et al, Instrumentum vocale

THANKS I used material from – MIT Open Course ware – Ray Mooney’s ML course at UTexas