Kansas State University, Department of Computing and Information Sciences
CIS 732: Machine Learning and Pattern Recognition
Lecture 14: Midterm Review
Tuesday, 15 October 2002
William H. Hsu, Department of Computing and Information Sciences, KSU
Readings: Chapters 1-7, Mitchell; Chapters 14-15 and 18, Russell and Norvig

Lecture 0: A Brief Overview of Machine Learning
Overview: Topics, Applications, Motivation
Learning = Improving with Experience at Some Task
  – Improve over task T,
  – with respect to performance measure P,
  – based on experience E.
Brief Tour of Machine Learning
  – A case study
  – A taxonomy of learning
  – Intelligent systems engineering: specification of learning problems
Issues in Machine Learning
  – Design choices
  – The performance element: intelligent systems
Some Applications of Learning
  – Database mining, reasoning (inference/decision support), acting
  – Industrial usage of intelligent systems

Lecture 1: Concept Learning and Version Spaces
Concept Learning as Search through H
  – Hypothesis space H as a state space
  – Learning: finding the correct hypothesis
General-to-Specific Ordering over H
  – Partially-ordered set: Less-Specific-Than (More-General-Than) relation
  – Upper and lower bounds in H
Version Space Candidate Elimination Algorithm
  – S and G boundaries characterize the learner’s uncertainty
  – Version space can be used to make predictions over unseen cases
Learner Can Generate Useful Queries
Next Lecture: When and Why Are Inductive Leaps Possible?
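The candidate elimination algorithm summarized above maintains the S and G boundaries explicitly. Below is a minimal sketch for conjunctive hypotheses over nominal attributes ('?' = any value, None = the empty, maximally specific constraint); it omits consistency checks and duplicate pruning, and the attribute names and data are illustrative, not taken from the lecture.

```python
# Minimal candidate-elimination sketch in the style of Mitchell Ch. 2.
# Simplified: no failure handling, no pruning of redundant G members.

def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(g, s):
    """True if g covers every instance that s covers."""
    return all(gv == '?' or gv == sv for gv, sv in zip(g, s))

def generalize(s, x):
    """Minimal generalization of s that covers positive example x."""
    return tuple(xv if sv is None else (sv if sv == xv else '?')
                 for sv, xv in zip(s, x))

def specializations(g, x, domains):
    """Minimal specializations of g that exclude negative example x."""
    out = []
    for i, gv in enumerate(g):
        if gv == '?':
            for v in domains[i] - {x[i]}:
                out.append(g[:i] + (v,) + g[i + 1:])
    return out

def candidate_elimination(examples, domains):
    n = len(domains)
    S = (None,) * n          # single maximally specific hypothesis (conjunctions)
    G = [('?',) * n]         # maximally general boundary
    for x, label in examples:
        if label:            # positive example: shrink G, generalize S
            G = [g for g in G if covers(g, x)]
            S = generalize(S, x)
        else:                # negative example: specialize offending members of G
            G = [h for g in G
                 for h in ([g] if not covers(g, x) else specializations(g, x, domains))
                 if more_general_or_equal(h, S)]
    return S, G

# Illustrative run (attributes: Sky, AirTemp, Humidity).
domains = [{'Sunny', 'Rainy'}, {'Warm', 'Cold'}, {'Normal', 'High'}]
examples = [(('Sunny', 'Warm', 'Normal'), True),
            (('Rainy', 'Cold', 'High'), False),
            (('Sunny', 'Warm', 'High'), True)]
print(candidate_elimination(examples, domains))
```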

Lecture 2: Inductive Bias and PAC Learning
Inductive Leaps Possible Only if Learner Is Biased
  – Futility of learning without bias
  – Strength of inductive bias: proportional to restrictions on hypotheses
Modeling Inductive Learners with Equivalent Deductive Systems
  – Representing inductive learning as theorem proving
  – Equivalent learning and inference problems
Syntactic Restrictions
  – Example: m-of-n concept
Views of Learning and Strategies
  – Removing uncertainty (“data compression”)
  – Role of knowledge
Introduction to Computational Learning Theory (COLT)
  – Things COLT attempts to measure
  – Probably-Approximately-Correct (PAC) learning framework
Next: Occam’s Razor, VC Dimension, and Error Bounds
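As a rough illustration of the “futility of learning without bias” point above (a standard counting argument, not a result specific to this lecture): over n boolean attributes there are 2^n distinct instances and therefore 2^(2^n) distinct target concepts, so the unbiased hypothesis space is astronomically large even for small n.

```latex
% Counting argument: size of the unbiased hypothesis space over n boolean attributes
\[
|X| = 2^{n}, \qquad
|H_{\mathrm{unbiased}}| = 2^{|X|} = 2^{2^{n}}, \qquad
\text{e.g. } n = 6 \;\Rightarrow\; 2^{64} \approx 1.8 \times 10^{19}.
\]
```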

Lecture 3: PAC, VC-Dimension, and Mistake Bounds
COLT: Framework for Analyzing Learning Environments
  – Sample complexity of C (what is m?)
  – Computational complexity of L
  – Required expressive power of H
  – Error and confidence bounds (PAC: 0 < ε < 1/2, 0 < δ < 1/2)
What PAC Prescribes
  – Whether to try to learn C with a known H
  – Whether to try to reformulate H (apply change of representation)
Vapnik-Chervonenkis (VC) Dimension
  – A formal measure of the complexity of H (besides |H|)
  – Based on X and a worst-case labeling game
Mistake Bounds
  – How many mistakes could L incur?
  – Another way to measure the cost of learning
Next: Decision Trees
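For reference, the standard PAC sample-complexity bounds alluded to above (as given in Mitchell, Ch. 7) are, for a consistent learner over a finite H and in terms of VC dimension:

```latex
% PAC sample-complexity bounds: finite-H form and VC-dimension form
\[
m \;\ge\; \frac{1}{\epsilon}\!\left(\ln|H| + \ln\frac{1}{\delta}\right),
\qquad
m \;\ge\; \frac{1}{\epsilon}\!\left(4\log_{2}\frac{2}{\delta} + 8\,\mathrm{VC}(H)\log_{2}\frac{13}{\epsilon}\right).
\]
```

For instance, for conjunctions of up to n = 10 boolean literals (|H| = 3^10), taking ε = 0.1 and δ = 0.05 gives m ≥ 10(10 ln 3 + ln 20) ≈ 140 examples.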

Lecture 4: Decision Trees
Decision Trees (DTs)
  – Can be boolean (c(x) ∈ {+, -}) or range over multiple classes
  – When to use DT-based models
Generic Algorithm Build-DT: Top-Down Induction
  – Calculating the best attribute upon which to split
  – Recursive partitioning
Entropy and Information Gain
  – Goal: to measure uncertainty removed by splitting on a candidate attribute A
  – Calculating information gain (change in entropy)
  – Using information gain in construction of the tree
  – ID3 ≡ Build-DT using Gain()
ID3 as Hypothesis Space Search (in State Space of Decision Trees)
Heuristic Search and Inductive Bias
Data Mining using MLC++ (Machine Learning Library in C++)
Next: More Biases (Occam’s Razor); Managing DT Induction
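A compact sketch of the entropy and information-gain computations used by Build-DT/ID3; the attribute names and toy data are illustrative, and this is not the MLC++ implementation mentioned on the slide.

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_c p_c * log2(p_c) over the class proportions p_c."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Gain(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)."""
    n = len(labels)
    remainder = 0.0
    for v in {x[attribute] for x in examples}:
        subset = [y for x, y in zip(examples, labels) if x[attribute] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

# Illustrative: a perfectly discriminating attribute removes all uncertainty,
# so its gain equals the initial entropy (1.0 bit here).
examples = [{'wind': 'weak'}, {'wind': 'weak'}, {'wind': 'strong'}, {'wind': 'strong'}]
labels = ['+', '+', '-', '-']
print(information_gain(examples, labels, 'wind'))   # 1.0
```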

Lecture 5: DTs, Occam’s Razor, and Overfitting
Occam’s Razor and Decision Trees
  – Preference biases versus language biases
  – Two issues regarding Occam algorithms
      Why prefer smaller trees? (less chance of “coincidence”)
      Is Occam’s Razor well defined? (yes, under certain assumptions)
  – MDL principle and Occam’s Razor: more to come
Overfitting
  – Problem: fitting training data too closely
      General definition of overfitting
      Why it happens
  – Overfitting prevention, avoidance, and recovery techniques
Other Ways to Make Decision Tree Induction More Robust
Next: Perceptrons, Neural Nets (Multi-Layer Perceptrons), Winnow
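One of the recovery techniques grouped together above is reduced-error (post-)pruning. A minimal sketch follows; the Node structure and its field names are assumptions made for illustration, not the course’s code, and a built tree plus a held-out validation set are required to use it.

```python
# Minimal reduced-error pruning sketch: prune bottom-up against a validation
# set, keeping a prune only if validation accuracy does not drop.

class Node:
    def __init__(self, attribute=None, children=None, label=None, majority=None):
        self.attribute = attribute      # attribute tested here (None at a leaf)
        self.children = children or {}  # attribute value -> child Node
        self.label = label              # class label if this node is a leaf
        self.majority = majority        # majority training class at this node

def classify(node, x):
    if node.label is not None:
        return node.label
    child = node.children.get(x.get(node.attribute))
    return classify(child, x) if child is not None else node.majority

def accuracy(tree, data):
    return sum(classify(tree, x) == y for x, y in data) / len(data)

def reduced_error_prune(node, tree, validation):
    """Post-order traversal: try turning each internal node into a leaf."""
    if node.label is not None:
        return
    for child in node.children.values():
        reduced_error_prune(child, tree, validation)
    before = accuracy(tree, validation)
    saved = (node.attribute, node.children)
    node.attribute, node.children, node.label = None, {}, node.majority
    if accuracy(tree, validation) < before:        # the prune hurt: undo it
        node.attribute, node.children = saved
        node.label = None
```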

Lecture 6: Perceptrons and Winnow
Neural Networks: Parallel, Distributed Processing Systems
  – Biological and artificial (ANN) types
  – Perceptron (LTU, LTG): model neuron
Single-Layer Networks
  – Variety of update rules
      Multiplicative (Hebbian, Winnow), additive (gradient: Perceptron, Delta Rule)
      Batch versus incremental mode
  – Various convergence and efficiency conditions
  – Other ways to learn linear functions
      Linear programming (general-purpose)
      Probabilistic classifiers (some assumptions)
Advantages and Disadvantages
  – “Disadvantage” (tradeoff): simple and restrictive
  – “Advantage”: perform well on many realistic problems (e.g., some text learning)
Next: Multi-Layer Perceptrons, Backpropagation, ANN Applications
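The additive (Perceptron) and multiplicative (Winnow) update rules contrasted above, sketched in Python; the learning rate, promotion factor, and threshold values are illustrative assumptions.

```python
import numpy as np

def perceptron_update(w, x, t, o, eta=0.1):
    """Additive rule: w <- w + eta * (t - o) * x."""
    return w + eta * (t - o) * x

def winnow_update(w, x, t, o, alpha=2.0):
    """Multiplicative rule for boolean inputs x in {0, 1}:
    promote active weights on a false negative, demote them on a false positive."""
    if t == 1 and o == 0:
        return w * np.where(x == 1, alpha, 1.0)
    if t == 0 and o == 1:
        return w / np.where(x == 1, alpha, 1.0)
    return w

def predict(w, x, threshold):
    """Both learners predict with a linear threshold unit."""
    return int(np.dot(w, x) >= threshold)

# One illustrative mistake-driven step on a boolean instance.
w = np.ones(4)
x = np.array([1, 0, 1, 0])
o = predict(w, x, threshold=4.0)     # 0 here (dot product is 2)
w = winnow_update(w, x, t=1, o=o)    # promotes the weights of the active inputs
```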

Lecture 7: MLPs and Backpropagation
Multi-Layer ANNs
  – Focused on feedforward MLPs
  – Backpropagation of error: distributes the penalty (loss) function throughout the network
  – Gradient learning: takes the derivative of the error surface with respect to the weights
      Error is based on the difference between desired output (t) and actual output (o)
      Actual output (o) is based on the activation function σ
      Must take the partial derivative of σ, so choose one that is easy to differentiate
      Two σ definitions: sigmoid (aka logistic) and hyperbolic tangent (tanh)
Overfitting in ANNs
  – Prevention: attribute subset selection
  – Avoidance: cross-validation, weight decay
ANN Applications: Face Recognition, Text-to-Speech
Open Problems
Recurrent ANNs: Can Express Temporal Depth (Non-Markovity)
Next: Statistical Foundations and Evaluation, Bayesian Learning Intro
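A minimal gradient/backpropagation step for a one-hidden-layer sigmoid network, following the delta-rule form referenced above (t = target, o = output); the weight shapes, random initialization, and learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, W2, eta=0.05):
    """One stochastic backprop update: forward pass, then propagate the
    output 'blame' delta = (t - o) * o * (1 - o) back through the network."""
    h = sigmoid(W1 @ x)                          # hidden activations
    o = sigmoid(W2 @ h)                          # network outputs
    delta_o = (t - o) * o * (1 - o)              # output-layer deltas
    delta_h = h * (1 - h) * (W2.T @ delta_o)     # hidden-layer deltas
    W2 = W2 + eta * np.outer(delta_o, h)         # weight update: eta * delta * input
    W1 = W1 + eta * np.outer(delta_h, x)
    return W1, W2

# Illustrative shapes: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
W1, W2 = backprop_step(np.array([1.0, 0.0, 1.0]), np.array([1.0, 0.0]), W1, W2)
```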

Lecture 8: Statistical Evaluation of Hypotheses
Statistical Evaluation Methods for Learning: Three Questions
  – Generalization quality
      How well does observed accuracy estimate generalization accuracy?
      Estimation bias and variance
      Confidence intervals
  – Comparing generalization quality
      How certain are we that h1 is better than h2?
      Confidence intervals for paired tests
  – Learning and statistical evaluation
      What is the best way to make the most of limited data?
      k-fold CV
Tradeoffs: Bias versus Variance
Next: Sections , Mitchell (Bayes’s Theorem; ML; MAP)
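The confidence-interval machinery behind the first two questions is the usual normal approximation to the binomial (Mitchell, Ch. 5): for sample error measured on n independently drawn test examples,

```latex
% Two-sided N% confidence interval for true error, from sample error on n test examples
\[
\mathrm{error}_{\mathcal{D}}(h) \;\in\;
\mathrm{error}_{S}(h) \;\pm\; z_{N}\sqrt{\frac{\mathrm{error}_{S}(h)\,\bigl(1-\mathrm{error}_{S}(h)\bigr)}{n}}
\]
```

For example, an observed error of 0.30 on n = 40 test cases gives, at the 95% level (z = 1.96), an interval of roughly 0.30 ± 0.14.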

Lecture 9: Bayes’s Theorem, MAP, MLE
Introduction to Bayesian Learning
  – Framework: using probabilistic criteria to search H
  – Probability foundations
      Definitions: subjectivist, objectivist; Bayesian, frequentist, logicist
      Kolmogorov axioms
Bayes’s Theorem
  – Definition of conditional (posterior) probability
  – Product rule
Maximum A Posteriori (MAP) and Maximum Likelihood (ML) Hypotheses
  – Bayes’s Rule and MAP
  – Uniform priors: allow use of MLE to generate MAP hypotheses
  – Relation to version spaces, candidate elimination
Next: , Mitchell; Chapters 14-15, Russell and Norvig; Roth
  – More Bayesian learning: MDL, BOC, Gibbs, Simple (Naïve) Bayes
  – Learning over text
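In symbols, the MAP and ML hypotheses referenced above are as follows; the uniform-prior case mentioned in the slide is exactly the case where MAP reduces to ML.

```latex
% MAP and ML hypotheses; with uniform priors P(h), h_MAP coincides with h_ML
\[
h_{\mathrm{MAP}} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\,P(h),
\qquad
h_{\mathrm{ML}} = \arg\max_{h \in H} P(D \mid h).
\]
```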

Lecture 10: Bayesian Classifiers: MDL, BOC, and Gibbs
Minimum Description Length (MDL) Revisited
  – Bayesian Information Criterion (BIC): justification for Occam’s Razor
Bayes Optimal Classifier (BOC)
  – Using the BOC as a “gold standard”
Gibbs Classifier
  – Ratio bound
Simple (Naïve) Bayes
  – Rationale for assumption; pitfalls
Practical Inference using MDL, BOC, Gibbs, Naïve Bayes
  – MCMC methods (Gibbs sampling)
  – Glossary:
  – To learn more:
Next: Sections , Mitchell
  – More on simple (naïve) Bayes
  – Application to learning over text
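For quick reference, the two key formulas behind this slide in the forms given by Mitchell (Ch. 6): the Bayes optimal classification of a new instance, and the MDL hypothesis with description lengths under a code C1 for hypotheses and a code C2 for data given a hypothesis.

```latex
% Bayes optimal classification and the MDL hypothesis
\[
v_{\mathrm{BOC}} = \arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D),
\qquad
h_{\mathrm{MDL}} = \arg\min_{h \in H}\; L_{C_1}(h) + L_{C_2}(D \mid h).
\]
```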

Lecture 11: Simple (Naïve) Bayes and Learning over Text
More on Simple Bayes, aka Naïve Bayes
  – More examples
  – Classification: choosing between two classes; general case
  – Robust estimation of probabilities: SQ
Learning in Natural Language Processing (NLP)
  – Learning over text: problem definitions
  – Statistical Queries (SQ) / Linear Statistical Queries (LSQ) framework
      Oracle
      Algorithms: search for h using only (L)SQs
  – Bayesian approaches to NLP
      Issues: word sense disambiguation, part-of-speech tagging
      Applications: spelling; reading/posting news; web search, IR, digital libraries
Next: Section 6.11, Mitchell; Pearl and Verma
  – Read: Charniak tutorial, “Bayesian Networks without Tears”
  – Skim: Chapter 15, Russell and Norvig; Heckerman slides
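A compact sketch of the simple (naïve) Bayes text classifier with add-one smoothing, in the spirit of Mitchell’s Learn_Naive_Bayes_Text; the tokenization, data handling, and toy documents are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes_text(docs, labels):
    """docs: list of token lists; labels: one class per doc.
    Estimates P(c) and P(w | c) with add-one (Laplace) smoothing."""
    vocab = {w for d in docs for w in d}
    priors, cond = {}, defaultdict(dict)
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = len(class_docs) / len(docs)
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values())
        for w in vocab:
            cond[c][w] = (counts[w] + 1) / (total + len(vocab))
    return priors, cond, vocab

def classify_text(doc, priors, cond, vocab):
    """Pick argmax_c log P(c) + sum_w log P(w | c), skipping unseen words."""
    scores = {c: math.log(priors[c]) +
                 sum(math.log(cond[c][w]) for w in doc if w in vocab)
              for c in priors}
    return max(scores, key=scores.get)

# Illustrative toy corpus.
docs = [['spam', 'offer', 'win'], ['meeting', 'notes', 'agenda'], ['win', 'prize', 'offer']]
labels = ['junk', 'work', 'junk']
model = train_naive_bayes_text(docs, labels)
print(classify_text(['offer', 'win', 'prize'], *model))   # 'junk' on this toy data
```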

Lecture 12: Introduction to Bayesian Networks
Graphical Models of Probability
  – Bayesian networks: introduction
      Definition and basic principles
      Conditional independence (causal Markovity) assumptions, tradeoffs
  – Inference and learning using Bayesian networks
      Acquiring and applying CPTs
      Searching the space of trees: max likelihood
      Examples: Sprinkler, Cancer, Forest-Fire, generic tree learning
CPT Learning: Gradient Algorithm Train-BN
Structure Learning in Trees: MWST Algorithm Learn-Tree-Structure
Reasoning under Uncertainty: Applications and Augmented Models
Some Material From:
Next: Read Heckerman Tutorial
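The conditional-independence assumption mentioned above is what makes examples like Sprinkler tractable: the joint distribution factors into one CPT per node. Written out for the usual form of the Sprinkler network (Cloudy as parent of Sprinkler and Rain, both of which are parents of WetGrass):

```latex
% Joint-distribution factorization of a Bayesian network, and its Sprinkler-network form
\[
P(x_1,\ldots,x_n) = \prod_{i=1}^{n} P\bigl(x_i \mid \mathrm{Parents}(X_i)\bigr),
\qquad
P(C,S,R,W) = P(C)\,P(S \mid C)\,P(R \mid C)\,P(W \mid S,R).
\]
```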

Lecture 13: Learning Bayesian Networks from Data
Bayesian Networks: Quick Review on Learning, Inference
  – Learning, eliciting, applying CPTs
  – In-class exercise: Hugin demo; CPT elicitation, application
  – Learning BBN structure: constraint-based versus score-based approaches
  – K2, other scores and search algorithms
Causal Modeling and Discovery: Learning Cause from Observations
Incomplete Data: Learning and Inference (Expectation-Maximization)
Tutorials on Bayesian Networks
  – Breese and Koller (AAAI ‘97, BBN intro):
  – Friedman and Goldszmidt (AAAI ‘98, Learning BBNs from Data):
  – Heckerman (various UAI/IJCAI/ICML, Learning BBNs from Data):
Next Week: BBNs Concluded; Post-Midterm (Thu 11 Oct 2001) Review
After Midterm: More EM, Clustering, Exploratory Data Analysis
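For complete data, the CPT-learning step above reduces to counting; with Dirichlet pseudo-counts it becomes the smoothed estimate below (a standard form stated here for reference, not quoted from the tutorials listed above). Here N(·) are counts in the training data and the α are pseudo-counts; α = 0 recovers the maximum-likelihood estimate.

```latex
% Counting estimate for a CPT entry from complete data, with Dirichlet pseudo-counts
\[
\hat{P}(X_i = x \mid \mathrm{Pa}_i = u) \;=\;
\frac{N(x, u) + \alpha_{x}}{N(u) + \sum_{x'} \alpha_{x'}}
\]
```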

Meta-Summary
Machine Learning Formalisms
  – Theory of computation: PAC, mistake bounds
  – Statistical, probabilistic: PAC, confidence intervals
Machine Learning Techniques
  – Models: version space, decision tree, perceptron, winnow, ANN, BBN
  – Algorithms: candidate elimination, ID3, backprop, MLE, Naïve Bayes, K2, EM
Midterm Study Guide
  – Know
      Definitions (terminology)
      How to solve problems from Homework 1 (problem set)
      How algorithms in Homework 2 (machine problem) work
  – Practice
      Sample exam problems (handout)
      Example runs of algorithms in Mitchell, lecture notes
  – Don’t panic!