Machine Learning Overview
Adapted from Sargur N. Srihari, University at Buffalo, State University of New York, USA

Outline
1. What is Machine Learning (ML)?
2. Types of Information Processing Problems Solved: regression, classification, clustering, modeling uncertainty/inference
3. New Developments: the fully Bayesian approach
4. Summary

What is Machine Learning?
Programming computers to perform tasks that humans perform well but that are difficult to specify algorithmically. It is a principled way of building high-performance information processing systems:
- search engines, information retrieval
- adaptive user interfaces, personalized assistants (information systems)
- scientific applications (computational science)
- engineering

Example Problem: Handwritten Digit Recognition
- Handcrafted rules result in a large number of rules and exceptions
- Better to have a machine that learns from a large training set
- There is wide variability among instances of the same numeral

ML History
- ML has its origins in computer science; pattern recognition (PR) has its origins in engineering
- They are different facets of the same field
- The methods have been around for over 50 years
- Bayesian methods have seen a revival due to the availability of computational methods

Some Successful Applications of Machine Learning
Learning to recognize spoken words:
- Speaker-specific strategies for recognizing primitive sounds (phonemes) and words from the speech signal
- Neural networks and methods for learning HMMs, to customize to individual speakers, vocabularies and microphone characteristics (Table 1.1)

Some Successful Applications of Machine Learning
Learning to drive an autonomous vehicle:
- Train computer-controlled vehicles to steer correctly
- Drive at 70 mph for 90 miles on public highways
- Associate steering commands with image sequences

Handwriting Recognition
- Task T: recognizing and classifying handwritten words within images
- Performance measure P: percent of words correctly classified
- Training experience E: a database of handwritten words with given classifications

The ML Approach (pipeline, flattened from a slide diagram)
- Data collection: samples
- Model selection: a probability distribution to model the process
- Parameter estimation: search for the optimal solution; produces values/distributions
- Decision: inference or testing
The first three stages constitute generalization (training).
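This pipeline is easy to see end to end in code. Below is a minimal sketch in Python (using scikit-learn, with a synthetic data set standing in for real data collection; the model and its settings are illustrative choices, not prescribed by the slides):

```python
# Minimal sketch of the ML pipeline: collect data, select a model,
# estimate parameters, then decide (infer) on new inputs.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1. Data collection: samples (synthetic stand-in for a real data set)
X, y = make_classification(n_samples=200, n_features=2,
                           n_informative=2, n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Model selection: a probability distribution over classes
model = LogisticRegression()

# 3. Parameter estimation: search for optimal parameter values
model.fit(X_train, y_train)

# 4. Decision (inference / testing) on held-out data
print("test accuracy:", model.score(X_test, y_test))
```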

Types of Problems Machine Learning Is Used For
1. Classification
2. Regression
3. Clustering (data mining)
4. Modeling/inference

Classification
- Three classes (Stratified, Annular, Homogeneous), with two of the variables shown for 100 points
- Which class should a new point x belong to?

Cell-based Classification
- The naïve approach of cell-based voting fails because the number of cells grows exponentially with dimensionality
- For example, 12 dimensions each discretized into 6 bins gives 6^12 ≈ 2.2 billion cells
- With any realistic training set there are hardly any points in each cell
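The exponential blow-up is easy to verify directly; a quick check of the slide's bin and dimension counts:

```python
# Number of cells explodes exponentially with dimensionality:
# with b bins per dimension and d dimensions there are b**d cells.
bins = 6
for d in (1, 2, 3, 6, 12):
    print(f"{d:2d} dimensions -> {bins**d:,} cells")
# At 12 dimensions with 6 bins each: 6**12 = 2,176,782,336 cells,
# so with only ~100 training points almost every cell is empty.
```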

Popular Statistical Models
Generative:
- Naïve Bayes
- Mixtures of multinomials
- Mixtures of Gaussians
- Hidden Markov Models (HMM)
- Bayesian networks
- Markov random fields
Discriminative:
- Logistic regression
- SVMs
- Traditional neural networks
- Nearest neighbor
- Conditional Random Fields (CRF)
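To make the generative/discriminative distinction concrete, here is a minimal sketch contrasting one model from each column (Gaussian naïve Bayes vs. logistic regression, via scikit-learn; the data is synthetic and purely illustrative):

```python
# Generative vs. discriminative on the same data:
# naive Bayes models p(x | y) p(y); logistic regression models p(y | x) directly.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=1)

generative = GaussianNB().fit(X, y)               # estimates class-conditional densities
discriminative = LogisticRegression().fit(X, y)   # estimates the decision boundary

# Both expose p(y | x); the generative model gets there via Bayes rule.
print(generative.predict_proba(X[:1]))
print(discriminative.predict_proba(X[:1]))
```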

Regression Problems
- Forward problem data set: the red curve is the result of fitting a two-layer neural network by minimizing sum-of-squared error
- The corresponding inverse problem is obtained by exchanging the roles of x and t
- The same network gives a very poor fit to the inverse data; Gaussian mixture models are used there instead
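As a hedged illustration of the forward fit, the sketch below trains a small two-layer network by minimizing sum-of-squared error on a synthetic sinusoidal data set (plain NumPy; the layer size, learning rate and iteration count are arbitrary choices, not from the slides):

```python
# Fit a two-layer neural network by minimizing sum-of-squared error.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, (100, 1))                                          # inputs
t = x + 0.3 * np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=(100, 1))  # targets

H = 8                                          # hidden units (arbitrary)
W1 = rng.normal(0, 1, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 1, (H, 1)); b2 = np.zeros(1)
lr = 0.1

for step in range(5000):
    h = np.tanh(x @ W1 + b1)                   # hidden layer
    y = h @ W2 + b2                            # network output
    err = y - t                                # residuals
    # Backpropagate gradients of the sum-of-squared error
    gW2 = h.T @ err; gb2 = err.sum(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = x.T @ dh; gb1 = dh.sum(0)
    for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        p -= lr * g / len(x)                   # gradient-descent update

print("final SSE:", float(((np.tanh(x @ W1 + b1) @ W2 + b2 - t) ** 2).sum()))
```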

Clustering
- Old Faithful (hydrothermal geyser in Yellowstone): 272 observations of eruption duration (minutes, horizontal axis) vs. time to the next eruption (vertical axis)
- A single Gaussian is unable to capture the structure; a linear superposition of two Gaussians is better
- The Gaussian has limitations in modeling real data sets; Gaussian mixture models can represent very complex densities: p(x) = Σ_k π_k N(x | μ_k, Σ_k), where the mixing coefficients π_k sum to one
- One-dimensional illustration: three Gaussians in blue, their sum in red
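A minimal sketch of fitting such a two-component mixture (scikit-learn's GaussianMixture; synthetic two-cluster data standing in for the Old Faithful measurements):

```python
# Fit a linear superposition of two Gaussians to two-cluster data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for Old Faithful: two clusters, 272 points in total
short = rng.normal([2.0, 55.0], [0.3, 6.0], size=(120, 2))
long_ = rng.normal([4.3, 80.0], [0.4, 6.0], size=(152, 2))
X = np.vstack([short, long_])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print("mixing coefficients (sum to one):", gmm.weights_)
print("component means:", gmm.means_)
```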

Bayesian Representation of Uncertainty
- Probability is not just the frequency of a random, repeatable event; it is a quantification of uncertainty
- Using probability to represent uncertainty is not an ad hoc choice: if numerical values represent degrees of belief, then simple axioms for manipulating degrees of belief lead to the sum and product rules of probability
- Example: will the Arctic ice cap disappear by the end of the century?
  - We have some idea of how quickly polar ice is melting
  - We revise that belief on the basis of fresh evidence (satellite observations)
  - The assessment affects the actions we take (to reduce greenhouse gases)
- All of this is handled by the general Bayesian interpretation

Modeling Uncertainty
A causal Bayesian network (shown as a slide figure). Example of inference: Cancer is independent of Age and Gender given Exposure to Toxics and Smoking.
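The stated independence can be checked mechanically. The sketch below assumes the network structure implied by the slide (Age and Gender influence Exposure to Toxics and Smoking, which alone influence Cancer); all probability-table numbers are hypothetical, chosen only for illustration:

```python
# Verify: Cancer is independent of {Age, Gender} given {Toxics, Smoking},
# by brute-force enumeration of the joint distribution.
from itertools import product

p_age = {1: 0.3, 0: 0.7}                      # hypothetical priors
p_gender = {1: 0.5, 0: 0.5}
p_toxics = {a: 0.2 + 0.3 * a for a in (0, 1)}                       # P(t=1 | age)
p_smoking = {(a, g): 0.1 + 0.2 * a + 0.1 * g
             for a in (0, 1) for g in (0, 1)}                       # P(s=1 | age, gender)
p_cancer = {(t, s): 0.01 + 0.3 * t + 0.4 * s
            for t in (0, 1) for s in (0, 1)}                        # P(c=1 | toxics, smoking)

def joint(a, g, t, s, c):
    pt = p_toxics[a] if t else 1 - p_toxics[a]
    ps = p_smoking[(a, g)] if s else 1 - p_smoking[(a, g)]
    pc = p_cancer[(t, s)] if c else 1 - p_cancer[(t, s)]
    return p_age[a] * p_gender[g] * pt * ps * pc

# P(cancer=1 | age, gender, toxics=1, smoking=1) must not depend on age/gender.
for a, g in product((0, 1), repeat=2):
    num = joint(a, g, 1, 1, 1)
    den = sum(joint(a, g, 1, 1, c) for c in (0, 1))
    print(f"age={a} gender={g}: P(cancer | t=1, s=1) = {num / den:.3f}")
```

Running this prints the same conditional probability for all four (age, gender) settings, which is exactly what the independence claim asserts.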

The Fully Bayesian Approach
- Bayes rule
- Bayesian probabilities
- The concept of conjugacy
- Monte Carlo sampling

Rules of Probability
Given random variables X and Y:
- Sum rule (marginal probability): p(X) = Σ_Y p(X, Y)
- Product rule (joint probability in terms of conditional and marginal): p(X, Y) = p(Y | X) p(X)
- Combining the two gives Bayes rule: p(Y | X) = p(X | Y) p(Y) / p(X), where p(X) = Σ_Y p(X | Y) p(Y)
- Viewed as: posterior ∝ likelihood × prior
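A quick numeric check of Bayes rule (the prevalence and test accuracies here are invented for the example):

```python
# Bayes rule: posterior = likelihood * prior / evidence.
# Hypothetical numbers: 1% disease prevalence, 95% sensitivity, 90% specificity.
prior = 0.01                       # P(disease)
likelihood = 0.95                  # P(positive | disease)
false_pos = 0.10                   # P(positive | no disease)

evidence = likelihood * prior + false_pos * (1 - prior)   # sum rule: P(positive)
posterior = likelihood * prior / evidence                 # product rule + Bayes rule
print(f"P(disease | positive) = {posterior:.3f}")         # ~0.088
```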

Probability Distributions: Relationships
Discrete, binary:
- Bernoulli: single binary variable
- Binomial: N samples of a Bernoulli (the Bernoulli is the N = 1 special case)
- Beta: continuous variable on [0, 1]; conjugate prior of the Bernoulli/Binomial parameter
Discrete, multi-valued:
- Multinomial: one of K values, represented as a K-dimensional binary vector (the Binomial is the K = 2 special case)
- Dirichlet: K random variables on [0, 1]; conjugate prior of the Multinomial parameters
Continuous:
- Gaussian: limit of the Binomial for large N
- Student's-t: generalization of the Gaussian, robust to outliers; an infinite mixture of Gaussians
- Gamma: conjugate prior of the univariate Gaussian precision; the Exponential is a special case
- Wishart: conjugate prior of the multivariate Gaussian precision matrix
- Gaussian-Gamma: conjugate prior of a univariate Gaussian with unknown mean and precision
- Gaussian-Wishart: conjugate prior of a multivariate Gaussian with unknown mean and precision matrix
- Uniform
Angular:
- Von Mises
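Conjugacy in action: a minimal Beta-Bernoulli sketch, where the Beta prior's functional form is preserved by the Bernoulli likelihood, so the posterior update is just two additions (the prior hyperparameters and the data are arbitrary):

```python
# Beta prior is conjugate to the Bernoulli likelihood:
# Beta(a, b) prior + (h heads, t tails) -> Beta(a + h, b + t) posterior.
a, b = 2.0, 2.0                    # arbitrary prior hyperparameters
flips = [1, 0, 1, 1, 1, 0, 1]      # observed Bernoulli outcomes

heads = sum(flips)
tails = len(flips) - heads
a_post, b_post = a + heads, b + tails

print(f"posterior: Beta({a_post}, {b_post})")
print(f"posterior mean of coin bias: {a_post / (a_post + b_post):.3f}")
```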

Fully Bayesian Approach
- Feasible with increased computational power
- An intractable posterior distribution is handled using either:
  - variational Bayes, or
  - stochastic sampling, e.g., Markov Chain Monte Carlo, Gibbs sampling
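When the posterior is intractable, stochastic sampling approximates it. A minimal Metropolis sampler sketch (the target density and tuning constants are illustrative, not from the slides):

```python
# Minimal Metropolis sampler: draws from an unnormalized target density,
# which is all MCMC needs (the intractable evidence term cancels in the ratio).
import math
import random

def unnorm_posterior(x):
    # Illustrative target: an unnormalized mixture of two Gaussians.
    return math.exp(-0.5 * (x - 1) ** 2) + 0.5 * math.exp(-0.5 * (x + 2) ** 2)

random.seed(0)
x, samples = 0.0, []
for _ in range(20000):
    proposal = x + random.gauss(0, 1.0)            # symmetric random-walk proposal
    accept = unnorm_posterior(proposal) / unnorm_posterior(x)
    if random.random() < accept:                   # Metropolis acceptance rule
        x = proposal
    samples.append(x)

kept = samples[5000:]                              # discard burn-in
print("posterior mean estimate:", sum(kept) / len(kept))
```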

Summary
- ML is a systematic approach, instead of ad-hockery, to designing information processing systems
- It is useful for solving problems of classification, regression, clustering and modeling
- The fully Bayesian approach, together with Monte Carlo sampling, is leading to high-performing systems
- It has great practical value in many applications:
  - computer science: search engines, text processing, document recognition
  - engineering