Linear Models (II) Rong Jin

Recap  Classification problems Inputs x  output y y is from a discrete set Example: height 1.8m  male/female?  Statistical learning approaches for classification problems Training ExamplesLearning a Statistical Model  Prediction p(y|x;  ) (1.8m, m) (1.87, m) (1.65, f) (1.66, m) (1.58, f) (1.63, f) p(h|male), p(male) p(f|male), p(female) p(male|1.8) p(female|1.8)

Recap  Generative Model p(y|x): determine the class y for object x  p(y): how frequent class y appears  p(x|y): the input pattern for class y  Example: 1.8m  male? female? p(male|1.8m) = p(male)p(1.8m|male)/p(1.8m) p(female|1.8m) = p(female)p(1.8m|female)/p(1.8m) p(1.8m) = p(1.8m|male)p(male)+p(1.8m|female)p(female)

Recap  Learning p(x|y) and p(y) p(y) = #example(y)/#examples Maximum likelihood estimation for p(x|y)  Example Training examples:  (1.8m, m) (1.87, m) (1.65, f) (1.66, m) (1.58, f) (1.63, f) p(male) = N male /N p(female) = N female /N Assume that the height distributions for male and female are Gaussian  (  male,  male ), (  female,  female ) MLE estimation

Recap

Naïve Bayes
- Input x is a vector: x = {x_1, x_2, …, x_m}
- Assume the features are independent of each other given the class y
  - p(x|y) = p(x_1|y) p(x_2|y) … p(x_m|y)
  - Each p(x_i|y) is estimated using the MLE approach
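A compact sketch of this factorization for categorical features; each conditional probability is the plain MLE count ratio the slide assumes, and the toy dataset is invented for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    # MLE estimates: class prior p(y) and one p(x_i | y) table per feature i
    prior = {c: cnt / len(y) for c, cnt in Counter(y).items()}
    cond = defaultdict(Counter)          # cond[(i, c)][v] = #{x_i = v and y = c}
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            cond[(i, c)][v] += 1
    return prior, cond

def predict(xs, prior, cond):
    # argmax_y p(y) * prod_i p(x_i | y), each factor estimated by MLE
    scores = {}
    for c, p in prior.items():
        for i, v in enumerate(xs):
            counts = cond[(i, c)]
            p *= counts[v] / max(sum(counts.values()), 1)
        scores[c] = p
    return max(scores, key=scores.get)

# Invented toy data: two categorical features per example
X = [("tall", "short-hair"), ("tall", "long-hair"), ("short", "long-hair")]
y = ["m", "m", "f"]
print(predict(("short", "long-hair"), *train_naive_bayes(X, y)))   # 'f'
```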

Text Classification (I)
- Learning to classify text
  - Input x: document, represented by a vector of words
  - Output y: interesting or not; +1 for an interesting document, -1 for an uninteresting one
- Generative model for text classification (TC)
  - p(+), p(-)
  - p(doc|+), p(doc|-): naïve Bayes approach

Text Classification (II)
- Learning parameters for TC
  - p(+) = n(+)/N, p(-) = n(-)/N
    - n(±): number of positive (or negative) documents
    - N: total number of documents
  - Apply MLE to estimate p(w|+) and p(w|-)
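One way these estimates look in code, as a hedged sketch (the two example documents are invented): priors from document counts, word probabilities from per-class word counts.

```python
from collections import Counter

def estimate_text_params(docs, labels):
    # Priors: p(+) = n(+)/N, p(-) = n(-)/N
    n_pos = sum(1 for lab in labels if lab == +1)
    prior = {+1: n_pos / len(labels), -1: (len(labels) - n_pos) / len(labels)}
    # MLE for p(w|+) and p(w|-): word counts normalized per class
    counts = {+1: Counter(), -1: Counter()}
    for doc, lab in zip(docs, labels):
        counts[lab].update(doc.lower().split())
    total = {lab: sum(c.values()) for lab, c in counts.items()}
    p_w = {lab: {w: n / total[lab] for w, n in counts[lab].items()} for lab in counts}
    return prior, p_w

docs = ["interesting machine learning paper", "boring annual report"]
labels = [+1, -1]
prior, p_w = estimate_text_params(docs, labels)
print(prior[+1], p_w[+1]["learning"])   # 0.5 0.25
```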

Text Classification (III)
- Twenty Newsgroups: an example

Text Classification (IV)  Any problems with the naïve Bayes text classifier?

Text Classification (V)
- Problems
  - Irrelevant words
  - Unseen words
- Solutions
  - Select relevant words using the mutual information I(x, y)
    - x: whether or not word x appears in a document
    - y: whether or not the document is of interest
  - Unseen words
    - Word class approach: introduce word classes T = {t_1, t_2, …, t_m}, compute p(t_i|+) and p(t_i|-); when w is unseen, replace p(w|±) with p(t_i|±)
    - Word correlation approach: find the correlations p(w|w') between words, e.g., using web information, and set p(w|±) = Σ_{w'} p(w|w') p(w'|±)
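A sketch of the mutual-information criterion for selecting relevant words; here x is the binary indicator of a word's presence and y the document label, as defined on the slide.

```python
import math

def mutual_information(doc_word_sets, labels, word):
    # I(x; y) with x = 1{word appears in the document}, y = label in {+1, -1}
    n = len(labels)
    joint = {(xv, yv): 0 for xv in (0, 1) for yv in (+1, -1)}
    for words, yv in zip(doc_word_sets, labels):
        joint[(int(word in words), yv)] += 1
    mi = 0.0
    for (xv, yv), cnt in joint.items():
        if cnt == 0:
            continue
        p_xy = cnt / n
        p_x = (joint[(xv, +1)] + joint[(xv, -1)]) / n
        p_y = (joint[(0, yv)] + joint[(1, yv)]) / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi

# Rank the vocabulary by I(x, y) and keep only the top-k words as relevant.
docs = [{"rain", "bird"}, {"wildflower", "center"}]
print(mutual_information(docs, [+1, -1], "rain"))   # log 2 ~ 0.693
```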

Logistic Regression Model  Gaussian generative model == find a linear decision boundary.  Why not learn a linear decision boundary directly?

Logistic Regression Model  The log-ratio of positive class to negative class  Results

Logistic Regression Model  Assume the inputs and outputs are related in the log linear function  Estimate weights: MLE approach

Example 1: Heart Disease
- Input feature x: age group id
- Output y: having heart disease or not
  - +1: having heart disease
  - -1: no heart disease
- Age groups: 1: 25-29, 2: 30-34, 3: 35-39, 4: 40-44, 5: 45-49, 6: 50-54, 7: 55-59, 8: 60-64

Example 1: Heart Disease
- Logistic regression model: p(+|i) = 1 / (1 + exp(-(w·i + c)))
- Learning w and c: MLE approach, solved by numerical optimization: w = 0.58, c = -3.34

Example 1: Heart Disease  W = 0.58 An old person is more likely to have heart disease  C = i  w+c < 0  p(+|i) < p(-|i) i  w+c > 0  p(+|i) > p(-|i) i  w+c = 0  decision boundary  i* = 5.78  53 year old

Naïve Bayes Solution
- Inaccurate fitting: the input distribution is not Gaussian
- i* = 5.59, close to the estimate from logistic regression
- Even though naïve Bayes does not fit the input patterns well, it still works fine for the decision boundary

Problems with Using Histogram Data?

Uneven Sampling for Different Ages

Solution
- w = 0.63, c = -3.56 → i* = 5.65

Example: Text Classification  Input x: a binary vector Each word is a different dimension x i = 0 if the ith word does not appear in the document x i = 1 if it appears in the document  Output y: interesting document or not +1: interesting -1: uninteresting

Example: Text Classification
- Doc 1: "The purpose of the Lady Bird Johnson Wildflower Center is to educate people around the world, …"
- Doc 2: "Rain Bird is one of the leading irrigation manufacturers in the world, providing complete irrigation solutions for people…"

  term    the  world  people  company  center  …
  Doc 1    1     1      1       …        1     …
  Doc 2    1     1      1       …        …     …
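A small sketch of this document-to-vector step, over a hypothetical five-word vocabulary:

```python
vocab = ["the", "world", "people", "company", "center"]

def to_binary_vector(doc, vocab):
    # x_i = 1 if the i-th vocabulary word appears in the document, else 0
    words = set(doc.lower().split())
    return [int(w in words) for w in vocab]

doc1 = ("The purpose of the Lady Bird Johnson Wildflower Center "
        "is to educate people around the world")
print(to_binary_vector(doc1, vocab))   # [1, 1, 1, 0, 1]
```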

Example 2: Text Classification  Logistic regression model Every term t i is assigned with a weight w i  Learning parameters: MLE approach  Need numerical solutions

Example 2: Text Classification  Weight w i w i > 0: term t i is a positive evidence w i < 0: term t i is a negative evidence w i = 0: term t i is irrelevant to whether the document is intesting The larger the | w i |, the more important t i term is determining whether the document is interesting.  Threshold c

Example 2: Text Classification
- Dataset: Reuters
- Classification accuracy
  - Naïve Bayes: 77%
  - Logistic regression: 88%

Why Does Logistic Regression Work Better for Text Classification?
- Common words
  - Small weights in logistic regression
  - Large weights in naïve Bayes: weight ~ p(w|+) - p(w|-)
- Independence assumption
  - Naïve Bayes assumes that each word is generated independently given the class
  - Logistic regression can take the correlations between words into account

Comparison
- Generative model
  - Models p(x|y): the input patterns
  - Usually converges fast
  - Cheap computation
  - Robust to noisy data
  - But usually performs worse
- Discriminative model
  - Models p(y|x) directly: the decision boundary
  - Usually good performance
  - But: slow convergence, expensive computation, sensitive to noisy data